My goals in this talk:
In the academic literature, there is some ambiguity in the definitions of outliers and anomalies. Some authors mean them to be the same and other authors differentiate the two terms. I will follow the latter practice.
An outlier is something sufficiently different from the norm that we notice it.
An anomaly is an outlier of interest to humans.
Let's dive further into general concepts and technical definitions.
The non-technical definition of an anomaly is essentially “I’ll know it when I see it.” This can get muddled at the edges, but works really well because humans are great at pattern matching and picking out things which look dissimilar.
One of the best collections of information about how we process things visually is the Gestalt school of psychology. Their key insight is that our minds apply known and expected patterns to what our eyes see.
This leads to a few key Gestalt principles we can take advantage of.
We naturally fill in gaps and turn partial shapes into whole shapes.
We group things together based on their being inside or outside of a region.
We prefer to see the foreground rather than the background. Exceptions do exist, such as Rubin's vase:
Things which are nearer to each other are considered part of the same grouping, and "abnormal" separation creates new groups in our minds.
We group things together based on color, shape, and size.
We want to follow the smoothest path when viewing lines.
By contrast, this is a discomforting pattern because it breaks continuity.
We perceive ambiguous shapes in as simple a manner as possible. What is this?
Our minds put together that it's a mixture of multiple, slightly overlapping shapes.
We do this because we've never seen a character looking like this, and so don't think of the complex shape as "one" thing.
By contrast...
Because humans are pattern-matchers who try to apply fairly simple heuristics to visual inputs, we tend to see things that aren’t there. People can take advantage of this with optical illusions, but it also lets us make cogent observations.
Our eyes try to fit a line to the scatterplot and tell us direction and magnitude. And they also make us wonder about those two outliers dragging down our best-fit line.
A layman’s concept of anomalies is great, but it is ambiguous. Some things which look strange aren’t actually anomalous behavior, whereas some anomalies look reasonable at first glance.
There are dozens of anomaly detection techniques available to us. Some commonalities among techniques are:
The standard deviation is a measure of how spread out our data is around the mean. It is the square root of the variance.
For normal distributions:
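As a quick check on the usual rule of thumb for normal distributions, here is a minimal Python sketch using only the standard library. The function name `share_within` is made up for illustration; it computes the share of a normal distribution that falls within k standard deviations of the mean.

```python
from statistics import NormalDist


def share_within(k: float) -> float:
    """Fraction of a normal distribution within k standard deviations of the mean."""
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


if __name__ == "__main__":
    # Roughly 68%, 95%, and 99.7% for k = 1, 2, 3.
    for k in (1, 2, 3):
        print(f"within {k} standard deviation(s): {share_within(k):.4f}")
```

This is why "more than three standard deviations from the mean" is a common starting threshold: under a normal distribution, only about 0.3% of points land that far out.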
Standard deviation is sensitive to outliers. With each example of an outlier, our standard deviation increases.
With a few outlier data points, we can raise the standard deviation so much that it loses most of its predictive value for catching outliers.
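To make that concrete, here is a small Python sketch with made-up numbers. The helper `flag_by_stdev` and the threshold of three standard deviations are assumptions for illustration, not a prescribed rule.

```python
from statistics import mean, pstdev


def flag_by_stdev(values, threshold=3.0):
    """Return values more than `threshold` population standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing to flag
    return [v for v in values if abs(v - mu) > threshold * sigma]


# Hypothetical sample data: ten well-behaved points.
clean = [10, 11, 9, 10, 12, 10, 11, 9, 10, 11]

if __name__ == "__main__":
    print(pstdev(clean))                       # small spread
    print(flag_by_stdev(clean + [100]))        # a lone outlier is still caught
    print(pstdev(clean + [100, 120]))          # spread balloons
    print(flag_by_stdev(clean + [100, 120]))   # both outliers now hide behind the inflated spread
```

With one extreme point the test still works, but adding a second inflates the standard deviation enough that neither point crosses the three-sigma line.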
Median Absolute Deviation is a robust statistic: it can handle a limited number of outliers without breaking down.
Even better, outliers in opposite directions offset each other when we compute the median, so the center of our data barely moves.
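Here is a minimal Python sketch of a MAD-based check, using the same kind of made-up data. The function name `flag_by_mad` and the threshold are hypothetical. The 1.4826 factor is the usual constant for putting MAD on the same scale as the standard deviation when data is roughly normal.

```python
from statistics import median


def flag_by_mad(values, threshold=3.0):
    """Return values whose distance from the median exceeds threshold * scaled MAD."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        return []  # more than half the points equal the median; this sketch gives up
    scaled_mad = 1.4826 * mad  # comparable to a standard deviation under normality
    return [v for v in values if abs(v - center) > threshold * scaled_mad]


# Hypothetical sample data: ten well-behaved points.
clean = [10, 11, 9, 10, 12, 10, 11, 9, 10, 11]

if __name__ == "__main__":
    print(flag_by_mad(clean + [100, 120]))  # both high outliers are flagged
    print(median(clean), median(clean + [-80, 100]))  # median does not move
    print(flag_by_mad(clean + [-80, 100]))  # one low, one high: both flagged
```

Because the median and MAD depend on the middle of the sorted data, a couple of extreme values do not shift the yardstick the way they shift the mean and standard deviation.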
Suppose we have a trend with an anomalous jump. How do we separate the anomalous increase from the trend?
De-trend: fit the data with a line and track the difference from the line.
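A minimal Python sketch of de-trending, assuming evenly spaced observations and an ordinary least-squares line. The names `fit_line` and `residuals` and the sample series are invented for illustration.

```python
from statistics import mean


def fit_line(ys):
    """Ordinary least-squares line through (0, ys[0]), (1, ys[1]), ..."""
    xs = list(range(len(ys)))
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return slope, intercept


def residuals(ys):
    """Difference between each observation and the fitted line."""
    slope, intercept = fit_line(ys)
    return [y - (slope * x + intercept) for x, y in enumerate(ys)]


# Hypothetical series: a steady upward trend with one spike at index 12.
series = [2 * x + 5 for x in range(20)]
series[12] += 30

if __name__ == "__main__":
    res = residuals(series)
    worst = max(range(len(res)), key=lambda i: abs(res[i]))
    print(worst, round(res[worst], 2))  # the spike stands out once the trend is removed
```

Once the trend is subtracted, the residuals can go through the same standard deviation or MAD checks as any other set of values.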
Changepoint detection looks for abrupt shifts in time series data.
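To give a feel for the idea, here is a deliberately simple Python sketch that looks for a single shift in the mean. It tries every split point and keeps the one that minimizes the total squared error of the two segments. This is an illustration only, not the algorithm any particular package uses, and the function names and data are made up.

```python
from statistics import mean


def sse(segment):
    """Sum of squared deviations from the segment's own mean."""
    m = mean(segment)
    return sum((v - m) ** 2 for v in segment)


def best_single_changepoint(values, min_size=2):
    """Index where splitting the series most reduces squared error, or None."""
    best_index, best_cost = None, sse(values)
    for i in range(min_size, len(values) - min_size + 1):
        cost = sse(values[:i]) + sse(values[i:])
        if cost < best_cost:
            best_index, best_cost = i, cost
    return best_index


# Hypothetical series: level around 10, then an abrupt shift to around 20.
shifted = [10, 11, 10, 9, 10, 11, 20, 21, 19, 20, 21, 20]

if __name__ == "__main__":
    print(best_single_changepoint(shifted))  # index of the first point after the shift
```

Note that this sketch will nearly always report some split, even on noise, so a real process would also test whether the improvement is large enough to matter.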
Another common technique is to measure the difference between consecutive points and perform statistical analysis on those differences.
We can perform all of the same analyses on deltas that we do on raw values.
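As a sketch of working with deltas, the Python below takes differences between consecutive points and applies a MAD-style check to them. The function names, threshold, and data are all hypothetical.

```python
from statistics import median


def deltas(values):
    """Differences between each point and the one before it."""
    return [b - a for a, b in zip(values, values[1:])]


def flag_delta_indexes(values, threshold=3.0):
    """Indexes of points whose change from the prior point is unusually large."""
    ds = deltas(values)
    center = median(ds)
    mad = median([abs(d - center) for d in ds])
    if mad == 0:
        return []  # changes are too uniform for this simple check
    scaled_mad = 1.4826 * mad
    # Delta i describes the move from values[i] to values[i + 1].
    return [i + 1 for i, d in enumerate(ds) if abs(d - center) > threshold * scaled_mad]


# Hypothetical series: small wiggles, then a sudden jump landing at index 5.
readings = [100, 102, 101, 103, 104, 150, 151, 153, 152, 154]

if __name__ == "__main__":
    print(deltas(readings))
    print(flag_delta_indexes(readings))  # only the jump is flagged
```

The points after the jump look ordinary as deltas, so this approach highlights the moment of change rather than every value at the new level.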
Here are a few examples of pre-written packages for anomaly detection:
If you decide to build your own anomaly detection process, check out MathNet.
MathNet is a series of .NET libraries for numerical and statistical analysis.
This allows you to choose which statistical tests to run and to generate results very quickly in C# or F# code.
Many of these sorts of tests are one-liners with MathNet.Numerics.
Another alternative is to use anomaly detection within the ML.NET package.
ML.NET is an actively developed library for machine learning within .NET and supports both F# and C#.
Prep steps in Visual Studio Code or at the command line:
Over the course of this talk, we have looked at the concept of anomalies, some techniques for detecting them, and .NET packages that make detection easier.
To learn more, go here:
https://csmore.info/on/anomalies
And for help, contact me:
feasel@catallaxyservices.com | @feaselkl
Catallaxy Services consulting:
https://CSmore.info/on/contact