Finding Ghosts in Your Data


Finding Ghosts in Your Data takes you through the field of anomaly detection, building an outlier detection engine along the way. From the book's description:

Discover key information buried in the noise of data by learning a variety of anomaly detection techniques and using the Python programming language to build a robust service for anomaly detection against a variety of data types. The book starts with an overview of what anomalies and outliers are and uses the Gestalt school of psychology to explain just why it is that humans are naturally great at detecting anomalies. From there, you will move into technical definitions of anomalies, moving beyond "I know it when I see it" to defining things in a way that computers can understand.

The core of the book involves building a robust, deployable anomaly detection service in Python. You will start with a simple anomaly detection service, which will expand over the course of the book to include a variety of valuable anomaly detection techniques, covering descriptive statistics, clustering, and time series scenarios. Finally, you will compare your anomaly detection service head-to-head with a publicly available cloud offering and see how they perform.

The anomaly detection techniques and examples in this book combine psychology, statistics, mathematics, and Python programming in a way that is easily accessible to software developers. They give you an understanding of what anomalies are and why you are naturally a gifted anomaly detector. Then, they help you to translate your human techniques into algorithms that can be used to program computers to automate the process. You’ll develop your own anomaly detection service, extend it using a variety of techniques such as clustering techniques for multivariate analysis and time series techniques for observing data over time, and compare your service head-to-head against a commercial service.

Order a copy from Amazon.


Here are the key topics in the book.

Anomalies and Outliers

Understand what anomalies and outliers are and how they differ.

Pattern Matching

Learn how we are natural pattern matchers.

Formal Definitions

Go beyond "I know it when I see it" when detecting outliers.

Building a Framework

Create an API for outlier detection.

Build a Test Suite

Code is only as good as the tests which support it!

Create Univariate Checks

Add the first outlier detection methods.

Extend the Univariate Ensemble

Support a broader class of problems and get better results with ensembles.

Visualize the Results

Humans are visual interpreters, so make life easier for them.

Clustering Problems

Learn how to use clustering to approach outlier detection.

Connectivity-Based Outlier Factor

Implement the Connectivity-Based Outlier Factor (COF) algorithm.


Extend COF with another useful algorithm.

Copula-Based Outlier Detection

Go beyond clusters and incorporate copulas.

Time Series

Understand how outlier detection on time series data differs from outlier detection on data without a time component.

Change Point Detection

Learn how to spot when a time series changes.

Multi-Series Time Series

Compare time series to one another.


Implement a simple technique for multi-series time series comparison.

Symbolic Aggregate Approximation

Extend multi-series comparisons with Symbolic Aggregate Approximation.

Stacking up to the Competition

Learn about the state of the art for cloud-based outlier detection.

The Bake-Off

Compare our outlier detection engine to the Azure Cognitive Services Anomaly Detector.