Abstract

Forensic accountants and fraud examiners use a range of techniques to uncover fraudulent journal entries and illegal activities. As data professionals, most of us will never unravel a Bernie Madoff scheme, but we can apply these same techniques in our own environments to uncover dirty data. This session will use a combination of SQL Server and R to apply these fraud detection techniques, which include Benford's Law, outlier analysis, time series analysis, and cohort analysis.
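Outlier analysis is one of the techniques listed above. The sketch below is a rough illustration in Python, not the talk's SQL Server or R demo code. It assumes the modified z-score of Iglewicz and Hoaglin as the outlier test. That method uses the median and the median absolute deviation (MAD) instead of the mean and standard deviation, so a handful of extreme entries cannot inflate the spread and hide themselves. The 0.6745 constant and the 3.5 cutoff are that method's conventional values. The function name `mad_outliers` and the sample amounts are hypothetical.

```python
import statistics


def mad_outliers(values, threshold=3.5):
    """Return (index, value, score) for each value whose modified z-score exceeds threshold.

    Hypothetical helper: modified z-score = 0.6745 * (x - median) / MAD.
    """
    median = statistics.median(values)
    # Median absolute deviation: a spread measure that resists extreme values.
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        # More than half the values equal the median, so the score is
        # undefined; such data needs a different test.
        return []
    flagged = []
    for i, v in enumerate(values):
        score = 0.6745 * (v - median) / mad
        if abs(score) > threshold:
            flagged.append((i, v, score))
    return flagged


if __name__ == "__main__":
    # Hypothetical expense amounts with one suspicious entry.
    amounts = [120.0, 95.5, 130.25, 110.0, 4999.99, 101.0]
    for index, value, score in mad_outliers(amounts):
        print(f"row {index}: {value} (modified z = {score:.1f})")
```

An outlier is not proof of fraud. It only marks a row worth examining, just as the other techniques in this talk do.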


Slides

The slides are available in HTML5 format. All modern browsers, including those on tablets and phones, should be able to navigate them.

The slides are licensed under Creative Commons Attribution-ShareAlike.


Demo Code

The demonstration code is available on my GitHub repository. This includes all of the SQL and R code, as well as data sources used in demos. All demos are in Jupyter notebook form.

The source code is licensed under the GPL.


Additional Media

On May 3, 2017, I gave a version of this talk at 24 Hours of PASS. The recording and slide notes are available on the event website.

On November 5, 2017, I recorded a full-length version of this talk, which you can find on my YouTube channel.


Links And Further Information

Jupyter Notebooks

If you are not familiar with Jupyter notebooks, I have a guide on how to install Jupyter on Windows. This will allow you to try the notebooks out on your own.

Data Sets

Here are links to the individual data sets used in this talk. The versions I archived are snapshots from specific points in time. The live data sets have likely changed since then, and their format may have changed as well. For the exact data I use, check out the demo code above.

Wake County Transportation Fraud

Benford's Law

Benford's Law is one of the most interesting findings about numbers, given how widely and unexpectedly it shows up in real-world data sets.
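Benford's Law predicts that the leading digit d (1 through 9) appears with probability log10(1 + 1/d). That works out to about 30.1% for a leading 1 and about 4.6% for a leading 9. The sketch below is a minimal Python illustration, not the talk's SQL Server or R demo code. It compares observed first digits against that distribution using a Pearson chi-square test. The choice of chi-square (rather than another goodness-of-fit measure) and the function names `leading_digit` and `benford_chi_square` are assumptions for illustration. The 15.507 cutoff is the 5% critical value for 8 degrees of freedom (9 digit bins minus 1).

```python
import math
from collections import Counter

# Chi-square critical value, 8 degrees of freedom, 5% significance level.
CHI2_CRITICAL_8DF_05 = 15.507


def benford_expected():
    """Expected first-digit proportions under Benford's Law: P(d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(x):
    """Return the first significant digit of x, or None for zero."""
    if x == 0:
        return None
    # Scientific notation puts the first significant digit first,
    # e.g. 0.0042 -> '4.200000000000000e-03'.
    return int(f"{abs(x):.15e}"[0])


def benford_chi_square(values):
    """Pearson chi-square statistic of observed first digits vs. Benford's Law."""
    digits = [d for d in map(leading_digit, values) if d is not None]
    n = len(digits)
    if n == 0:
        raise ValueError("no non-zero values to test")
    counts = Counter(digits)  # missing digits count as 0
    return sum(
        (counts[d] - n * p) ** 2 / (n * p)
        for d, p in benford_expected().items()
    )


if __name__ == "__main__":
    # Powers of two are a classic data set that follows Benford's Law.
    powers = [2 ** k for k in range(1, 1001)]
    # Three-digit integers have uniformly distributed leading digits.
    uniform = list(range(100, 1000))
    for name, data in [("powers of 2", powers), ("100..999", uniform)]:
        stat = benford_chi_square(data)
        verdict = "deviates" if stat > CHI2_CRITICAL_8DF_05 else "consistent"
        print(f"{name}: chi-square = {stat:.2f} ({verdict})")
```

Benford's Law only applies to data that spans several orders of magnitude and is not constrained by assigned ranges. Invoice numbers or amounts capped by an approval limit will legitimately fail the test. With very large samples, chi-square also flags tiny deviations, so treat a high statistic as a prompt for investigation rather than a verdict.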

Fraud Detection Techniques

Although the talk does not use most of the techniques in this section, I want to provide links for anyone interested in learning more.

SQL Server Techniques

Other Fun Topics