Java-Based Fraud Detection With Spark MLlib

In this post, we develop an anomaly detection algorithm for fraud detection in Java using Spark MLlib. The full working code can be downloaded from GitHub. Using the configuration file, you can run the code with several different configurations and try your own experiments without deep Java knowledge.

In a previous post, we implemented the same anomaly detection algorithm using Octave. To investigate the available data and gain insight into it, we selected 500,000 records (only those of type TRANSFER) out of seven million. We also plotted several graphs to show what the data and the anomalies (frauds) look like. Because Octave loads all the data into memory, it is limited when working with large datasets. For this reason, we’ll use Spark to run anomaly detection on the larger dataset of seven million records.

Read the full article on DZone

 

