This project was completed for DSO 562 - Fraud Analytics, one of my classes at the USC Marshall School of Business. Credit card fraud is a burden for organizations across the globe: according to shiftprocessing.com, $24.26 billion was lost to credit card fraud worldwide in 2018. In this project, our goal was to build an effective and efficient model to predict transaction fraud.
Project report - Contains the full description of the project.
Project code - Python Jupyter notebook containing the full, commented code, with instructions on how to run it in Google Colab.
Data Quality report - Contains the data visualisations and initial analysis needed to fully understand the given data.
Data Quality report code - Contains the code for the data quality report.
Dataset - Card transactions data.
Dataset Name: Card transactions data
Description: This dataset contains information about card transactions that occurred in the USA. It contains fields such as card number, merchant number, merchant description, and transaction amount. It also contains a fraud label field that indicates whether the transaction is good or bad.
Time Period: 1 January 2010 – 31 December 2010
No. of Fields: 10
No. of Records: 96,753
Size of Dataset file: 7 MB
We analyzed a real-world dataset containing government-related credit card transactions over the 2010 calendar year. The data presented a supervised learning problem because it included a column with each transaction's fraud label (whether the transaction was fraudulent or not). It also contained identifying information about each transaction, such as the credit card number, merchant, and merchant state. The dataset had 96,753 records and 10 data fields. We first described and visualized each of the 10 data fields, cleaned the dataset, and filled in missing values. Then we created many candidate variables and performed feature selection. Finally, we built a variety of machine learning models (both linear and nonlinear) and highlighted our results.
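To make the cleaning and variable-creation steps more concrete, here is a minimal sketch using only the Python standard library. It is not the code from the project notebook: the toy records, the field names (`card`, `merchant`, `state`, `date`, `amount`), the fill rule (most common state for the same merchant), and the two example variables (a trailing per-card transaction count and an amount-to-average ratio) are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta

# Hypothetical toy records; field names are illustrative assumptions,
# not the actual column names of the card transactions dataset.
transactions = [
    {"card": "A", "merchant": "M1", "state": "CA", "date": date(2010, 1, 1), "amount": 20.0},
    {"card": "A", "merchant": "M1", "state": None, "date": date(2010, 1, 3), "amount": 35.0},
    {"card": "A", "merchant": "M2", "state": "NY", "date": date(2010, 1, 20), "amount": 500.0},
    {"card": "B", "merchant": "M1", "state": "CA", "date": date(2010, 1, 2), "amount": 15.0},
]


def fill_missing_state(rows):
    """Fill a missing state with the most common state seen for the same merchant."""
    states_by_merchant = defaultdict(Counter)
    for row in rows:
        if row["state"] is not None:
            states_by_merchant[row["merchant"]][row["state"]] += 1

    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the input records are not mutated
        if new_row["state"] is None:
            counts = states_by_merchant.get(new_row["merchant"])
            # Fall back to a placeholder when the merchant has no known state.
            new_row["state"] = counts.most_common(1)[0][0] if counts else "unknown"
        filled.append(new_row)
    return filled


def add_card_features(rows, window_days=7):
    """Add per-card variables over a trailing window that includes the current transaction."""
    window = timedelta(days=window_days)
    by_card = defaultdict(list)
    for row in rows:
        by_card[row["card"]].append(row)

    out = []
    for row in rows:
        # Amounts on the same card within the trailing window, up to this transaction's date.
        recent = [
            r["amount"]
            for r in by_card[row["card"]]
            if row["date"] - window <= r["date"] <= row["date"]
        ]
        average = sum(recent) / len(recent)
        new_row = dict(row)
        new_row[f"card_count_{window_days}d"] = len(recent)
        # Ratio of this amount to the card's recent average; 0.0 if the average is zero.
        new_row[f"amount_vs_card_avg_{window_days}d"] = row["amount"] / average if average else 0.0
        out.append(new_row)
    return out


cleaned = fill_missing_state(transactions)
features = add_card_features(cleaned)
for row in features:
    print(row)
```

Variables like these would then go through feature selection before model fitting. The 7-day window is an arbitrary choice for the sketch, and a real pipeline would typically try several window lengths and entities (card, merchant, or combinations of the two).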
Download the notebook and the dataset to your system, then run the notebook. The notebook also includes instructions for running it in Google Colab.