Financial Fraud Detection & Production Monitoring System

A production-grade ML system that detects credit card fraud and monitors model performance degradation over time — built to simulate real-world MLOps workflows.

Live Demo

Run locally with streamlit run app/streamlit_app.py

Watch Demo Video: https://youtu.be/aPIeh-j6ELk

Problem Statement

Credit card fraud detection is a severely imbalanced classification problem. With a fraud rate of only 0.17% (492 of 284,807 transactions), accuracy is a misleading metric: a model that predicts "legit" for every transaction reaches 99.83% accuracy while catching zero fraud. This project therefore evaluates models on PR-AUC, precision, and recall instead.
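The accuracy figure above can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the two counts quoted in the problem statement; the function name `always_legit_metrics` is made up for illustration and is not part of the repository.

```python
# Class counts quoted in the problem statement.
TOTAL_TRANSACTIONS = 284_807
FRAUD_CASES = 492


def always_legit_metrics(total: int, fraud: int) -> dict:
    """Metrics for a classifier that labels every transaction as legitimate."""
    true_negatives = total - fraud  # every legitimate transaction is "correct"
    false_negatives = fraud         # every fraud case is missed
    return {
        "fraud_rate": fraud / total,
        "accuracy": true_negatives / total,
        "recall": (fraud - false_negatives) / fraud,  # caught fraud / all fraud
    }


metrics = always_legit_metrics(TOTAL_TRANSACTIONS, FRAUD_CASES)
print(f"fraud rate: {metrics['fraud_rate']:.2%}")  # 0.17%
print(f"accuracy:   {metrics['accuracy']:.2%}")    # 99.83%
print(f"recall:     {metrics['recall']:.2%}")      # 0.00%
```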

Results

| Model | PR-AUC | Precision | Recall |
|---|---|---|---|
| Logistic Regression (baseline) | 0.743 | 0.83 | 0.64 |
| LR + SMOTE | 0.725 | 0.06 | 0.92 |
| XGBoost + Optuna (final) | 0.885 | 0.87 | 0.85 |

Tech Stack

| Layer | Tool | Why |
|---|---|---|
| Core ML | XGBoost | Best performance on tabular imbalanced data |
| Tuning | Optuna | Efficient hyperparameter search vs GridSearch |
| Imbalance | scale_pos_weight=577 | Native XGBoost handling, no oversampling artifacts |
| Explainability | SHAP | Feature-level explanations for regulatory compliance |
| Experiment Tracking | MLflow | Reproducible runs, parameter logging |
| Drift Detection | Evidently AI + PSI | Production distribution shift monitoring |
| Dashboard | Streamlit | Interactive 5-page monitoring interface |

Key Technical Decisions

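Why PR-AUC over ROC-AUC? ROC-AUC is optimistic under class imbalance because the false positive rate is computed over the very large legitimate class, so even many false alarms barely move it. PR-AUC focuses on the minority class (fraud) and better reflects real-world detection performance.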
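The gap between the two metrics is easy to reproduce on toy data. The sketch below uses made-up scores (10 fraud cases hidden among 10,000 legitimate transactions), not the project's model output, and implements both metrics in plain Python. Average precision is used here as one common estimator of PR-AUC, and the helper ignores score ties because the toy data has none between classes.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean precision measured at each true positive, scanning by descending score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions)


# Synthetic data: legitimate scores spread evenly over [0, 1),
# fraud scores all inside the top 2% of that range.
N_LEGIT, N_FRAUD = 10_000, 10
legit_scores = [n / N_LEGIT for n in range(N_LEGIT)]
fraud_scores = [0.98005 + 0.002 * j for j in range(N_FRAUD)]

labels = [0] * N_LEGIT + [1] * N_FRAUD
scores = legit_scores + fraud_scores

roc = roc_auc(labels, scores)
ap = average_precision(labels, scores)
print(f"ROC-AUC:           {roc:.4f}")  # about 0.99, looks excellent
print(f"Average precision: {ap:.4f}")   # about 0.05, most alerts are false
```

Every fraud case outranks roughly 98% of legitimate transactions, which is enough for a near-perfect ROC-AUC, yet each one is still buried under dozens of higher-scoring legitimate transactions, which is what average precision penalizes.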

Why XGBoost over neural networks? On tabular data with 30 features, gradient-boosted trees are typically competitive with or better than deep learning. XGBoost handles imbalance natively through scale_pos_weight and trains in seconds rather than hours.

Why Optuna over GridSearch? Optuna uses TPE (Tree-structured Parzen Estimator) sampling, which uses the results of earlier trials to concentrate on promising regions. That makes it more efficient than exhaustive grid search over continuous hyperparameter spaces.

Why scale_pos_weight over SMOTE? SMOTE increased recall to 92% but collapsed precision to 6%, meaning 94% of fraud alerts were false positives. scale_pos_weight achieved both high precision (87%) and high recall (85%) without generating synthetic samples.
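Both numbers in this trade-off follow from simple ratios. The sketch below derives a `scale_pos_weight` value from class counts and converts precision into false alerts per real fraud alert. It uses the full-dataset counts, so the result is close to but not exactly the 577 listed in the tech stack, which presumably comes from the training split. The helper names and the `params` dictionary are illustrative assumptions about how the value might be passed to `xgboost.XGBClassifier`.

```python
def scale_pos_weight(n_negative: int, n_positive: int) -> float:
    """Ratio of negative to positive examples, the usual starting value in XGBoost."""
    return n_negative / n_positive


def false_alerts_per_true_alert(precision: float) -> float:
    """How many false positives accompany each true positive at a given precision."""
    return (1 - precision) / precision


# Full-dataset counts from this README; a training split gives a slightly different ratio.
ratio = scale_pos_weight(n_negative=284_807 - 492, n_positive=492)
print(f"scale_pos_weight: {ratio:.2f}")  # 577.88

# Precision values taken from the results table.
print(f"LR + SMOTE: {false_alerts_per_true_alert(0.06):.2f} false alerts per real fraud")  # 15.67
print(f"XGBoost:    {false_alerts_per_true_alert(0.87):.2f} false alerts per real fraud")  # 0.15

# Assumed usage: params = {...}; model = xgboost.XGBClassifier(**params)
params = {"scale_pos_weight": ratio, "eval_metric": "aucpr"}
```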

Project Structure

fraud-detection-monitor/
├── notebooks/
│   ├── 01_eda.ipynb                    # Exploratory analysis, 3 hypotheses
│   ├── 02_baseline.ipynb               # Logistic regression baseline
│   ├── 03_xgboost_mlflow.ipynb         # XGBoost + Optuna + MLflow
│   ├── 04_shap_explainability.ipynb    # SHAP feature importance
│   ├── 05_drift_detection.ipynb        # PSI drift detection
│   └── 06_evidently.ipynb              # Evidently AI reports
├── app/
│   └── streamlit_app.py                # 5-page monitoring dashboard
├── reports/                            # Generated plots and HTML reports
├── requirements.txt
└── README.md

Key Findings

  • V14 is the dominant fraud signal (SHAP=2.57): when V14 drops below -6, fraud probability increases sharply
  • Fraud is disproportionate at night: the fraud rate per transaction is 2x higher between 0 and 4 AM despite lower total volume
  • V3 shows catastrophic drift by Week 4 (PSI=1.55), which would trigger an automatic retraining alert in production
  • Model remains stable despite drift: XGBoost compensates using V14 and V4 when V3 drifts
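For readers unfamiliar with PSI, the drift score reported for V3 can be computed along these lines. This is a minimal sketch under stated assumptions, not the notebook's implementation: it bins by reference quantiles, uses 10 bins, floors empty bins at a small `eps`, and runs on synthetic Gaussian data rather than the real feature. A commonly cited rule of thumb treats PSI above 0.25 as significant drift; the alert threshold used in this project is not stated.

```python
import math
import random


def psi(reference, current, n_bins=10, eps=1e-6):
    """Population Stability Index between a reference sample and a current sample."""
    # Bin edges come from reference quantiles, so each bin holds
    # roughly 1/n_bins of the reference data.
    ordered = sorted(reference)
    edges = [ordered[int(len(ordered) * k / n_bins)] for k in range(1, n_bins)]

    def bin_shares(values):
        counts = [0] * n_bins
        for v in values:
            counts[sum(v > e for e in edges)] += 1  # edges below v = bin index
        # Floor at eps so empty bins do not produce log(0).
        return [max(c / len(values), eps) for c in counts]

    ref_share = bin_shares(reference)
    cur_share = bin_shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_share, cur_share))


# Synthetic stand-ins for a feature at training time and in later weeks.
rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(5000)]
stable_week = [rng.gauss(0.0, 1.0) for _ in range(5000)]   # same distribution
drifted_week = [rng.gauss(2.0, 1.0) for _ in range(5000)]  # mean shifted by 2 std

print(f"PSI, stable week:  {psi(reference, stable_week):.4f}")
print(f"PSI, drifted week: {psi(reference, drifted_week):.4f}")
```

The stable week scores close to zero because only sampling noise separates it from the reference, while the shifted week lands well above the 0.25 rule-of-thumb threshold.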

Running Locally

git clone https://github.com/Harshi06-code/fraud-detection-monitor.git
cd fraud-detection-monitor
pip install -r requirements.txt
# Download creditcard.csv from Kaggle and place it in the data/ folder
streamlit run app/streamlit_app.py

Dataset

Credit Card Fraud Detection — 284,807 transactions, 492 fraud cases (0.17%)

Author

Harshitha | B.Tech Computer Science | Amrita Vishwa Vidyapeetham
