Project Overview
NetReaper is a machine learning security project, built for a hackathon, that develops an adversarially resilient intrusion detection system on the NSL-KDD dataset. It combines a three-stage pipeline with an interactive Streamlit dashboard for visualization and analysis.
Architecture and Workflow
- Stage A: Baseline Classification: Implemented in stage_a_baseline.py, this stage trains the baseline intrusion classification model. Outputs include accuracy metrics, confusion matrices, ROC curves, and feature importance charts.
- Stage B: Adversarial Simulation: Handled by attack_simulation.py and run_stage_b_attack.py, this stage simulates adversarial attacks to test the resilience of the detection system. It runs in either dummy or real-data mode and supports models such as Random Forest (rf) and XGBoost (xgb). It writes the adversarial samples to X_adversarial_samples.csv.
- Stage C: Anomaly Safety Net: Implemented in stage_c_anomaly.py and stage_c_test.py, this stage trains an Isolation Forest that acts as a safety net by detecting anomalies. It produces a trained model (stage_c_isolation_forest.pkl) and SHAP summary plots (stage_c_shap_summary.png).
- Streamlit Dashboard: The app.py script powers an interactive dashboard that integrates the outputs from all stages. The current version caches data and model artifacts for efficiency, uses a unified adversarial path, organizes views into tabs (Dashboard, Detailed Analysis, Raw Data), and shows metric deltas for attack impact and recovery.
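The metric deltas mentioned for the dashboard can be illustrated with a small sketch. It assumes that "attack impact" means the accuracy change between clean and adversarial inputs and that "recovery" means the accuracy regained once the Stage C safety net is applied; the function names and toy predictions are hypothetical and are not taken from app.py.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground truth."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))


def metric_deltas(y_true, pred_clean, pred_attacked, pred_defended):
    """Return accuracies plus the two deltas a dashboard might show.

    attack_impact: accuracy change caused by the adversarial samples
    recovery:      accuracy regained once the defence is applied
    (Both definitions are assumptions made for this sketch.)
    """
    acc_clean = accuracy(y_true, pred_clean)
    acc_attacked = accuracy(y_true, pred_attacked)
    acc_defended = accuracy(y_true, pred_defended)
    return {
        "clean": acc_clean,
        "attacked": acc_attacked,
        "defended": acc_defended,
        "attack_impact": acc_attacked - acc_clean,
        "recovery": acc_defended - acc_attacked,
    }


# Toy labels (1 = attack, 0 = normal); values are made up for illustration.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
pred_clean = [1, 1, 1, 1, 0, 0, 0, 0]      # classifier on unperturbed data
pred_attacked = [0, 0, 1, 1, 0, 0, 0, 0]   # two attacks evade detection
pred_defended = [1, 0, 1, 1, 0, 0, 0, 0]   # one evasion caught by the safety net

deltas = metric_deltas(y_true, pred_clean, pred_attacked, pred_defended)
print(deltas)
```

In a Streamlit app, values like these could be passed to the delta argument of st.metric, though whether app.py does so is not stated here.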
Dataset and Setup
The project uses the KDDTrain_with_headers.csv dataset, with the 'label' column as the target. To set up, install the dependencies with pip install -r requirements.txt.
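As a rough illustration of how the dataset might be split into features and target, the sketch below reads a CSV with pandas and separates the 'label' column. The helper name, the column subset, and the label values in the in-memory sample are assumptions for demonstration; the real file would be loaded by passing its path instead.

```python
import io

import pandas as pd


def load_features_and_target(source, target="label"):
    """Read a CSV and split it into a feature frame X and a target series y.

    `source` may be a file path (e.g. "KDDTrain_with_headers.csv") or any
    file-like object accepted by pandas.read_csv.
    """
    df = pd.read_csv(source)
    if target not in df.columns:
        raise KeyError(f"target column {target!r} not found in the data")
    X = df.drop(columns=[target])
    y = df[target]
    return X, y


# Tiny stand-in for the real file; columns and rows are illustrative only.
sample_csv = io.StringIO(
    "duration,protocol_type,src_bytes,dst_bytes,label\n"
    "0,tcp,181,5450,normal\n"
    "0,udp,105,146,normal\n"
    "0,tcp,0,0,neptune\n"
)

X, y = load_features_and_target(sample_csv)
print(X.shape)
print(y.tolist())
```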
Integration and Conventions
Stage C uses the convention -1 for anomaly and 1 for normal. The final defensive decision rule classifies an event as an attack if the classifier predicts an attack (1) OR the anomaly detector flags it as an anomaly (-1). An event is therefore treated as normal only when both the classifier and the anomaly detector consider it benign.
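The decision rule above can be sketched in a few lines of NumPy. This is a minimal illustration, not the project's code: the function name is hypothetical, and it assumes the classifier encodes normal traffic as 0 (the text only states that 1 means attack).

```python
import numpy as np


def combined_decision(clf_pred, anomaly_pred):
    """Apply the OR rule: attack if classifier says 1 or detector says -1.

    clf_pred:     classifier output, 1 = attack (0 = normal is assumed)
    anomaly_pred: anomaly detector output, -1 = anomaly, 1 = normal
    Returns an integer array with 1 = attack, 0 = normal.
    """
    clf_pred = np.asarray(clf_pred)
    anomaly_pred = np.asarray(anomaly_pred)
    if clf_pred.shape != anomaly_pred.shape:
        raise ValueError("both prediction arrays must have the same shape")
    is_attack = (clf_pred == 1) | (anomaly_pred == -1)
    return is_attack.astype(int)


# Four toy events covering every combination of the two signals.
clf_pred = [1, 0, 0, 1]
anomaly_pred = [1, -1, 1, -1]

final = combined_decision(clf_pred, anomaly_pred)
print(final.tolist())  # [1, 1, 0, 1]
```

The second event shows the safety net at work: the classifier misses it, but the anomaly flag still marks it as an attack. The trade-off of an OR rule is that any false positive from either component becomes a false positive of the combined system.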
Documentation
Additional documentation is available in PROJECT_EXPLANATION.md, context.md, and handover.md, providing technical details, implementation status, and handover information.