This repository contains a self-contained, runnable demo of an AI-powered vulnerability risk prioritization workflow. It mirrors a production pipeline that blends exploit likelihood prediction with business context, producing an explainable risk score and a prioritized remediation list.
TL;DR: Run a single script to generate a realistic dataset, train an ensemble (RF + GBDT + MLP) over tabular+text features, compute exploit probabilities, blend them with CVSS/criticality/exposure, and emit a prioritized CSV and ROC curve.
git clone https://github.com/ritesh-gupta-git/AI-Powered-Vulnerability-Management
cd ai-vmf-demo
python -m venv .venv && source .venv/bin/activate # on Windows: .venv\Scripts\activate
pip install -r requirements.txt
python ai_vmf_demo.py
The script will generate a timestamped CSV under the project directory, e.g.:
ai_vmf_prioritized_YYYYMMDD_HHMMSS.csv
It will also display a ROC curve window (if a GUI is available).
- Data synthesis (stand-in for a live ingest)
  - Builds ~900 synthetic vulnerability records with:
    - CVSS, exploit complexity, required privileges, PoC flag, internet exposure, asset criticality, days since disclosure
    - Free-text description (TF‑IDF features)
  - Simulates a 30‑day exploitation label using an intuitive latent function (PoC, low complexity, no privs, high CVSS, internet-exposed, recent).
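The data synthesis step can be sketched roughly as follows. This is a minimal illustration, not the code in ai_vmf_demo.py: the field names, the coefficients, the sigmoid link, and the 30-day recency decay are all assumptions chosen to match the description above.

```python
import math
import random


def exploit_probability(record):
    """Hypothetical latent function: maps risk drivers to a 30-day exploit probability."""
    z = -3.0                                          # assumed baseline: exploitation is rare
    z += 1.5 * record["poc_available"]                # a public PoC raises likelihood
    z += 0.8 * (record["complexity"] == "low")        # low exploit complexity
    z += 0.6 * (record["privileges"] == "none")       # no privileges required
    z += 0.25 * (record["cvss"] - 5.0)                # higher CVSS pushes the score up
    z += 0.7 * record["internet_exposed"]             # internet-facing asset
    z += 0.5 * math.exp(-record["days_since_disclosure"] / 30.0)  # recency decays over time
    return 1.0 / (1.0 + math.exp(-z))                 # squash to (0, 1)


def synthesize(n=900, seed=42):
    """Generate n synthetic records with a sampled binary exploitation label."""
    rng = random.Random(seed)                         # seeded for reproducibility
    rows = []
    for _ in range(n):
        rec = {
            "cvss": round(rng.uniform(1.0, 10.0), 1),
            "complexity": rng.choice(["low", "high"]),
            "privileges": rng.choice(["none", "low", "high"]),
            "poc_available": rng.randint(0, 1),
            "internet_exposed": rng.randint(0, 1),
            "asset_criticality": rng.randint(1, 5),
            "days_since_disclosure": rng.randint(0, 365),
        }
        # Sample the label from the latent probability rather than thresholding it,
        # so the dataset keeps some noise.
        rec["exploited_30d"] = int(rng.random() < exploit_probability(rec))
        rows.append(rec)
    return rows
```

Sampling the label instead of thresholding the probability keeps the classification task from being trivially separable.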
- Features (sklearn ColumnTransformer)
  - Numeric → StandardScaler
  - Categorical → OneHotEncoder
  - Text → TfidfVectorizer(min_df=2, ngram_range=(1,2))
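A sketch of how the three feature branches might be wired together. The column names and the toy rows are assumptions for illustration only; the transformer choices and the TF-IDF parameters are the ones listed above.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column names; adjust to the real schema.
numeric_cols = ["cvss", "days_since_disclosure"]
categorical_cols = ["complexity", "privileges"]
text_col = "description"  # a plain string (not a list) so TF-IDF receives 1-D text

preprocessor = ColumnTransformer(
    transformers=[
        ("num", StandardScaler(), numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
        ("txt", TfidfVectorizer(min_df=2, ngram_range=(1, 2)), text_col),
    ]
)

# Toy rows for illustration. With min_df=2, a term must appear in at least
# two descriptions to be kept, so the texts deliberately share vocabulary.
df = pd.DataFrame(
    {
        "cvss": [9.8, 7.5, 5.3, 4.0],
        "days_since_disclosure": [3, 40, 200, 365],
        "complexity": ["low", "low", "high", "high"],
        "privileges": ["none", "low", "none", "high"],
        "description": [
            "remote code execution in web server",
            "remote code execution in mail server",
            "sql injection in web portal",
            "sql injection in login portal",
        ],
    }
)

X = preprocessor.fit_transform(df)  # one row per vulnerability, all branches concatenated
```

Passing the text column as a string rather than a one-element list matters: TfidfVectorizer expects a 1-D sequence of documents, and a list selector would hand it a 2-D frame.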
- Models + Ensemble
  - Trains RandomForest, GradientBoosting, and a small MLP (sklearn) and combines them via soft voting.
  - Produces pred_exploit_prob for each vulnerability.
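The ensemble step might look something like the sketch below. The hyperparameters and the make_classification stand-in data are assumptions; the actual script may configure the models differently.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Stand-in data; in the demo this would be the preprocessed feature matrix.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("gb", GradientBoostingClassifier(random_state=0)),
        ("mlp", MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)),
    ],
    voting="soft",  # average the predicted class probabilities of the three models
)
ensemble.fit(X_train, y_train)

# Probability of the positive class (exploited within 30 days).
pred_exploit_prob = ensemble.predict_proba(X_test)[:, 1]
```

Soft voting averages probabilities rather than counting class votes, which is what makes a continuous pred_exploit_prob available for the risk formula.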
- Risk Scoring (explainable)
  risk = 0.60*prob + 0.25*cvss_norm + 0.10*criticality_norm + 0.05*internet_exposed
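As a worked example of the formula, the sketch below assumes cvss_norm is the CVSS base score divided by 10 and criticality_norm is asset criticality divided by an assumed maximum of 5. The normalizations in the script may differ; the weights are the ones stated above.

```python
def risk_score(prob, cvss, criticality, internet_exposed, max_criticality=5):
    """Blend exploit probability with business context into a single score."""
    cvss_norm = cvss / 10.0                           # assumption: CVSS base score on a 0-10 scale
    criticality_norm = criticality / max_criticality  # assumption: criticality on a 1-5 scale
    return (
        0.60 * prob
        + 0.25 * cvss_norm
        + 0.10 * criticality_norm
        + 0.05 * float(internet_exposed)
    )


# A likely-exploited, critical, internet-facing vulnerability:
# 0.60*0.8 + 0.25*0.98 + 0.10*1.0 + 0.05*1 = 0.48 + 0.245 + 0.10 + 0.05 = 0.875
score = risk_score(prob=0.8, cvss=9.8, criticality=5, internet_exposed=1)
```

Because the weights sum to 1.0 and each input is in [0, 1] under these assumptions, the score also stays in [0, 1].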
- Outputs
  - Prioritized CSV: sorted by risk_score with key fields for triage.
  - ROC curve and printed metrics (ROC‑AUC, accuracy, precision/recall/F1).
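The CSV output step could be sketched as below. The helper name write_prioritized_csv and the vuln_id field are hypothetical, and sorting highest risk first is an assumption; the file name pattern follows the example shown earlier.

```python
import csv
from datetime import datetime
from pathlib import Path


def write_prioritized_csv(rows, out_dir="."):
    """Sort records by risk_score (highest first) and write a timestamped CSV."""
    if not rows:
        raise ValueError("no rows to write")
    ranked = sorted(rows, key=lambda r: r["risk_score"], reverse=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    path = Path(out_dir) / f"ai_vmf_prioritized_{stamp}.csv"
    with path.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(ranked[0].keys()))
        writer.writeheader()
        writer.writerows(ranked)
    return path
```

Usage: `write_prioritized_csv([{"vuln_id": "V-1", "risk_score": 0.42}, {"vuln_id": "V-2", "risk_score": 0.91}])` writes V-2 ahead of V-1.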
- Actionable: moves beyond raw CVSS by incorporating likelihood and asset context.
- Explainable: deterministic, auditable transforms and a transparent risk formula.
- Extensible: swap TF‑IDF for BERT, add SHAP, integrate real data feeds (NVD, Exploit‑DB), or wire into ticketing.
ai-vmf-demo/
├── ai_vmf_demo.py # main script: data → model → risk → CSV
├── requirements.txt
├── LICENSE # MIT
├── .gitignore
└── README.md
- Uses Pipeline/ColumnTransformer so train and inference paths are identical.
- Sets random seeds where appropriate.
- Weights for the risk formula are explicitly documented and easily adjustable.
- The data is synthetic and safe to publish.
- On headless servers, matplotlib may need a non-interactive backend such as Agg; alternatively, disable plotting.
- To integrate with real data sources, replace the data synthesis function with NVD/Exploit‑DB/Nessus feeds, keeping the same feature/model pipeline.
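For the headless-server note above, one possible approach is to select the Agg backend and save the ROC curve to a file instead of opening a window. The helper name save_roc_curve and the output path are hypothetical; the script itself may handle plotting differently.

```python
import matplotlib

matplotlib.use("Agg")  # select the non-interactive backend before importing pyplot
import matplotlib.pyplot as plt


def save_roc_curve(fpr, tpr, path="roc_curve.png"):
    """Plot a ROC curve from precomputed rates and save it to disk."""
    fig, ax = plt.subplots()
    ax.plot(fpr, tpr, label="ensemble")
    ax.plot([0, 1], [0, 1], linestyle="--", label="chance")
    ax.set_xlabel("False positive rate")
    ax.set_ylabel("True positive rate")
    ax.legend()
    fig.savefig(path)
    plt.close(fig)  # release the figure so repeated runs do not accumulate memory
    return path
```

Setting the environment variable MPLBACKEND=Agg before running the script has the same effect without a code change.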