ritesh-gupta-git/AI-Powered-Vulnerability-Management

AI-VMF: AI-Powered Vulnerability Management Framework

This repository contains a self-contained, runnable demo of an AI-powered vulnerability risk prioritization workflow. It mirrors a production pipeline that blends exploit likelihood prediction with business context, producing an explainable risk score and a prioritized remediation list.

TL;DR: Run a single script to generate a realistic dataset, train an ensemble (RF + GBDT + MLP) over tabular+text features, compute exploit probabilities, blend them with CVSS/criticality/exposure, and emit a prioritized CSV and ROC curve.


Quick Start

1) Clone and install

git clone https://github.com/ritesh-gupta-git/AI-Powered-Vulnerability-Management
cd AI-Powered-Vulnerability-Management   # git clone names the directory after the repository
python -m venv .venv && source .venv/bin/activate   # on Windows: .venv\Scripts\activate
pip install -r requirements.txt

2) Run

python ai_vmf_demo.py

The script writes a timestamped CSV to the project directory, e.g.:

ai_vmf_prioritized_YYYYMMDD_HHMMSS.csv

It will also display a ROC curve window (if a GUI is available).


What it does

  1. Data synthesis (stand-in for a live ingest)

    • Builds ~900 synthetic vulnerability records with:
      • CVSS, exploit complexity, required privileges, PoC flag, internet exposure, asset criticality, days since disclosure
      • Free-text description (TF‑IDF features)
    • Simulates a 30‑day exploitation label using an intuitive latent function (PoC, low complexity, no privs, high CVSS, internet-exposed, recent).
  2. Features (sklearn ColumnTransformer)

    • Numeric → StandardScaler
    • Categorical → OneHotEncoder
    • Text → TfidfVectorizer(min_df=2, ngram_range=(1,2))
  3. Models + Ensemble

    • Trains RandomForest, GradientBoosting, and a small MLP (sklearn) and combines them via soft voting.
    • Produces pred_exploit_prob for each vulnerability.
  4. Risk Scoring (explainable)
    risk = 0.60*prob + 0.25*cvss_norm + 0.10*criticality_norm + 0.05*internet_exposed

  5. Outputs

    • Prioritized CSV: sorted by risk_score with key fields for triage.
    • ROC curve and printed metrics (ROC‑AUC, accuracy, precision/recall/F1).
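
The risk formula in step 4 and the sorted output in step 5 can be sketched with the standard library alone. This is a minimal illustration, not the code in ai_vmf_demo.py: the normalizations (CVSS divided by 10, criticality divided by a 1-5 scale maximum), the record fields other than pred_exploit_prob and risk_score, and the helper names risk_score, prioritized_filename, and write_prioritized_csv are all assumptions made for the example.

```python
import csv
from datetime import datetime

# Weights copied from the risk formula in step 4.
WEIGHTS = {"prob": 0.60, "cvss": 0.25, "criticality": 0.10, "exposure": 0.05}


def risk_score(prob, cvss, criticality, internet_exposed, max_criticality=5):
    """Blend exploit probability with CVSS, criticality, and exposure.

    Assumption: CVSS is normalized by its 10-point maximum and criticality
    by a hypothetical 1-5 scale; the real script may normalize differently.
    """
    cvss_norm = cvss / 10.0
    criticality_norm = criticality / max_criticality
    return (
        WEIGHTS["prob"] * prob
        + WEIGHTS["cvss"] * cvss_norm
        + WEIGHTS["criticality"] * criticality_norm
        + WEIGHTS["exposure"] * float(internet_exposed)
    )


def prioritized_filename(now=None):
    """Build a name matching the ai_vmf_prioritized_YYYYMMDD_HHMMSS.csv pattern."""
    now = now or datetime.now()
    return now.strftime("ai_vmf_prioritized_%Y%m%d_%H%M%S.csv")


def write_prioritized_csv(rows, path):
    """Write the already-sorted rows to a CSV file with a header row."""
    with open(path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


# Hypothetical records; the IDs and values are made up for illustration.
vulns = [
    {"vuln_id": "V-001", "pred_exploit_prob": 0.90, "cvss": 9.8,
     "criticality": 5, "internet_exposed": 1},
    {"vuln_id": "V-002", "pred_exploit_prob": 0.20, "cvss": 9.8,
     "criticality": 2, "internet_exposed": 0},
    {"vuln_id": "V-003", "pred_exploit_prob": 0.70, "cvss": 6.5,
     "criticality": 4, "internet_exposed": 1},
]

for v in vulns:
    v["risk_score"] = round(
        risk_score(v["pred_exploit_prob"], v["cvss"],
                   v["criticality"], v["internet_exposed"]),
        4,
    )

# Highest risk first, as in the prioritized CSV.
prioritized = sorted(vulns, key=lambda v: v["risk_score"], reverse=True)
```

Calling write_prioritized_csv(prioritized, prioritized_filename()) would then write the file. In this toy data, V-002 has the same CVSS as V-001 but ranks last because its predicted exploit probability and asset context are low, which is the behavior the blended score is meant to produce.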

Why this is useful

  • Actionable: moves beyond raw CVSS by incorporating likelihood and asset context.
  • Explainable: deterministic, auditable transforms and a transparent risk formula.
  • Extensible: swap TF‑IDF for BERT, add SHAP, integrate real data feeds (NVD, Exploit‑DB), or wire into ticketing.

Repo Structure

ai-vmf-demo/
├── ai_vmf_demo.py        # main script: data → model → risk → CSV
├── requirements.txt
├── LICENSE               # MIT
├── .gitignore
└── README.md

Reproducibility

  • Uses Pipeline/ColumnTransformer so train and inference paths are identical.
  • Sets random seeds where appropriate.
  • Weights for the risk formula are explicitly documented and easily adjustable.
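
The Pipeline/ColumnTransformer arrangement and the seeded soft-voting ensemble might look roughly like the sketch below. It is an illustration under stated assumptions rather than the script's actual code: the column names, the seed value, the estimator hyperparameters, and the toy data are all hypothetical; only the transformer choices and TfidfVectorizer(min_df=2, ngram_range=(1,2)) come from this README.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

SEED = 42  # hypothetical seed value

# Hypothetical column names for the three feature groups.
NUMERIC = ["cvss", "days_since_disclosure"]
CATEGORICAL = ["exploit_complexity", "privileges_required"]
TEXT = "description"  # a plain string, so TfidfVectorizer receives 1-D text

preprocess = ColumnTransformer([
    ("num", StandardScaler(), NUMERIC),
    ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
    ("txt", TfidfVectorizer(min_df=2, ngram_range=(1, 2)), TEXT),
])

# Soft voting averages the predicted class probabilities of the three models.
ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=SEED)),
        ("gb", GradientBoostingClassifier(random_state=SEED)),
        ("mlp", MLPClassifier(hidden_layer_sizes=(16,), max_iter=500,
                              random_state=SEED)),
    ],
    voting="soft",
)

# One Pipeline object handles both fit and predict_proba, so training and
# inference apply exactly the same transforms.
model = Pipeline([("prep", preprocess), ("clf", ensemble)])

# Toy data, made up for illustration only.
rng = np.random.default_rng(SEED)
n = 60
poc = rng.integers(0, 2, n)
df = pd.DataFrame({
    "cvss": rng.uniform(2, 10, n).round(1),
    "days_since_disclosure": rng.integers(1, 365, n),
    "exploit_complexity": rng.choice(["low", "high"], n),
    "privileges_required": rng.choice(["none", "low", "high"], n),
    "description": np.where(
        poc == 1,
        "remote code execution with public exploit",
        "local denial of service issue",
    ),
})
y = poc  # toy label: exploited exactly when a PoC exists

model.fit(df, y)
proba = model.predict_proba(df)[:, 1]  # plays the role of pred_exploit_prob
```

Because every estimator receives the same random_state and the preprocessing lives inside the pipeline, refitting on the same data should reproduce the same probabilities.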

Notes

  • The data is synthetic and safe to publish.
  • On headless servers, matplotlib may need a non-interactive backend such as Agg; alternatively, disable plotting.
  • To integrate with real data sources, replace the data synthesis function with NVD/Exploit‑DB/Nessus feeds, keeping the same feature/model pipeline.
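
For the headless case, one option is to select matplotlib's Agg backend and save the ROC curve to a file instead of opening a window. The sketch below is a standalone illustration: the labels, scores, and the roc_curve.png filename are hypothetical, and it does not show how ai_vmf_demo.py itself draws the plot.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; chosen before pyplot is imported

import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical ground-truth labels and predicted exploit probabilities.
y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]

fpr, tpr, _ = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)

fig, ax = plt.subplots()
ax.plot(fpr, tpr, label=f"ROC (AUC = {auc:.2f})")
ax.plot([0, 1], [0, 1], linestyle="--", label="chance")
ax.set_xlabel("False positive rate")
ax.set_ylabel("True positive rate")
ax.legend()

fig.savefig("roc_curve.png", dpi=150)  # hypothetical output filename
plt.close(fig)
```

Setting the MPLBACKEND environment variable to Agg before running the script is another way to pick the backend without editing code.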

License

MIT. See LICENSE.