This project predicts whether a flight will be delayed or on time using several machine learning classification algorithms.
We explore the flight delay data, preprocess it, train several classification models, and compare them using accuracy, ROC curves, and feature importance.
Libraries: pandas, numpy, matplotlib, seaborn, scikit-learn, xgboost
- Source: Airline on-time performance dataset (Kaggle: https://www.kaggle.com/datasets/ulrikthygepedersen/airlines-delay)
- Features:
  - Airline: airline carrier code
  - Flight: flight number
  - Origin: departure airport
  - Destination: arrival airport
  - Distance: distance between airports
  - Scheduled_Departure: scheduled departure time
- Target:
  - Delayed: 1 if delayed, 0 if on time
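As a rough sketch of the preprocessing steps (handling missing values, encoding categorical features, scaling) applied to the columns listed above, a scikit-learn `ColumnTransformer` could look like the following. It assumes the CSV uses exactly these column names, drops `Flight` as an identifier, and runs on a tiny made-up frame rather than the Kaggle data; the notebook itself may handle these steps differently.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Assumed column names, taken from the feature list; adjust to match the actual CSV.
CATEGORICAL = ["Airline", "Origin", "Destination"]
NUMERIC = ["Distance", "Scheduled_Departure"]
TARGET = "Delayed"


def build_preprocessor() -> ColumnTransformer:
    """Impute missing values, one-hot encode categoricals, and standardize numerics."""
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ])
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    # Columns not listed here (e.g. Flight) are dropped, the ColumnTransformer default.
    return ColumnTransformer([
        ("cat", categorical, CATEGORICAL),
        ("num", numeric, NUMERIC),
    ])


# Tiny synthetic frame for illustration only; it is not the Kaggle data.
df = pd.DataFrame({
    "Airline": ["AA", "DL", "AA", np.nan],
    "Flight": [100, 200, 300, 400],
    "Origin": ["JFK", "ATL", "LAX", "JFK"],
    "Destination": ["LAX", "JFK", "JFK", "ATL"],
    "Distance": [2475.0, 760.0, np.nan, 760.0],
    "Scheduled_Departure": [900, 1330, 1745, 600],
    "Delayed": [1, 0, 1, 0],
})

X, y = df.drop(columns=[TARGET]), df[TARGET]
X_prepared = build_preprocessor().fit_transform(X)
# ColumnTransformer may return a sparse matrix; densify for inspection.
if hasattr(X_prepared, "toarray"):
    X_prepared = X_prepared.toarray()
print(X_prepared.shape)  # (4, 10): 2 + 3 + 3 one-hot columns plus 2 scaled numerics
```

In a real run, the preprocessor would be fit on the training split only and then reused to transform the test split, so that imputation values and scaling statistics do not leak information from the test set.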
Workflow:
- Import libraries
- Exploratory Data Analysis (EDA)
  - Summary statistics
  - Class distribution check
  - Outlier detection
  - Correlation heatmaps
- Data preprocessing
  - Handling missing values
  - Encoding categorical features
  - Normalization/scaling
- Model training
  - Decision Tree Classifier
  - Random Forest Classifier
  - Logistic Regression
  - XGBoost Classifier
  - AdaBoost Classifier
- Model evaluation
  - Accuracy score
  - ROC curve visualization
  - Feature importance plot
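The training and evaluation steps above can be sketched as a loop over the classifiers, recording accuracy and the ROC curve for each. This is a minimal sketch, not the notebook's code: the hyperparameters are illustrative, XGBoost is only added if the `xgboost` package is installed, and `make_classification` stands in for the preprocessed flight features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def make_models(seed: int = 42) -> dict:
    """The classifiers named in the workflow; hyperparameters are illustrative."""
    models = {
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "Random Forest": RandomForestClassifier(n_estimators=200, random_state=seed),
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "AdaBoost": AdaBoostClassifier(random_state=seed),
    }
    try:  # XGBoost is optional here so the sketch also runs with scikit-learn alone.
        from xgboost import XGBClassifier
        models["XGBoost"] = XGBClassifier(
            n_estimators=200, eval_metric="logloss", random_state=seed
        )
    except ImportError:
        pass
    return models


def evaluate(models: dict, X_train, X_test, y_train, y_test) -> dict:
    """Fit each model, then record test accuracy, ROC AUC, and ROC curve points."""
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores = model.predict_proba(X_test)[:, 1]  # estimated P(delayed)
        fpr, tpr, _ = roc_curve(y_test, scores)
        results[name] = {
            "accuracy": accuracy_score(y_test, model.predict(X_test)),
            "roc_auc": roc_auc_score(y_test, scores),
            "fpr": fpr,
            "tpr": tpr,
        }
    return results


def plot_roc(results: dict) -> None:
    """Overlay ROC curves for all models in one figure (requires matplotlib)."""
    import matplotlib.pyplot as plt
    for name, r in results.items():
        plt.plot(r["fpr"], r["tpr"], label=f"{name} (AUC = {r['roc_auc']:.2f})")
    plt.plot([0, 1], [0, 1], "k--", label="Chance")
    plt.xlabel("False positive rate")
    plt.ylabel("True positive rate")
    plt.legend()
    plt.show()


# Synthetic stand-in for the preprocessed flight data.
X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
results = evaluate(make_models(), X_train, X_test, y_train, y_test)
for name, r in results.items():
    print(f"{name:20s} accuracy={r['accuracy']:.3f} roc_auc={r['roc_auc']:.3f}")
```

Calling `plot_roc(results)` overlays all curves in a single figure. The numbers printed by this synthetic run say nothing about the flight data.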
Model accuracy:

| Model | Accuracy |
|---|---|
| Decision Tree | 68% |
| Random Forest | 70% |
| Logistic Regression | 58% |
| XGBoost | 66% |
| AdaBoost | 62% |
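Accuracy figures like those in the table are easier to judge next to a majority-class baseline: a model that always predicts the more common class already scores that class's share of the test set. The sketch below is an illustration, not part of the original notebook; it computes that baseline with scikit-learn's `DummyClassifier`, and its labels are made up, so the numbers say nothing about the actual class balance.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, balanced_accuracy_score


def majority_baseline(y_train, y_test) -> tuple:
    """Accuracy and balanced accuracy of always predicting the most frequent training class."""
    dummy = DummyClassifier(strategy="most_frequent")
    # DummyClassifier ignores the features, so a placeholder column is enough.
    dummy.fit(np.zeros((len(y_train), 1)), y_train)
    y_pred = dummy.predict(np.zeros((len(y_test), 1)))
    return accuracy_score(y_test, y_pred), balanced_accuracy_score(y_test, y_pred)


# Made-up labels (0 = on time, 1 = delayed), for illustration only.
y_train = np.array([0] * 70 + [1] * 30)
y_test = np.array([0] * 55 + [1] * 45)
acc, bal_acc = majority_baseline(y_train, y_test)
print(f"baseline accuracy={acc:.2f}, balanced accuracy={bal_acc:.2f}")
# → baseline accuracy=0.55, balanced accuracy=0.50
```

If the real majority-class share turned out to be close to a model's accuracy, that model would be adding little over guessing; balanced accuracy and ROC AUC are less sensitive to class imbalance than plain accuracy.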
Visualizations:
- Class distribution
- Correlation heatmap
- ROC curves
- Feature importance (Random Forest)
To run the project, open the notebook with `jupyter notebook Airlines_Delay_Flight_Prediction_using_ML.ipynb`.