fparismusic/ECG_classification

ECG Arrhythmia Classification

MIT-BIH Dataset · Feature Engineering · Data Balancing · Augmentation · SVM Classifier

This project implements a complete classification pipeline for ECG heartbeat signals using the MIT-BIH Arrhythmia dataset.
The goal is to classify heartbeats into five arrhythmia types using rhythmic features, data balancing, augmentation techniques, and an SVM classifier with an RBF kernel.


Dataset Overview

The dataset contains ECG beats labeled into 5 classes:

  • N — Normal
  • S — Supraventricular (atrial) premature beat
  • V — Premature Ventricular Contraction
  • F — Fusion of ventricular and normal beats
  • Q — Paced, fusion of paced and normal, or unclassifiable beats

Each row contains 187 ECG samples + 1 class label.

Dataset size:

  • Training set: 87,553 samples
  • Test set: 21,891 samples

Dataset Exploration

Initial operations included:

  1. Loading the CSV datasets using Pandas
  2. Inspecting structure and label distribution
  3. Visualizing ECG waveforms
  4. Identifying severe class imbalance (class N dominates)

Waveform visualization was performed using Matplotlib by sampling signals at fixed intervals.
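
The loading and inspection steps above can be sketched as follows. This is a minimal illustration, not the project's code: the tiny in-memory CSV stands in for the real files, and `header=None` is an assumption about how they are read.

```python
import io

import pandas as pd

# Tiny stand-in for the real CSV: 4 signal columns + 1 label column.
# (The real files have 187 signal columns + 1 label column.)
csv_text = "0.1,0.2,0.3,0.4,0\n0.5,0.4,0.3,0.2,0\n0.9,0.8,0.1,0.0,2\n"

# header=None is an assumption: it treats the first row as data, not column names.
df = pd.read_csv(io.StringIO(csv_text), header=None)

# The class label is stored in the last column.
labels = df.iloc[:, -1].astype(int)

# Count how many beats belong to each class to expose the imbalance.
counts = labels.value_counts().sort_index()

print(df.shape)
print(counts.to_dict())
```

On the real data, the same `value_counts()` call is what reveals that class N dominates.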


Subsampling

Due to computational constraints, 10% of the original training and testing sets were extracted using:

train_test_split(..., stratify=labels)

Resulting shapes:

  • Train: 8755 × 188
  • Test: 2189 × 188
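
A stratified 10% extraction with `train_test_split` might look like the sketch below. The synthetic array, the `random_state`, and the variable names are assumptions for illustration; only the `stratify=labels` argument comes from the text above.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic, imbalanced stand-in: 1000 rows of 187 samples + 1 label column.
labels = np.repeat([0, 1, 2, 3, 4], [800, 50, 80, 20, 50])
signals = rng.random((labels.size, 187))
data = np.column_stack([signals, labels])

# Keep 10% of the rows; stratify=labels preserves the class proportions.
# The remaining 90% is simply discarded here.
subset, _ = train_test_split(
    data, train_size=0.1, stratify=labels, random_state=0
)

subset_counts = np.bincount(subset[:, -1].astype(int))
print(subset.shape)
print(subset_counts)
```

Stratification matters here because a plain random 10% draw could leave the rarest classes with very few beats.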

Preprocessing

Normalization

Signals were scaled to the range [-1, 1] with MinMaxScaler(feature_range=(-1, 1)):

  • fit_transform() → training set
  • transform() → test set

Feature/Label Separation

The last column (the class label) was separated from the 187 ECG sample columns.
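
A minimal sketch of both preprocessing steps, on synthetic data. It separates the label column before scaling so that labels are never rescaled; that ordering is an assumption of this sketch, not a statement about the project's code.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)

# Synthetic stand-ins shaped like the real data: 187 samples + 1 label per row.
train = np.column_stack([rng.random((20, 187)), rng.integers(0, 5, 20)])
test = np.column_stack([rng.random((10, 187)), rng.integers(0, 5, 10)])

# Feature/label separation: the last column is the class label.
X_train, y_train = train[:, :-1], train[:, -1].astype(int)
X_test, y_test = test[:, :-1], test[:, -1].astype(int)

# Fit the scaler on the training set only, then reuse it on the test set,
# so no information from the test set leaks into the scaling parameters.
scaler = MinMaxScaler(feature_range=(-1, 1))
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.min(), X_train_scaled.max())
```

Note that `MinMaxScaler` works column by column, so each of the 187 time positions is scaled independently, and test values can fall slightly outside [-1, 1].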


Feature Extraction

Feature extraction was performed through the compute_feature_vector() function, producing rhythmic and statistical descriptors.

Extracted features:

Statistical features

  • Mean
  • Standard deviation

Zero-Crossing Rate (ZCR)

  • Frame-wise ZCR
  • Mean of ZCR
  • Standard deviation of ZCR

Short-Time Fourier Transform (STFT)

  • Magnitude spectrum

Spectral Flux

  • Spectral flux using librosa.onset.onset_strength
  • Mean and standard deviation

All features (statistics + frame features) are concatenated into a single vector.

Final shapes:

  • Train feature matrix: (8755, 30)
  • Test feature matrix: (2189, 30)
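
To make the frame-wise features concrete, here is a NumPy-only sketch. It is not the project's `compute_feature_vector()`: it replaces `librosa.onset.onset_strength` with a simple positive spectral difference, it returns only 6 summary values rather than the 30-dimensional vector, and the frame length of 64 and hop of 8 are assumed values. All function names are hypothetical.

```python
import numpy as np


def frame_signal(x, frame_len=64, hop=8):
    """Slice a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs whose sign differs, per frame."""
    signs = np.signbit(frames)
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)


def spectral_flux(frames):
    """Sum of positive magnitude-spectrum increases between adjacent frames."""
    window = np.hanning(frames.shape[1])
    magnitude = np.abs(np.fft.rfft(frames * window, axis=1))  # STFT magnitude
    increase = np.maximum(np.diff(magnitude, axis=0), 0.0)
    return increase.sum(axis=1)


def sketch_feature_vector(x):
    """Concatenate global statistics with summaries of the frame features."""
    frames = frame_signal(x)
    zcr = zero_crossing_rate(frames)
    flux = spectral_flux(frames)
    return np.array(
        [x.mean(), x.std(), zcr.mean(), zcr.std(), flux.mean(), flux.std()]
    )


# Synthetic 187-sample "beat": a 5-cycle sine wave.
t = np.linspace(0.0, 1.0, 187, endpoint=False)
beat = np.sin(2 * np.pi * 5 * t)
features = sketch_feature_vector(beat)
print(features.shape)
```

Applying such a function to every row of the signal matrix yields one fixed-length feature vector per beat, which is what the SVM consumes.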

Dataset Balancing

To remove bias toward class N, a perfectly balanced dataset was created:

  • Train: 641 samples per class → 3205 samples
  • Test: 162 samples per class → 810 samples

This was done via random sampling for each class.
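
Per-class random sampling can be sketched as below. The helper name, the seed, and sampling without replacement are assumptions; the text above only states that each class was randomly sampled to the same size.

```python
import numpy as np


def balance_by_sampling(X, y, per_class, seed=0):
    """Randomly keep `per_class` rows from every class (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    keep = []
    for cls in np.unique(y):
        cls_idx = np.flatnonzero(y == cls)
        # Sample without replacement so no beat appears twice.
        keep.append(rng.choice(cls_idx, size=per_class, replace=False))
    keep = np.concatenate(keep)
    rng.shuffle(keep)  # avoid leaving the rows grouped by class
    return X[keep], y[keep]


# Synthetic, imbalanced stand-in with 30 features per row.
y = np.repeat([0, 1, 2, 3, 4], [700, 30, 60, 10, 40])
X = np.random.default_rng(1).random((y.size, 30))

# Use the size of the rarest class as the common per-class count.
smallest = int(np.bincount(y).min())
X_bal, y_bal = balance_by_sampling(X, y, per_class=smallest)
print(np.bincount(y_bal))
```

Choosing the rarest class as the target size means the majority classes are undersampled rather than the minority classes being duplicated.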


Data Augmentation

Two augmentation techniques were implemented:

1. Stretch

Non-linear resampling and padding/truncation to 187 samples.

2. Amplify

Amplitude scaling with a factor α ∈ [-0.5, 0.5].

Augmentation strategy

The perform() method applies stretch and/or amplify with 50% probability.

For each class:

  • 100 augmented samples were generated
  • Added to the balanced dataset

Final augmented training size:

  • 3705 samples (3205 balanced + 5 classes × 100 augmented)

Normalization was re-applied afterward.
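
The two augmentations and their random combination might be sketched as follows. Several details are assumptions made for illustration: the stretch uses plain linear interpolation (not the non-linear resampling mentioned above), the stretch factor range of ±20% is invented, amplification is interpreted as multiplying by (1 + α), and each transform is applied independently with probability 0.5.

```python
import numpy as np

TARGET_LEN = 187


def stretch(x, factor):
    """Resample x to round(len(x) * factor) points, then pad/truncate to 187."""
    new_len = max(2, int(round(len(x) * factor)))
    old_pos = np.linspace(0.0, 1.0, len(x))
    new_pos = np.linspace(0.0, 1.0, new_len)
    resampled = np.interp(new_pos, old_pos, x)  # linear interpolation
    if new_len >= TARGET_LEN:
        return resampled[:TARGET_LEN]
    return np.pad(resampled, (0, TARGET_LEN - new_len))  # zero-pad the tail


def amplify(x, alpha):
    """Scale the amplitude by (1 + alpha); this reading of alpha is assumed."""
    return x * (1.0 + alpha)


def perform_sketch(x, rng):
    """Apply each augmentation independently with probability 0.5."""
    if rng.random() < 0.5:
        x = stretch(x, 1.0 + rng.uniform(-0.2, 0.2))
    if rng.random() < 0.5:
        x = amplify(x, rng.uniform(-0.5, 0.5))
    return x


rng = np.random.default_rng(0)
beat = np.sin(np.linspace(0.0, np.pi, TARGET_LEN))  # synthetic beat

# 100 augmented copies, as generated per class in the text above.
augmented = np.stack([perform_sketch(beat, rng) for _ in range(100)])
print(augmented.shape)
```

With this scheme roughly a quarter of the generated samples pass through unchanged, since neither transform fires.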


SVM Classifier

Classifier setup:

  • Kernel: RBF
  • C: 10 (initial run)

Performance (balanced dataset):

  • Test accuracy: 67.53%
  • Train accuracy: 65.75%
    → Train and test accuracy are close, so there is no sign of overfitting.
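
A minimal sketch of this setup with scikit-learn's `SVC`, trained on synthetic 30-feature data rather than the project's features. The data generation and split sizes are assumptions; only the RBF kernel and C = 10 come from the text above.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic 5-class problem with 30 features; class means are shifted apart.
y = np.repeat(np.arange(5), 60)
X = rng.normal(size=(y.size, 30)) + 0.8 * y[:, None]

# Shuffle, then hold out the last 100 rows for testing.
order = rng.permutation(y.size)
X, y = X[order], y[order]
X_train, y_train = X[:200], y[:200]
X_test, y_test = X[200:], y[200:]

clf = SVC(kernel="rbf", C=10)
clf.fit(X_train, y_train)

# Comparing the two scores is the overfitting check described above.
train_acc = clf.score(X_train, y_train)
test_acc = clf.score(X_test, y_test)
print(f"train={train_acc:.3f} test={test_acc:.3f}")
```

A training accuracy far above the test accuracy would indicate overfitting; similar values suggest the model generalizes at the level it fits.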

Evaluation Metrics

Classification Report (balanced test set — 810 samples)

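| Class | Precision | Recall | F1-score |
|-------|-----------|--------|----------|
| N     | 0.59      | 0.69   | 0.64     |
| S     | 0.84      | 0.67   | 0.75     |
| V     | 0.80      | 0.71   | 0.75     |
| F     | 0.79      | 0.89   | 0.84     |
| Q     | 0.89      | 0.91   | 0.90     |
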
  • Overall accuracy: 78%
  • Macro F1: 0.78
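
As a quick check, the macro averages can be recomputed from the per-class values in the report. Macro F1 is the unweighted mean of the per-class F1 scores, and on a balanced test set the macro recall coincides with the overall accuracy.

```python
# Per-class values copied from the classification report above.
f1 = {"N": 0.64, "S": 0.75, "V": 0.75, "F": 0.84, "Q": 0.90}
recall = {"N": 0.69, "S": 0.67, "V": 0.71, "F": 0.89, "Q": 0.91}

# Macro averaging gives every class the same weight.
macro_f1 = sum(f1.values()) / len(f1)
macro_recall = sum(recall.values()) / len(recall)

print(f"macro F1     = {macro_f1:.3f}")
print(f"macro recall = {macro_recall:.3f}")
```

The mean F1 is 0.776, which rounds to the reported 0.78. The mean recall is 0.774, which agrees with the reported 78% accuracy up to the rounding of the per-class values.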

Compared to the original imbalanced dataset, minority class performance improved dramatically.


Hyperparameter Tuning

Several C values were tested:

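| C      | Accuracy | Macro F1 | Macro Recall | False Negatives |
|--------|----------|----------|--------------|-----------------|
| 0.1    | 67.53%   | 0.6666   | 0.6753       | 263             |
| 1      | 72.72%   | 0.7252   | 0.7272       | 221             |
| 100    | 79.26%   | 0.7926   | 0.7926       | 168             |
| 1000   | 80.00%   | 0.8003   | 0.8000       | 162             |
| 10000  | 77.28%   | 0.7732   | 0.7728       | 184             |
| 100000 | 77.04%   | 0.7722   | 0.7704       | 186             |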
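
A sweep over C of this kind could be sketched as follows, here on synthetic data. The data, split, and result layout are assumptions; the C grid, the RBF kernel, and `gamma="auto"` mirror values that appear in this section.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, recall_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic 5-class problem with 30 features and overlapping classes.
y = np.repeat(np.arange(5), 80)
X = rng.normal(size=(y.size, 30)) + 0.5 * y[:, None]
order = rng.permutation(y.size)
X, y = X[order], y[order]
X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]

results = []
for C in [0.1, 1, 100, 1000, 10000, 100000]:
    model = SVC(kernel="rbf", C=C, gamma="auto").fit(X_train, y_train)
    pred = model.predict(X_test)
    results.append(
        {
            "C": C,
            "accuracy": accuracy_score(y_test, pred),
            "macro_f1": f1_score(y_test, pred, average="macro"),
            "macro_recall": recall_score(y_test, pred, average="macro"),
            # Every misclassified beat is a false negative for its true class.
            "misclassified": int(np.sum(pred != y_test)),
        }
    )

for row in results:
    print(row)
```

In the table above, the False Negatives column matches the total number of misclassified test beats: for example, 810 × (1 − 0.8000) = 162 for C = 1000. Macro recall equals accuracy in every row because the test set is perfectly balanced.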

Best configuration:

  • Kernel: RBF
  • C: 1000
  • Gamma: auto
  • N: 64
  • H: 8
  • Accuracy: 80.49%

Final Notes and Future Work

Key takeaways

  • Balancing the dataset is crucial.
  • Rhythmic features significantly help distinguish arrhythmias.
  • Augmentation improves generalization.
  • SVM with RBF delivers robust results (~80%).

Potential improvements

  • 1D CNNs for automatic feature extraction
  • Additional augmentations (noise, shifts, warping)
  • PCA or feature selection
  • Class-weighted SVM or focal loss

About

The main topic of this project is ECG classification based on rhythmic features.
