DhanushHR31/Data-Analysis-Platform

Data-Analysis-Platform

Data Analysis Platform - Streamlit Application

Project Overview

Data Analysis Platform is a web-based tool for performing end-to-end data analysis tasks. Built with Python and Streamlit, it provides an interface for data preprocessing, clustering, classification, and visualization without requiring any coding knowledge.

Key Features

Core Capabilities

📊 Data Management: Upload, view, and edit datasets

🧹 Data Preprocessing: Filter, clean, and prepare data for analysis

🧩 Clustering: K-means clustering with visualization

🧠 Classification: Multiple algorithms with detailed evaluation metrics

📈 Visualization: Interactive charts and dashboards

Technical Highlights

Machine Learning Integration: Scikit-learn for clustering and classification

Comprehensive Metrics: 8+ evaluation metrics for classification models

Interactive UI: Streamlit-based interface with real-time updates

Data Exploration: Dynamic filtering and visualization tools
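The README does not list which evaluation metrics the application reports. As an illustration only, the sketch below computes four common classification metrics (accuracy, precision, recall, F1) by hand for binary labels. The function name `binary_metrics` and the sample labels are hypothetical; scikit-learn provides equivalents such as `accuracy_score`, `precision_score`, `recall_score`, and `f1_score` in `sklearn.metrics`.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels.

    Hypothetical helper for illustration; not taken from the application.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    def safe_div(num, den):
        # Return 0.0 instead of raising when a denominator is zero.
        return num / den if den else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


# Made-up labels: 3 true positives, 3 true negatives, 1 false positive, 1 false negative.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

With these sample labels every metric evaluates to 0.75, because the error counts happen to be symmetric.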

Technology Stack

Backend

Python Data Ecosystem:

Pandas (Data manipulation)

NumPy (Numerical computing)

Scikit-learn (Machine learning)

Visualization:

Matplotlib

Seaborn

Streamlit native charts

Frontend

Streamlit: Interactive web interface

Dynamic Components:

Data editors

Interactive sliders

Multi-select filters

Installation Guide

Prerequisites

Python 3.8+

pip package manager

Setup Steps

Clone repository:

```bash
git clone https://github.com/yourrepo/data-analysis-platform.git
cd data-analysis-platform
```

Create virtual environment:

```bash
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
```

Install dependencies:

```bash
pip install -r requirements.txt
```

Run application:

```bash
streamlit run app.py
```

Usage Guide

Workflow

Upload Data: Start by uploading your CSV file

Preprocess: Filter and clean your data

Analyze:

Perform clustering to find patterns

Build classification models

Visualize: Explore through interactive dashboards

Export: Save your processed dataset
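Outside the UI, the upload, preprocess, and export steps of this workflow can be approximated with pandas. The sketch below is an assumption about what those steps amount to, not the application's actual code: the CSV content, the column names, and the choice to drop duplicates and rows with missing values are all hypothetical. In a Streamlit app, the file-like object would typically come from `st.file_uploader`, which `pd.read_csv` can read directly.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for an uploaded file.
raw_csv = io.StringIO(
    "age,income,segment\n"
    "34,52000,A\n"
    "41,,B\n"
    "34,52000,A\n"
    "29,48000,C\n"
)

# Upload: read the CSV into a DataFrame.
df = pd.read_csv(raw_csv)

# Preprocess: remove exact duplicate rows, then rows with missing values.
cleaned = df.drop_duplicates().dropna().reset_index(drop=True)

# Export: serialize the processed dataset back to CSV text.
exported = cleaned.to_csv(index=False)
print(exported)
```

The exported text could then be written to disk or offered through a download widget such as `st.download_button`.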

Key Functions

Data Filtering: Slider controls for numeric columns, multi-select for categorical

Clustering: Adjustable number of clusters with visual output

Classification: 4 algorithm choices with detailed performance metrics

Dashboard: Multiple chart types for data exploration
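The data filtering described above (a range for each numeric column, a set of allowed values for each categorical column) can be sketched as a plain pandas function. This is a minimal illustration under assumptions: `filter_frame`, its parameters, and the sample data are hypothetical names, and in the app the ranges and selections would presumably come from Streamlit widgets such as `st.slider` and `st.multiselect`.

```python
import pandas as pd


def filter_frame(df, numeric_ranges=None, categorical_choices=None):
    """Keep rows that satisfy every numeric range and categorical selection.

    numeric_ranges: {column: (low, high)}, bounds inclusive.
    categorical_choices: {column: [allowed values]}; an empty list means no filter.
    """
    mask = pd.Series(True, index=df.index)
    for column, (low, high) in (numeric_ranges or {}).items():
        mask &= df[column].between(low, high)
    for column, allowed in (categorical_choices or {}).items():
        if allowed:
            mask &= df[column].isin(allowed)
    return df[mask]


# Made-up sample data for demonstration.
df = pd.DataFrame(
    {
        "age": [22, 35, 47, 58],
        "city": ["Pune", "Delhi", "Pune", "Mumbai"],
    }
)
result = filter_frame(
    df,
    numeric_ranges={"age": (30, 60)},
    categorical_choices={"city": ["Pune", "Mumbai"]},
)
print(result)
```

Treating an empty selection as "no filter" is a design choice made here so that clearing a multi-select does not hide every row; the application may behave differently.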

Architecture

```text
[Data Input]
     │
     ▼
[Preprocessing]───▶[Clustering]───▶[Visualization]
     │                                   │
     ▼                                   ▼
[Classification]                    [Dashboard]
     │
     ▼
[Evaluation Metrics]
```

Configuration

The application requires no configuration files. All settings are adjustable through the UI:

Clustering: Adjust number of clusters (2-10)

Classification: Choose from 4 algorithms

Visualization: Select columns and chart types
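As an illustration of the clustering setting, the sketch below runs scikit-learn's `KMeans` with a cluster count restricted to the 2-10 range listed above. The `cluster` helper, the range check, and the synthetic data are assumptions for demonstration; the application's own code may be organized differently.

```python
import numpy as np
from sklearn.cluster import KMeans


def cluster(features, n_clusters=3, random_state=0):
    """Run K-means and return (labels, cluster centers).

    Hypothetical helper; the 2-10 bound mirrors the range offered in the UI.
    """
    if not 2 <= n_clusters <= 10:
        raise ValueError("n_clusters must be between 2 and 10")
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labels = model.fit_predict(features)
    return labels, model.cluster_centers_


# Synthetic data: three well-separated 2-D blobs of 50 points each.
rng = np.random.default_rng(0)
points = np.vstack(
    [
        rng.normal(loc=(0, 0), scale=0.3, size=(50, 2)),
        rng.normal(loc=(5, 5), scale=0.3, size=(50, 2)),
        rng.normal(loc=(0, 5), scale=0.3, size=(50, 2)),
    ]
)
labels, centers = cluster(points, n_clusters=3)
print(centers)
```

The returned labels can be used to color a scatter plot of the data (for example with Matplotlib or Seaborn) to produce the visual output.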

Performance

Typical time complexity by operation:

| Operation | Time Complexity | Notes |
| --- | --- | --- |
| Data Loading | O(n) | Scales with file size |
| Clustering | O(n·k·i) | n = samples, k = clusters, i = iterations; cost also grows linearly with the number of features |
| Classification | Varies by algorithm | Random Forest typically slowest |

License

MIT License

Roadmap

Planned Features

Additional clustering algorithms (DBSCAN, Hierarchical)

Regression analysis capabilities

Dimensionality reduction (PCA, t-SNE)

Automated feature engineering

Export capabilities for models and visualizations

Research Areas

Integration with AutoML solutions

Custom model deployment

Big data handling optimizations
