Skip to content

salarkhannn/AutoML

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

40 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

AutoML

A comprehensive Streamlit-based Automated Machine Learning (AutoML) platform for classification tasks. Upload your data and let the system guide you through exploration, preprocessing, model training, and report generation—all with an interactive, code-free UI.

Live Demo

Python Streamlit License


Screenshots

Data Upload & Overview

Dataset Overview

Exploratory Data Analysis

Exploratory Analysis

Data Preprocessing

Data Preprocessing

Model Training

Model Training

Results & Reports

Results

AI-Powered Data Insights

AI Assistant


Features

1. Data Upload & Overview

  • Upload CSV files or use the built-in sample dataset.
  • Automatic data profiling: shape, data types, missing values, and class distribution.
  • Select your target column for classification.

2. Exploratory Data Analysis (EDA)

  • Missing Values: Visualize and analyze columns with missing data.
  • Outlier Detection: Identify outliers using IQR or Z-score methods.
  • Correlation Analysis: Interactive correlation heatmaps.
  • Distributions: Histograms for numerical features, bar charts for categorical features.

3. Data Preprocessing

  • Automated Issue Detection: Detects missing values, outliers, high cardinality, class imbalance, and constant features.
  • Interactive Fixes: Choose how to handle each issue (impute, drop, cap, encode, etc.).
  • Scaling: StandardScaler or MinMaxScaler.
  • Encoding: One-Hot or Ordinal encoding for categorical variables.
  • Train/Test Split: Configurable split ratio and random seed.

4. Model Training & Hyperparameter Tuning

  • Multiple Algorithms: Logistic Regression, Random Forest, Gradient Boosting, SVM, KNN, Decision Tree, AdaBoost, and a Baseline model.
  • Hyperparameter Optimization: Grid Search or Randomized Search with cross-validation.
  • Class Imbalance Handling: Optional class weights.
  • Performance Metrics: Accuracy, Precision, Recall, F1-Score, ROC-AUC, Confusion Matrix, and ROC Curves.

5. Results & Reports

  • Model Comparison: Side-by-side metrics and visualizations.
  • Best Model Selection: Automatic recommendation based on F1-Score.
  • Report Generation: Download comprehensive PDF or HTML reports.

6. AI-Powered Data Insights Chat

  • Conversational Analysis: Ask questions about your data in natural language.
  • Auto-Generated Visualizations: Request charts and the AI will generate Plotly code.
  • Powered by GROQ API: Uses the LLaMA 4 model for intelligent responses.

Project Structure

AutoML/
├── app.py                 # Main Streamlit application
├── requirements.txt       # Python dependencies
├── .env                   # Environment variables (GROQ_API_KEY)
├── sample_data/           # Sample dataset for testing
│   └── loan_classification.csv
└── src/
    ├── __init__.py
    ├── chat.py            # AI chat assistant module (GROQ API)
    ├── data_loader.py     # CSV loading utilities
    ├── eda.py             # Exploratory data analysis functions
    ├── models.py          # ML models and training logic
    ├── preprocessing.py   # Data cleaning and transformation
    └── report.py          # PDF/HTML report generation

Installation

Prerequisites

  • Python 3.8 or higher
  • (Optional) GROQ API key for the AI chat feature

Steps

  1. Clone the repository:

    git clone <repository-url>
    cd AutoML
  2. Create and activate a virtual environment:

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:

    pip install -r requirements.txt
  4. (Optional) Set up GROQ API key for AI Chat: Create a .env file in the project root:

    GROQ_API_KEY=your_api_key_here
    

Usage

  1. Run the application:

    streamlit run app.py
  2. Open your browser and navigate to http://localhost:8501.

  3. Follow the workflow:

    • Upload & Data Info: Load your CSV or use the sample dataset.
    • Exploratory Analysis: Review EDA visualizations.
    • Data Preprocessing: Apply fixes and transformations.
    • Model Training: Select and train models.
    • Results & Reports: Compare models and download reports.
    • Data Insights Chat: Ask questions about your data (requires GROQ API key).

Dependencies

Package Purpose
streamlit Web application framework
pandas Data manipulation
numpy Numerical operations
scikit-learn Machine learning models and preprocessing
matplotlib Static plotting (used internally)
seaborn Statistical visualizations
plotly Interactive charts
fpdf PDF report generation
groq GROQ API client for AI chat
python-dotenv Environment variable management

Configuration

Environment Variables

Variable Description Required
GROQ_API_KEY API key for GROQ (AI Chat feature) Optional

License

This project is licensed under the MIT License.


About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages