A comprehensive Streamlit-based Automated Machine Learning (AutoML) platform for classification tasks. Upload your data and let the system guide you through exploration, preprocessing, model training, and report generation—all with an interactive, code-free UI.
- Upload CSV files or use the built-in sample dataset.
- Automatic data profiling: shape, data types, missing values, and class distribution.
- Select your target column for classification.
- Missing Values: Visualize and analyze columns with missing data.
- Outlier Detection: Identify outliers using IQR or Z-score methods.
- Correlation Analysis: Interactive correlation heatmaps.
- Distributions: Histograms for numerical features, bar charts for categorical features.
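The IQR and Z-score rules named above can be sketched with NumPy as follows. This is a minimal illustration, not the code in `src/eda.py`: the function names, the `k=1.5` fence multiplier, and the `threshold=3.0` cutoff are common defaults assumed here, and the sample values are made up.

```python
import numpy as np

def iqr_outliers(values, k=1.5):
    """Boolean mask of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)

def zscore_outliers(values, threshold=3.0):
    """Boolean mask of values whose absolute z-score exceeds the threshold."""
    values = np.asarray(values, dtype=float)
    std = values.std()
    if std == 0:
        # A constant column has no spread, so nothing is flagged.
        return np.zeros(values.shape, dtype=bool)
    return np.abs((values - values.mean()) / std) > threshold

# Hypothetical feature: 19 ordinary values plus one extreme value.
values = list(range(30, 49)) + [250]
iqr_mask = iqr_outliers(values)
z_mask = zscore_outliers(values)
```

Both masks flag only the final value here. On very small samples the Z-score rule can miss extreme points because the outlier itself inflates the standard deviation, which is one reason to offer both methods.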
- Automated Issue Detection: Detects missing values, outliers, high cardinality, class imbalance, and constant features.
- Interactive Fixes: Choose how to handle each issue (impute, drop, cap, encode, etc.).
- Scaling: StandardScaler or MinMaxScaler.
- Encoding: One-Hot or Ordinal encoding for categorical variables.
- Train/Test Split: Configurable split ratio and random seed.
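The preprocessing options above (imputation, scaling, encoding, and a seeded train/test split) might be combined with scikit-learn roughly as shown below. This is a sketch under assumptions: the toy columns `income`, `employment`, and `approved` are hypothetical, and the actual logic in `src/preprocessing.py` may be organized differently.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; column names are illustrative only.
df = pd.DataFrame({
    "income": [42.0, 55.0, None, 61.0, 38.0, 70.0, 45.0, 52.0],
    "employment": ["salaried", "self", "salaried", "self",
                   "self", "salaried", "self", "salaried"],
    "approved": [0, 1, 0, 1, 0, 1, 0, 1],
})
X, y = df.drop(columns="approved"), df["approved"]

# Configurable split ratio and random seed; stratify keeps class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Numeric columns: fill missing values with the median, then standardize.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
preprocess = ColumnTransformer([
    ("num", numeric, ["income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["employment"]),
])

# Fit on the training split only, then reuse the fitted transform on test data.
X_train_t = preprocess.fit_transform(X_train)
X_test_t = preprocess.transform(X_test)
```

Fitting the transformers on the training split alone avoids leaking test-set statistics into the scaler or imputer. Swapping `StandardScaler` for `MinMaxScaler`, or `OneHotEncoder` for `OrdinalEncoder`, follows the same pattern.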
- Multiple Algorithms: Logistic Regression, Random Forest, Gradient Boosting, SVM, KNN, Decision Tree, AdaBoost, and a Baseline model.
- Hyperparameter Optimization: Grid Search or Randomized Search with cross-validation.
- Class Imbalance Handling: Optional class weights.
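Hyperparameter search with cross-validation and class weights could look like the following sketch. The synthetic dataset, the `C` grid, the `f1` scoring choice, and the five folds are assumptions made for illustration; they are not taken from `src/models.py`.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, imbalanced binary problem (roughly 80% / 20%).
X, y = make_classification(
    n_samples=300, n_features=8, weights=[0.8, 0.2], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

search = GridSearchCV(
    # class_weight="balanced" reweights classes inversely to their frequency.
    estimator=LogisticRegression(max_iter=1000, class_weight="balanced"),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    scoring="f1",
    cv=5,
)
search.fit(X_train, y_train)

best_model = search.best_estimator_  # refit on the full training split
test_predictions = best_model.predict(X_test)
```

`RandomizedSearchCV` accepts the same estimator and samples a fixed number of candidates (`n_iter`) from `param_distributions` instead of trying every combination, which tends to be cheaper on large grids.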
- Performance Metrics: Accuracy, Precision, Recall, F1-Score, ROC-AUC, Confusion Matrix, and ROC Curves.
- Model Comparison: Side-by-side metrics and visualizations.
- Best Model Selection: Automatic recommendation based on F1-Score.
- Report Generation: Download comprehensive PDF or HTML reports.
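Comparing models and recommending the one with the highest F1-score can be sketched as below. The labels and predictions are made-up toy values, and `model_a` / `model_b` are placeholder names rather than results from the platform.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Toy ground truth and predictions, invented for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = {
    "model_a": [1, 0, 1, 0, 0, 0, 1, 0],
    "model_b": [1, 1, 1, 1, 0, 1, 1, 0],
}

# One row of metrics per model, suitable for a side-by-side table.
scores = {
    name: {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
    }
    for name, y_pred in predictions.items()
}

# Recommend the model with the highest F1-score.
best_name = max(scores, key=lambda name: scores[name]["f1"])
```

With these toy values `model_a` has perfect precision but misses one positive (F1 = 6/7), while `model_b` has perfect recall but two false positives (F1 = 0.8), so `best_name` is `"model_a"`. Ranking by F1 rather than accuracy matters most when classes are imbalanced.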
- Conversational Analysis: Ask questions about your data in natural language.
- Auto-Generated Visualizations: Request charts and the AI will generate Plotly code.
- Powered by GROQ API: Uses the LLaMA 4 model for intelligent responses.
AutoML/
├── app.py                  # Main Streamlit application
├── requirements.txt        # Python dependencies
├── .env                    # Environment variables (GROQ_API_KEY)
├── sample_data/            # Sample dataset for testing
│   └── loan_classification.csv
└── src/
    ├── __init__.py
    ├── chat.py             # AI chat assistant module (GROQ API)
    ├── data_loader.py      # CSV loading utilities
    ├── eda.py              # Exploratory data analysis functions
    ├── models.py           # ML models and training logic
    ├── preprocessing.py    # Data cleaning and transformation
    └── report.py           # PDF/HTML report generation
- Python 3.8 or higher
- (Optional) GROQ API key for the AI chat feature
- Clone the repository and change into it: `git clone <repository-url>`, then `cd AutoML`.
- Create and activate a virtual environment: `python -m venv venv`, then `source venv/bin/activate` (on Windows: `venv\Scripts\activate`).
- Install dependencies: `pip install -r requirements.txt`.
- (Optional) Set up a GROQ API key for the AI chat: create a `.env` file in the project root containing `GROQ_API_KEY=your_api_key_here`.
- Run the application: `streamlit run app.py`.
- Open your browser and navigate to `http://localhost:8501`.
- Follow the workflow:
  - Upload & Data Info: Load your CSV or use the sample dataset.
  - Exploratory Analysis: Review EDA visualizations.
  - Data Preprocessing: Apply fixes and transformations.
  - Model Training: Select and train models.
  - Results & Reports: Compare models and download reports.
  - Data Insights Chat: Ask questions about your data (requires GROQ API key).
| Package | Purpose |
|---|---|
| `streamlit` | Web application framework |
| `pandas` | Data manipulation |
| `numpy` | Numerical operations |
| `scikit-learn` | Machine learning models and preprocessing |
| `matplotlib` | Static plotting (used internally) |
| `seaborn` | Statistical visualizations |
| `plotly` | Interactive charts |
| `fpdf` | PDF report generation |
| `groq` | GROQ API client for AI chat |
| `python-dotenv` | Environment variable management |
| Variable | Description | Required |
|---|---|---|
| `GROQ_API_KEY` | API key for GROQ (AI Chat feature) | Optional |
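Because the key is optional, the application can check for it at startup and disable the chat page when it is absent. The sketch below uses only the standard library; `chat_enabled` is a hypothetical helper name, not a function from this project. In practice `python-dotenv` would typically load the `.env` file into the process environment before such a check runs.

```python
import os

def chat_enabled(env=None):
    """Return True when a non-empty GROQ_API_KEY is available.

    `env` defaults to the process environment; passing a dict makes
    the check easy to exercise without touching real settings.
    """
    env = os.environ if env is None else env
    return bool(env.get("GROQ_API_KEY", "").strip())

# Hypothetical usage: gate the chat feature on the key being present.
if chat_enabled():
    print("AI chat is available.")
else:
    print("AI chat is disabled: GROQ_API_KEY is not set.")
```

Treating a blank or whitespace-only value as missing avoids sending requests with an empty key when the `.env` line was left as a placeholder.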
This project is licensed under the MIT License.





