DeveshShukla23/BCG-Data-Science-Job-Simulation

🧠 About This Project

project = {
    "name"        : "BCG X Data Science Job Simulation",
    "platform"    : "Forage",
    "company"     : "BCG X - Boston Consulting Group's Tech & Digital Ventures",
    "completed"   : "March 23rd, 2026",
    "client"      : "PowerCo - SME Gas & Electricity Provider",
    "tasks"       : 5,
    "dataset"     : "14,606 SME Customers | Energy Consumption & Pricing Data",
    "tools"       : ["Python", "Pandas", "NumPy", "Scikit-Learn", "Matplotlib", "Seaborn"],
    "skills"      : ["EDA", "Feature Engineering", "Random Forest", "ROC-AUC",
                     "Business Hypothesis Testing", "SCQA Framework", "Executive Reporting"],
    "outcome"     : "🏆 Certificate Issued - Data Science Job Simulation"
}

💡 "PowerCo thought price was killing their business. I let the data speak, and it told a completely different story."


📊 Key Metrics

| 👥 Total Customers | 🚨 Churned | 📉 Churn Rate | 🤖 Model ROC-AUC |
|---|---|---|---|
| 14,606 | 1,419 | 9.7% | 0.71 |
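
The churn rate follows directly from the two customer counts. A quick arithmetic check, using only the figures reported above:

```python
# Figures taken from the Key Metrics reported in this README.
total_customers = 14_606
churned_customers = 1_419

# Churn rate is the share of all customers who churned.
churn_rate = churned_customers / total_customers
retained_customers = total_customers - churned_customers

print(f"Churn rate: {churn_rate:.1%}")
print(f"Retained customers: {retained_customers:,}")
```

This prints a churn rate of 9.7% and 13,187 retained customers, matching the headline metric.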

📁 Repository Structure

📦 BCG-Data-Science-Job-Simulation/
│
├── 📓 BCG_Task3_EDA.ipynb                    → Exploratory Data Analysis
├── 📓 BCG_Task4_Feature_Engineering.ipynb    → Feature Engineering
├── 📓 BCG_Task5_Modeling.ipynb               → Random Forest Modeling
├── 📄 BCG_Executive_Summary.pdf              → SCQA Business Report
├── 📊 churn_distribution.png
├── 📊 consumption_analysis.png
├── 📊 correlation_heatmap.png
├── 📊 feature_importance.png
├── 📊 margin_analysis.png
├── 📊 model_evaluation.png
├── 📊 price_analysis.png
├── 📊 tenure_analysis.png
└── 📄 README.md

⚠️ Note: Raw data files excluded per BCG X confidentiality policy.


🔬 Executive Summary: SCQA Framework

📍 SITUATION
└── PowerCo has a 9.7% churn rate (1,419 of 14,606 customers)
    Churned customers have a HIGHER average margin (€228 vs €185)
    → The business is disproportionately losing higher-margin clients

⚠️ COMPLICATION
└── PowerCo hypothesised price sensitivity as the primary churn driver
    Analysis shows price is NOT the key driver
    → Consumption, margin & tenure are stronger predictors

❓ QUESTION
└── Is price sensitivity the primary driver of SME churn at PowerCo?
    Can a targeted discount strategy reduce churn while protecting margins?

✅ ANSWER
└── Random Forest model (ROC-AUC: 0.71) identifies at-risk customers
    Offer 20% discounts ONLY to high-margin, high-consumption customers
    Focus on early-tenure customers (1–3 yrs), who show a ~27% churn rate
    Refine model recall before full rollout

🎯 Task 3 β€” Exploratory Data Analysis

Methodology

| Step | Action |
|---|---|
| 🧹 Data Cleaning | Handled missing values, outliers, and data type conversions |
| 📊 Churn Analysis | Distribution of churned vs retained customers across all features |
| ⚡ Consumption | Electricity & gas usage patterns by churn status |
| 💰 Price Analysis | Off-peak variable & fixed price distributions |
| 📈 Margin Analysis | Net margin comparison, churned vs retained |
| 📅 Tenure Analysis | Churn rate by years with PowerCo |
| 🔥 Correlation | Feature correlation heatmap |
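
The churn, margin, and tenure comparisons listed above reduce to a few group-wise aggregations. The sketch below is illustrative only: the raw data is confidential, so the rows are made up and the column names `tenure_years`, `net_margin`, and `churn` are assumptions rather than the notebook's actual schema.

```python
import pandas as pd

# Toy stand-in for the confidential client table; all values are invented.
df = pd.DataFrame({
    "tenure_years": [1, 2, 2, 4, 5, 6, 3, 1],
    "net_margin":   [250.0, 210.0, 180.0, 150.0, 190.0, 170.0, 230.0, 200.0],
    "churn":        [1, 1, 0, 0, 0, 0, 1, 0],
})

# Churn analysis: the mean of a 0/1 flag is the churn rate.
overall_churn_rate = df["churn"].mean()

# Margin analysis: average net margin for retained (0) vs churned (1).
margin_by_churn = df.groupby("churn")["net_margin"].mean()

# Tenure analysis: churn rate for each number of years as a customer.
churn_by_tenure = df.groupby("tenure_years")["churn"].mean()

print(f"Overall churn rate: {overall_churn_rate:.1%}")
print(margin_by_churn)
print(churn_by_tenure)
```

Treating the churn flag as a 0/1 column keeps every comparison a plain `groupby(...).mean()`, which is easy to plot afterwards.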

🔍 Key EDA Findings

📌 9.7% churn rate          → 1,419 of 14,606 customers churned
📌 Churned avg margin €228  → vs €185 retained: losing best clients!
📌 New customers (1-2 yrs)  → ~27% churn rate, roughly 3x the average
📌 Price distributions      → nearly identical for churned vs retained
📌 Consumption patterns     → clearly differ between churned & retained
📌 cons_12m & cons_last_month → 0.97 correlation: highly related
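
A finding such as the high correlation between `cons_12m` and `cons_last_month` can be surfaced by scanning the correlation matrix for pairs above a cut-off. This is a minimal sketch on synthetic data; the 0.9 threshold and the generated values are assumptions, and only the column names come from this README.

```python
from itertools import combinations

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic data: last-month usage is built as a noisy share of annual usage,
# while pow_max is generated independently of both.
cons_12m = rng.gamma(shape=2.0, scale=50_000.0, size=500)
toy = pd.DataFrame({
    "cons_12m": cons_12m,
    "cons_last_month": cons_12m / 12 * rng.normal(1.0, 0.1, size=500),
    "pow_max": rng.normal(15.0, 5.0, size=500),
})

THRESHOLD = 0.9  # assumed cut-off for "highly correlated"
corr = toy.corr()

# Visit each unordered column pair once and keep the strongly correlated ones.
high_pairs = [
    (a, b, round(corr.loc[a, b], 2))
    for a, b in combinations(corr.columns, 2)
    if abs(corr.loc[a, b]) > THRESHOLD
]
print(high_pairs)
```

Pairs flagged this way are candidates for dropping or combining before modeling, since they carry nearly the same information.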

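📸 EDA Visualizations

Figures: Churn Distribution (churn_distribution.png), Consumption Analysis (consumption_analysis.png), Price Analysis (price_analysis.png), Margin Analysis (margin_analysis.png), Tenure Analysis (tenure_analysis.png), Correlation Heatmap (correlation_heatmap.png)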


⚙️ Task 4: Feature Engineering

Methodology

Step 1 → Price variability features (off-peak vs peak mean differences)
Step 2 → Tenure-based features (months active, months since product change)
Step 3 → Consumption ratio features (last month vs 12-month avg)
Step 4 → Margin-based features (gross vs net power electricity margin)
Step 5 → Final feature selection → exported as final_features.csv

Key Engineered Features

| Feature | Description |
|---|---|
| off_peak_peak_var_mean_diff | Price variability between off-peak & peak periods |
| off_peak_mid_peak_var_mean_diff | Price variability between off-peak & mid-peak |
| months_activ | Number of months customer has been active |
| months_modif_prod | Months since last product modification |
| var_year_price_off_peak | Year-on-year off-peak price change |
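
The first three feature families can be sketched in pandas as follows. This is not the notebook's code: the raw column names (`date_activ`, `price_off_peak_var`, `price_peak_var`), the snapshot date, the helper name `cons_last_month_ratio`, and every value are assumptions made for illustration. Only `months_activ` and `off_peak_peak_var_mean_diff` are names taken from this README.

```python
import numpy as np
import pandas as pd

# Hypothetical raw tables; the real PowerCo files are confidential.
clients = pd.DataFrame({
    "id": ["a", "b"],
    "date_activ": pd.to_datetime(["2013-06-15", "2015-01-01"]),
    "cons_12m": [12_000.0, 6_000.0],
    "cons_last_month": [1_500.0, 250.0],
})
prices = pd.DataFrame({
    "id": ["a", "a", "b", "b"],
    "price_off_peak_var": [0.15, 0.17, 0.12, 0.12],
    "price_peak_var": [0.10, 0.10, 0.08, 0.10],
})
REFERENCE_DATE = pd.Timestamp("2016-01-01")  # assumed snapshot date

# Tenure: whole calendar months between activation and the snapshot.
clients["months_activ"] = (
    (REFERENCE_DATE.year - clients["date_activ"].dt.year) * 12
    + (REFERENCE_DATE.month - clients["date_activ"].dt.month)
)

# Consumption ratio: last month relative to the 12-month monthly average.
monthly_avg = (clients["cons_12m"] / 12).replace(0, np.nan)  # avoid divide-by-zero
clients["cons_last_month_ratio"] = clients["cons_last_month"] / monthly_avg

# Price variability: mean off-peak price minus mean peak price per customer.
mean_prices = prices.groupby("id")[["price_off_peak_var", "price_peak_var"]].mean()
mean_prices["off_peak_peak_var_mean_diff"] = (
    mean_prices["price_off_peak_var"] - mean_prices["price_peak_var"]
)

# Attach the per-customer price feature to the client table.
features = clients.merge(
    mean_prices[["off_peak_peak_var_mean_diff"]], left_on="id", right_index=True
)
print(features)
```

Aggregating the price history to one row per customer before merging keeps the final table at one row per `id`, which is the shape a classifier expects.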

🤖 Task 5: Modeling & Evaluation

Methodology

Step 1 → Train/Test Split (80/20)
Step 2 → Handle class imbalance
Step 3 → Random Forest Classifier training
Step 4 → ROC-AUC evaluation
Step 5 → Feature importance extraction
Step 6 → Business interpretation of results
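
Steps 1 to 5 can be sketched end to end with scikit-learn. The data here is synthetic, and several choices are assumptions rather than the notebook's settings: the stratified split, `class_weight="balanced"` as the imbalance treatment, the number of trees, and the random seeds.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the confidential feature table
# (about 10% positives, loosely mirroring a ~9.7% churn rate).
X, y = make_classification(
    n_samples=2_000, n_features=10, n_informative=5,
    weights=[0.9, 0.1], random_state=42,
)
X = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(X.shape[1])])

# Step 1: 80/20 split, stratified so both sets keep the same churn share.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Steps 2-3: class_weight="balanced" is one possible imbalance treatment.
model = RandomForestClassifier(
    n_estimators=200, class_weight="balanced", random_state=42
)
model.fit(X_train, y_train)

# Step 4: ROC-AUC is computed from predicted probabilities, not hard labels.
churn_proba = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, churn_proba)

# Step 5: impurity-based importances, sorted from most to least important.
importances = pd.Series(model.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)

print(f"ROC-AUC: {auc:.3f}")
print(importances.head())
```

Scores on this synthetic data say nothing about the real model; the point is the order of operations, in particular that the test set is held out before any fitting.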

📊 Model Results

| Metric | Score |
|---|---|
| ROC-AUC | 0.706 |
| True Negatives (Correctly Retained) | 2,635 |
| False Negatives (Missed Churners) | 260 |
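
True negatives and false negatives are two of the four cells of a confusion matrix, and recall (the share of churners the model catches) depends on the false negatives. The sketch below uses a tiny invented label set, not the project's predictions, to show how the cells are read with scikit-learn.

```python
from sklearn.metrics import confusion_matrix

# Invented labels for illustration: 1 = churned, 0 = retained.
y_true = [0, 0, 0, 0, 1, 1, 1, 0]
y_pred = [0, 0, 1, 0, 1, 0, 0, 0]

# For binary labels, ravel() returns the cells in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Recall: of all actual churners, how many were flagged.
recall = tp / (tp + fn)
# Precision: of all flagged customers, how many actually churned.
precision = tp / (tp + fp)

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"Recall={recall:.2f} Precision={precision:.2f}")
```

A model can score a reasonable ROC-AUC and still have low recall at the default 0.5 threshold, which is why false negatives are worth reporting alongside the AUC.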

🏆 Top 10 Feature Importances

| Rank | Feature | Importance Score |
|---|---|---|
| 🥇 1 | cons_12m: 12-month electricity consumption | 0.0525 |
| 🥈 2 | margin_net_pow_ele: Net power electricity margin | 0.0524 |
| 🥉 3 | margin_gross_pow_ele: Gross power electricity margin | 0.0519 |
| 4 | forecast_meter_rent_12m: Forecasted meter rent | 0.0502 |
| 5 | net_margin: Overall net margin | 0.0448 |
| 6 | forecast_cons_12m: Forecasted consumption | 0.0440 |
| 7 | cons_last_month: Last month consumption | 0.0372 |
| 8 | pow_max: Max power subscribed | 0.0333 |
| 9 | months_activ: Months active | 0.0330 |
| 10 | months_modif_prod: Months since product change | 0.0312 |

🔑 Price features ranked well below consumption & margin, which is consistent with price NOT being the primary churn driver!
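
To put the listed scores in context, the snippet below sums the ten importances reported above. It assumes they are scikit-learn's impurity-based `feature_importances_`, which are normalised to sum to 1 across all features. The grouping into "consumption" and "margin" features is an illustrative assumption.

```python
# Importance scores copied from the ten features listed in this README.
top_importances = {
    "cons_12m": 0.0525,
    "margin_net_pow_ele": 0.0524,
    "margin_gross_pow_ele": 0.0519,
    "forecast_meter_rent_12m": 0.0502,
    "net_margin": 0.0448,
    "forecast_cons_12m": 0.0440,
    "cons_last_month": 0.0372,
    "pow_max": 0.0333,
    "months_activ": 0.0330,
    "months_modif_prod": 0.0312,
}

# Share of total importance held by the ten listed features.
top10_share = sum(top_importances.values())

# Hypothetical grouping of the listed features by theme.
groups = {
    "consumption": ["cons_12m", "forecast_cons_12m", "cons_last_month"],
    "margin": ["margin_net_pow_ele", "margin_gross_pow_ele", "net_margin"],
}
group_share = {
    name: sum(top_importances[feature] for feature in features)
    for name, features in groups.items()
}

print(f"Listed features: {top10_share:.1%} of total importance")
for name, share in group_share.items():
    print(f"{name}: {share:.1%}")
```

Under that normalisation assumption, the ten listed features account for about 43% of total importance, so the signal is spread across many features rather than concentrated in one.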

📸 Model Visualizations

Figures: Feature Importance (feature_importance.png), Model Evaluation (model_evaluation.png)


💡 Business Recommendations

| # | Recommendation | Data Behind It |
|---|---|---|
| 1️⃣ | Do NOT apply blanket 20% discounts | Price is NOT the primary churn driver |
| 2️⃣ | Target high-margin + high-consumption customers | Top features in Random Forest model |
| 3️⃣ | Focus on 1–3 year tenure customers | ~27% churn rate, roughly 3x the average |
| 4️⃣ | Improve model recall before full rollout | Current model misses churners (260 false negatives) |
| 5️⃣ | Use RF model to proactively flag at-risk customers | ROC-AUC: 0.71, acceptable discriminatory power |
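
One way to turn the targeting recommendations into a rule is to combine the model's churn probability with margin and consumption cut-offs. The sketch below is hypothetical: the scored table is invented, and the 0.5 probability threshold and median cut-offs are assumptions that would need tuning against real costs.

```python
import pandas as pd

# Invented scored customers; churn_probability would come from the model.
scored = pd.DataFrame({
    "id": ["a", "b", "c", "d", "e", "f"],
    "churn_probability": [0.62, 0.15, 0.55, 0.70, 0.05, 0.48],
    "net_margin": [310.0, 290.0, 120.0, 260.0, 400.0, 95.0],
    "cons_12m": [90_000, 85_000, 20_000, 15_000, 120_000, 30_000],
})

PROB_THRESHOLD = 0.5  # assumed cut-off for "at risk"

# Assumed definition of "high": at or above the median of each column.
margin_cut = scored["net_margin"].median()
cons_cut = scored["cons_12m"].median()

# Offer the discount only when all three conditions hold.
scored["offer_discount"] = (
    (scored["churn_probability"] >= PROB_THRESHOLD)
    & (scored["net_margin"] >= margin_cut)
    & (scored["cons_12m"] >= cons_cut)
)

targets = scored.loc[scored["offer_discount"], "id"].tolist()
print(f"Discount offered to {len(targets)} of {len(scored)} customers: {targets}")
```

On this toy table only customer `a` qualifies, compared with all six under a blanket discount, which illustrates how the rule limits margin given away.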

🛠️ Tech Stack

Python, Pandas, NumPy, Scikit-Learn, Matplotlib, Seaborn, Jupyter


💼 Skills Demonstrated

| Analytics | Machine Learning | Business |
|---|---|---|
| ✅ End-to-end EDA | ✅ Random Forest Classifier | ✅ SCQA Framework Reporting |
| ✅ Feature Engineering | ✅ ROC-AUC Evaluation | ✅ Executive Summary Writing |
| ✅ Data Visualisation | ✅ Feature Importance Analysis | ✅ Hypothesis Testing |
| ✅ Correlation Analysis | ✅ Class Imbalance Handling | ✅ Business Recommendations |

🏆 Certificate of Completion

| Field | Details |
|---|---|
| 🏅 Certificate | Data Science Job Simulation |
| 🏢 Issued By | BCG X via Forage |
| 📅 Completed | March 23rd, 2026 |
| 🔗 View Certificate | Click Here |

👨‍💻 Author

Devesh Shukla
Data Analyst | ML Enthusiast | Insight Storyteller

LinkedIn | GitHub | Email


⭐ If you find this useful, please give it a star! ⭐

About

📊 14,606 customers | 9.7% churn | One hypothesis to test | Built end-to-end ML pipeline to uncover real churn drivers for PowerCo | BCG X Data Science Job Simulation on Forage
