Note: Raw data files are excluded per BCG X confidentiality policy.
Executive Summary: SCQA Framework
SITUATION
- PowerCo has a 9.7% churn rate (1,419 of 14,606 customers)
- Churned customers have a HIGHER average margin (€228 vs €185)
  → The business is losing its most valuable clients first
COMPLICATION
- PowerCo hypothesised price sensitivity as the primary churn driver
- Analysis shows price is NOT the key driver
  → Consumption, margin & tenure are stronger predictors
QUESTION
- Is price sensitivity the primary driver of SME churn at PowerCo?
- Can a targeted discount strategy reduce churn while protecting margins?
ANSWER
- Random Forest model (ROC-AUC: 0.71) identifies at-risk customers
- Offer 20% discounts ONLY to high-margin, high-consumption customers
- Focus on early-tenure customers (1–3 yrs), which show a ~27% churn rate
- Refine model recall before full rollout
Task 3: Exploratory Data Analysis
Methodology

| Step | Action |
| --- | --- |
| Data Cleaning | Handled missing values, outliers, data type conversions |
| Churn Analysis | Distribution of churned vs retained across all features |
| Consumption | Electricity & gas usage patterns by churn status |
| Price Analysis | Off-peak variable & fixed price distributions |
| Margin Analysis | Net margin comparison, churned vs retained |
| Tenure Analysis | Churn rate by years with PowerCo |
| Correlation | Feature correlation heatmap |
Key EDA Findings
- 9.7% churn rate: 1,419 of 14,606 customers churned
- Churned avg margin €228 vs €185 retained: PowerCo is losing its best clients
- New customers (1-2 yrs): ~27% churn rate, roughly 3x the average
- Price distributions: nearly identical for churned vs retained
- Consumption patterns: clearly differ between churned & retained
- cons_12m & cons_last_month: 0.97 correlation, highly related
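The findings above come down to a handful of group-wise summaries. A minimal sketch of how they might be computed with pandas follows. The data is a toy frame invented for illustration (not PowerCo records), and the `churn` and `tenure_years` column names are assumptions; only `net_margin`, `cons_12m` and `cons_last_month` appear in this README.

```python
import pandas as pd

# Toy data for illustration only; these are NOT PowerCo records.
# `churn` (1 = churned) and `tenure_years` are assumed column names.
df = pd.DataFrame({
    "churn":           [0, 0, 0, 0, 0, 0, 1, 1],
    "net_margin":      [100.0, 120.0, 140.0, 160.0, 180.0, 200.0, 220.0, 260.0],
    "tenure_years":    [6, 5, 4, 7, 3, 2, 2, 1],
    "cons_12m":        [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000],
    "cons_last_month": [90, 160, 260, 330, 410, 520, 570, 690],
})

# Overall churn rate: share of rows flagged as churned.
churn_rate = df["churn"].mean()

# Average net margin for retained (0) vs churned (1) customers.
margin_by_churn = df.groupby("churn")["net_margin"].mean()

# Churn rate per tenure bucket.
churn_by_tenure = df.groupby("tenure_years")["churn"].mean()

# Pearson correlation between the two consumption features.
cons_corr = df["cons_12m"].corr(df["cons_last_month"])

print(f"Churn rate: {churn_rate:.1%}")
print(margin_by_churn)
print(churn_by_tenure)
print(f"cons_12m vs cons_last_month correlation: {cons_corr:.2f}")
```

A correlation as high as the reported 0.97 usually means the two consumption columns carry largely the same signal, which is worth keeping in mind when reading feature importances later.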
EDA Visualizations
Task 4: Feature Engineering
Methodology
Step 1: Price variability features (off-peak vs peak mean differences)
Step 2: Tenure-based features (months active, months since product change)
Step 3: Consumption ratio features (last month vs 12-month avg)
Step 4: Margin-based features (gross vs net power electricity margin)
Step 5: Final feature selection, exported as final_features.csv
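Steps 2 and 3 above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the date columns `date_activ` and `date_modif_prod`, the reference date, and the ratio column name `cons_last_month_ratio` are all hypothetical, and the rows are made up.

```python
import pandas as pd

# Assumption: tenure is measured up to a fixed reference date.
REFERENCE_DATE = pd.Timestamp("2016-01-01")


def months_between(start: pd.Series, end: pd.Timestamp) -> pd.Series:
    """Whole calendar months from each date in `start` up to `end`."""
    return (end.year - start.dt.year) * 12 + (end.month - start.dt.month)


# Made-up rows; `date_activ` and `date_modif_prod` are assumed column names.
df = pd.DataFrame({
    "date_activ": pd.to_datetime(["2013-06-15", "2015-01-01"]),
    "date_modif_prod": pd.to_datetime(["2015-11-01", "2015-01-01"]),
    "cons_12m": [12000, 0],
    "cons_last_month": [1500, 0],
})

# Step 2: tenure-based features.
df["months_activ"] = months_between(df["date_activ"], REFERENCE_DATE)
df["months_modif_prod"] = months_between(df["date_modif_prod"], REFERENCE_DATE)

# Step 3: last month's consumption relative to the 12-month monthly average.
# Customers with zero annual consumption get a ratio of 0 instead of a
# division-by-zero result.
monthly_avg = df["cons_12m"] / 12
ratio = df["cons_last_month"] / monthly_avg.where(monthly_avg > 0)
df["cons_last_month_ratio"] = ratio.fillna(0.0)

print(df[["months_activ", "months_modif_prod", "cons_last_month_ratio"]])
```

A ratio above 1 means the most recent month was heavier than the customer's yearly average, and a ratio below 1 means usage has tailed off.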
Key Engineered Features
| Feature | Description |
| --- | --- |
| off_peak_peak_var_mean_diff | Price variability between off-peak & peak periods |
| off_peak_mid_peak_var_mean_diff | Price variability between off-peak & mid-peak |
| months_activ | Number of months customer has been active |
| months_modif_prod | Months since last product modification |
| var_year_price_off_peak | Year-on-year off-peak price change |
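One plausible reading of the two `*_var_mean_diff` features is "mean off-peak variable price minus mean peak (or mid-peak) variable price, per customer". The sketch below shows that reading. The long-format price table, its column names (`id`, `price_off_peak_var`, `price_peak_var`, `price_mid_peak_var`) and the values are assumptions for illustration, and the exact definition used in the project may differ.

```python
import pandas as pd

# Assumed input: one row per customer per month (values are made up).
prices = pd.DataFrame({
    "id": ["a", "a", "b", "b"],
    "price_off_peak_var": [0.15, 0.17, 0.12, 0.12],
    "price_peak_var":     [0.10, 0.10, 0.00, 0.00],
    "price_mid_peak_var": [0.07, 0.07, 0.00, 0.00],
})

price_cols = ["price_off_peak_var", "price_peak_var", "price_mid_peak_var"]

# Average each price component per customer across the observed months.
mean_prices = prices.groupby("id")[price_cols].mean()

# Difference between the mean off-peak price and the other two periods.
features = pd.DataFrame({
    "off_peak_peak_var_mean_diff":
        mean_prices["price_off_peak_var"] - mean_prices["price_peak_var"],
    "off_peak_mid_peak_var_mean_diff":
        mean_prices["price_off_peak_var"] - mean_prices["price_mid_peak_var"],
})

print(features)
```

The result is indexed by customer `id`, so it can be joined back onto the main customer table before modeling.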
Task 5: Modeling & Evaluation
Methodology
Step 1: Train/Test Split (80/20)
Step 2: Handle class imbalance
Step 3: Random Forest Classifier training
Step 4: ROC-AUC evaluation
Step 5: Feature importance extraction
Step 6: Business interpretation of results
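Steps 1 to 5 could look something like the scikit-learn sketch below. It runs on synthetic data from `make_classification`, so its scores say nothing about PowerCo. This README does not state how class imbalance was handled; `class_weight="balanced"` and the stratified split are assumptions chosen as one common option, and the hyperparameters are placeholders.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in with roughly 10% positives, mimicking the churn imbalance.
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=4,
    weights=[0.9, 0.1], random_state=42,
)

# Step 1: 80/20 split; stratifying keeps the churn share equal in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42,
)

# Steps 2-3: reweight classes inversely to their frequency, then train.
model = RandomForestClassifier(
    n_estimators=200, class_weight="balanced", random_state=42,
)
model.fit(X_train, y_train)

# Step 4: ROC-AUC is computed from predicted probabilities, not hard labels.
churn_proba = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, churn_proba)

# Step 5: feature indices ordered from most to least important.
ranking = np.argsort(model.feature_importances_)[::-1]

print(f"ROC-AUC: {auc:.3f}")
print("Top 5 feature indices:", ranking[:5].tolist())
```

Because ROC-AUC is threshold-independent, a model can score reasonably on it while still missing most churners at the default 0.5 cut-off, so the confusion matrix is worth checking alongside it.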
Model Results

| Metric | Score |
| --- | --- |
| ROC-AUC | 0.706 |
| True Negatives (Correctly Retained) | 2,635 |
| False Negatives (Missed Churners) | 260 |
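The two confusion-matrix counts reported above are enough to compute the negative predictive value, that is, how often a "will stay" prediction was right. True positives and false positives are not reported here, so recall cannot be derived from this table alone; the `recall` helper below is included only as a sketch for when those counts are available.

```python
TN = 2635  # true negatives, as reported in the results table
FN = 260   # false negatives (missed churners), as reported


def negative_predictive_value(tn: int, fn: int) -> float:
    """Share of 'retained' predictions that were actually retained."""
    total = tn + fn
    return tn / total if total else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of actual churners the model caught (needs the TP count)."""
    total = tp + fn
    return tp / total if total else 0.0


npv = negative_predictive_value(TN, FN)
print(f"Negative predictive value: {npv:.3f}")
```

With these counts the negative predictive value works out to roughly 0.91. That is close to the share of customers who stay anyway (about 90%), which suggests the "retained" predictions add little beyond the base rate.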
Top 15 Feature Importances

| Rank | Feature | Importance Score |
| --- | --- | --- |
| 1 | cons_12m: 12-month electricity consumption | 0.0525 |
| 2 | margin_net_pow_ele: Net power electricity margin | 0.0524 |
| 3 | margin_gross_pow_ele: Gross power electricity margin | 0.0519 |
| 4 | forecast_meter_rent_12m: Forecasted meter rent | 0.0502 |
| 5 | net_margin: Overall net margin | 0.0448 |
| 6 | forecast_cons_12m: Forecasted consumption | 0.0440 |
| 7 | cons_last_month: Last month consumption | 0.0372 |
| 8 | pow_max: Max power subscribed | 0.0333 |
| 9 | months_activ: Months active | 0.0330 |
| 10 | months_modif_prod: Months since product change | 0.0312 |
Price features ranked well below consumption & margin, which supports the conclusion that price is NOT the primary churn driver.
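The ranking claim can be checked mechanically against the listed scores. The sketch below uses the ten importances from the table; treating any feature whose name contains `price` or `var_mean_diff` as a "price feature" is an assumption made here for illustration.

```python
# Importance scores copied from the table in this README.
importances = {
    "cons_12m": 0.0525,
    "margin_net_pow_ele": 0.0524,
    "margin_gross_pow_ele": 0.0519,
    "forecast_meter_rent_12m": 0.0502,
    "net_margin": 0.0448,
    "forecast_cons_12m": 0.0440,
    "cons_last_month": 0.0372,
    "pow_max": 0.0333,
    "months_activ": 0.0330,
    "months_modif_prod": 0.0312,
}

# Assumption: these substrings identify price-related features.
PRICE_MARKERS = ("price", "var_mean_diff")


def is_price_feature(name: str) -> bool:
    """True if the feature name looks price-related under the rule above."""
    return any(marker in name for marker in PRICE_MARKERS)


# Sort from most to least important.
ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)

# Price-related features that made it into the listed ranks.
price_in_listed = [name for name, _ in ranked if is_price_feature(name)]

# Combined importance of the listed features.
listed_share = sum(importances.values())

print("Price features among listed ranks:", price_in_listed)
print(f"Combined importance of listed features: {listed_share:.4f}")
```

If the importances are normalized to sum to 1 across all features (as scikit-learn's are), the ten listed features account for about 43% of the total, and no single feature dominates. The ranking therefore reflects a fairly flat importance profile, not one decisive driver.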
Model Visualizations
Business Recommendations

| # | Recommendation | Data Behind It |
| --- | --- | --- |
| 1 | Do NOT apply blanket 20% discounts | Price is NOT the primary churn driver |
| 2 | Target high-margin + high-consumption customers | Top features in Random Forest model |
| 3 | Focus on 1–3 year tenure customers | ~27% churn rate, roughly 3x the average |
| 4 | Improve model recall before full rollout | Current model missed 260 churners in the test set |
| 5 | Use RF model to proactively flag at-risk customers | |
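Recommendations 1, 2 and 5 combine into a simple targeting rule: flag customers the model considers at risk, then offer the discount only to those with high margin and high consumption. The sketch below illustrates that rule. The `Customer` class and all three thresholds are hypothetical placeholders, not values from the analysis; only the 20% discount comes from this README.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    churn_probability: float  # model output, e.g. from predict_proba
    net_margin: float
    cons_12m: float


# Hypothetical thresholds; in practice these would be tuned on real data.
CHURN_THRESHOLD = 0.5
MARGIN_THRESHOLD = 200.0
CONSUMPTION_THRESHOLD = 10_000.0
DISCOUNT = 0.20  # the 20% discount discussed in this README


def select_for_discount(customers: list[Customer]) -> list[Customer]:
    """Return at-risk, high-margin, high-consumption customers, riskiest first."""
    eligible = [
        c for c in customers
        if c.churn_probability >= CHURN_THRESHOLD
        and c.net_margin >= MARGIN_THRESHOLD
        and c.cons_12m >= CONSUMPTION_THRESHOLD
    ]
    return sorted(eligible, key=lambda c: c.churn_probability, reverse=True)


# Made-up customers for illustration.
customers = [
    Customer("c1", 0.8, 250.0, 20_000.0),  # at risk, valuable: eligible
    Customer("c2", 0.8, 100.0, 20_000.0),  # at risk but low margin
    Customer("c3", 0.2, 250.0, 20_000.0),  # valuable but not at risk
    Customer("c4", 0.9, 300.0, 500.0),     # at risk but low consumption
    Customer("c5", 0.6, 220.0, 15_000.0),  # at risk, valuable: eligible
]

selected = select_for_discount(customers)
for c in selected:
    print(f"{c.customer_id}: offer {DISCOUNT:.0%} discount")
```

Lowering `CHURN_THRESHOLD` trades discount cost for coverage, which is one practical lever for the recall concern raised in recommendation 4.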
Devesh Shukla
Data Analyst | ML Enthusiast | Insight Storyteller
If you find this useful, please give it a star!
About
14,606 customers | 9.7% churn | One hypothesis to test | Built end-to-end ML pipeline to uncover real churn drivers for PowerCo | BCG X Data Science Job Simulation on Forage