Neginde/Predictive-Customer-Lifetime-Analytics

Data-Driven Customer Segmentation & Predictive Lifetime Analytics

Executive Summary: In modern e-commerce, treating all customers equally is a massive operational inefficiency. This end-to-end enterprise project transforms $8.91M in raw transaction data into an automated, highly actionable marketing engine. By combining advanced Feature Engineering, Unsupervised Clustering (PCA + K-Means), and Supervised Machine Learning (Random Forest), we successfully grouped 4,338 distinct customers into 4 strategic business personas. Furthermore, we built a predictive model with 96.1% accuracy to instantly classify future buyers based on their initial shopping footprint. Finally, these insights are funneled into an interactive executive dashboard in Power BI to drive data-backed retention and loyalty campaigns.

Business Problem: Raw transaction data contains hidden behavioral patterns. Without proper analytics, marketing teams suffer from:

High Acquisition Costs (CAC): Wasting budget by sending generic discounts to high-value shoppers who would buy anyway.

Customer Churn: Failing to detect high-value customers who are gradually slipping away (dormancy) until it's too late.

Operational Friction: High order cancellation rates that disrupt supply chain and inventory management.

Advanced Feature Engineering (The Secret Sauce): While traditional models rely solely on basic Recency, Frequency, and Monetary (RFM) metrics, this project engineers three specialized behavioral features that add business context RFM alone does not capture:

PurchaseGap: The average number of days between a customer's consecutive transactions, a measure of purchase rhythm.

CancelRate: The percentage of a customer's orders that were canceled—a vital proxy for customer friction and financial risk.

ProductDiversity: The count of unique product categories purchased, highlighting brand engagement.

Data Pipeline & Machine Learning Architecture:

[Raw Transaction Data] ➔ [Feature Engineering] ➔ [PowerTransformer Scaling] ➔ [PCA (3 Components)] ➔ [K-Means (K=4)] ➔ [Train/Test Split] ➔ [Random Forest Classifier] (96.1% Accuracy Model)
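The three engineered features described above could be computed along the following lines. This is a minimal sketch, not the project's actual code: the column names (`CustomerID`, `InvoiceNo`, `InvoiceDate`, `Category`, `IsCancelled`) and the function name `engineer_features` are assumptions made for illustration, and the choice to measure PurchaseGap over completed orders only is likewise an assumption.

```python
import pandas as pd


def engineer_features(tx: pd.DataFrame) -> pd.DataFrame:
    """Build per-customer PurchaseGap, CancelRate and ProductDiversity.

    Assumed (hypothetical) input: one row per invoice line with columns
    CustomerID, InvoiceNo, InvoiceDate (datetime), Category, IsCancelled (bool).
    """
    # Collapse invoice lines to one row per order.
    orders = tx.groupby(["CustomerID", "InvoiceNo"], as_index=False).agg(
        InvoiceDate=("InvoiceDate", "min"),
        IsCancelled=("IsCancelled", "max"),
    )
    orders["IsCancelled"] = orders["IsCancelled"].astype(bool)

    # CancelRate: percentage of a customer's orders that were cancelled.
    cancel_rate = orders.groupby("CustomerID")["IsCancelled"].mean() * 100

    # PurchaseGap: mean days between consecutive completed orders.
    # Customers with a single completed order get NaN (no gap to measure).
    completed = orders[~orders["IsCancelled"]].sort_values(
        ["CustomerID", "InvoiceDate"]
    )
    gap_days = (
        completed.groupby("CustomerID")["InvoiceDate"].diff().dt.total_seconds()
        / 86400
    )
    purchase_gap = gap_days.groupby(completed["CustomerID"]).mean()

    # ProductDiversity: number of distinct product categories purchased.
    diversity = tx.groupby("CustomerID")["Category"].nunique()

    return pd.DataFrame(
        {
            "PurchaseGap": purchase_gap,
            "CancelRate": cancel_rate,
            "ProductDiversity": diversity,
        }
    )
```

Single-order customers come back with a missing PurchaseGap, so some imputation rule (for example, the observation window length) would need to be chosen before scaling and clustering.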

  1. Dimensionality Reduction (PCA): To reduce noise and multicollinearity among the engineered features, Principal Component Analysis (PCA) was applied, compressing the feature set into 3 components while retaining 82.3% of the total variance.
  2. Strategic Clustering (K-Means): Silhouette Score analysis suggested K=2 as the mathematically optimal split, but we chose K=4 clusters instead. While K=2 is mathematically distinct, it yields little marketing value (e.g., splitting customers into just "Good" vs "Bad"). Choosing 4 clusters unlocked operational, real-world personas.
  3. Predictive Classifier (Random Forest): To make the system production-ready, we trained a Random Forest Classifier to predict which cluster a new customer will belong to.
     Overall Accuracy: 96.08%
     Key Insight: The model flagged CancelRate and PurchaseGap as the two most powerful predictive features, suggesting that our custom-engineered metrics carry the most critical business signals.

Executive Power BI Dashboard Architecture:

📈 Top-Level Strategic KPIs

Total Revenue Portfolio: $8.91M (Validates the massive enterprise scale of the dataset)

Global Ecosystem Friction: 10.60% Average Cancel Rate

Active Customer Base: 4,338 Unique Buyers
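The modelling steps in items 1 to 3 above (PowerTransformer scaling, PCA to 3 components, a silhouette scan followed by K-Means with K=4, then a Random Forest trained on the cluster labels) could be chained roughly as below with scikit-learn. This is a hedged sketch rather than the repository's implementation: the function name `segment_and_classify`, the silhouette range of K=2 to 6, the 80/20 split, and the hyperparameters are all assumptions. The classifier is fitted on the original features rather than the PCA components, which is inferred from the README reporting importances for CancelRate and PurchaseGap. The demo uses synthetic data, so its numbers will not match the 82.3% or 96.08% figures reported above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, silhouette_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PowerTransformer


def segment_and_classify(X, n_clusters=4, seed=42):
    """Scale -> PCA -> K-Means -> Random Forest on the cluster labels."""
    X = np.asarray(X, dtype=float)

    # Yeo-Johnson power transform to reduce skew, then standardize.
    X_scaled = PowerTransformer().fit_transform(X)

    # Project onto 3 principal components and record retained variance.
    pca = PCA(n_components=3, random_state=seed)
    X_pca = pca.fit_transform(X_scaled)
    variance_retained = float(pca.explained_variance_ratio_.sum())

    # Silhouette score for a range of candidate K (assumed range: 2..6).
    silhouette_by_k = {}
    for k in range(2, 7):
        km = KMeans(n_clusters=k, n_init=10, random_state=seed)
        silhouette_by_k[k] = float(silhouette_score(X_pca, km.fit_predict(X_pca)))

    # Final segmentation with the business-chosen number of clusters.
    labels = KMeans(
        n_clusters=n_clusters, n_init=10, random_state=seed
    ).fit_predict(X_pca)

    # Supervised step: predict the cluster from the original features.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, labels, test_size=0.2, stratify=labels, random_state=seed
    )
    clf = RandomForestClassifier(n_estimators=200, random_state=seed)
    clf.fit(X_tr, y_tr)

    return {
        "variance_retained": variance_retained,
        "silhouette_by_k": silhouette_by_k,
        "labels": labels,
        "test_accuracy": float(accuracy_score(y_te, clf.predict(X_te))),
        "feature_importances": clf.feature_importances_,
    }


if __name__ == "__main__":
    # Synthetic stand-in for the per-customer feature table.
    X_demo, _ = make_blobs(n_samples=400, centers=4, n_features=6, random_state=0)
    result = segment_and_classify(X_demo)
    print("variance retained:", round(result["variance_retained"], 3))
    print("silhouette by K:", result["silhouette_by_k"])
    print("test accuracy:", round(result["test_accuracy"], 3))
```

Because the classifier is trained on labels that K-Means itself produced, its accuracy measures how well the segmentation can be reproduced for new customers, not how well the segments reflect real behavior.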

About

End-to-end e-commerce customer analytics using Python and Power BI. Features unsupervised clustering, a 96% accurate predictive classifier, and custom metrics (PurchaseGap & CancelRate) for churn mitigation
