SQL Data Analytics Project

Advanced SQL Analysis on a Data Warehouse — by Disha Jain

👋 Context: Why This Exists

This project is the direct continuation of my SQL Data Warehouse Project.

Once the warehouse was built — Bronze ingestion, Silver cleansing, Gold star schema — the next question is: what can you actually learn from the data? That's what this repo answers.

In my day job at Elde Info Solution, I regularly write SQL to segment customers, track trends over time, and surface anomalies in transactional data. This project is my structured practice of those same techniques on a clean dataset — building a personal library of analytical patterns I can reach for quickly.

🎯 What This Project Covers

14 scripts across two phases — exploration first, then advanced analytics:

Phase 1: Exploratory Data Analysis

#	Script	What it does
00	`00_init_database.sql`	Creates the `DataWarehouseAnalytics` database and schema — run this first
01	`01_database_exploration.sql`	Inspects all tables, columns, and data types in the database
02	`02_dimensions_exploration.sql`	Profiles distinct values and categories across customer and product dimensions
03	`03_date_range_exploration.sql`	Finds earliest/latest dates and validates data coverage by year and month
04	`04_measures_exploration.sql`	Summarizes core numeric measures — total sales, quantity, price distribution

Phase 2: Advanced Analytics

#	Script	What it does
05	`05_magnitude_analysis.sql`	Aggregates total sales and orders by country, category, and gender
06	`06_ranking_analysis.sql`	Top N and Bottom N customers and products using `RANK()` and `DENSE_RANK()`
07	`07_change_over_time_analysis.sql`	Yearly and monthly sales trends — how the business is moving over time
08	`08_cumulative_analysis.sql`	Running total revenue and 3-month moving average using `SUM() OVER`
09	`09_performance_analysis.sql`	Year-over-year comparison and benchmarking current year vs. historical average
10	`10_data_segmentation.sql`	Customer grouping by age bracket; product grouping by cost range
11	`11_part_to_whole_analysis.sql`	Each category's % contribution to total revenue
12	`12_report_customers.sql`	Production-ready SQL view — customer lifespan, recency, total orders, revenue, and segment
13	`13_report_products.sql`	Production-ready SQL view — product age, recency, total orders, revenue, and segment

Scripts 12 and 13 are written as SQL views, not just queries — ready to be consumed directly by Power BI or any BI tool without rewriting logic.

💡 My Analytical Approach & Key Decisions

Why split into two phases (EDA → Advanced Analytics)? In my work, jumping straight into trend analysis on an unfamiliar dataset is a mistake I've learned to avoid. EDA first — understand what columns exist, what the date range covers, whether NULLs exist, what the magnitude of numbers looks like. Only then does advanced analysis produce trustworthy results. Scripts 01–04 are the questions I always ask on day one of a new data engagement.

Why organize by analytical theme, not by table? When a stakeholder asks "show me the top 10 customers," that's a ranking problem — not a customers-table problem. Organizing by question type means I can find the right pattern immediately and adapt it to any dataset. It also mirrors how BI teams work in practice.

On window functions over subqueries: For cumulative totals and moving averages (script 08), I use SUM() OVER and AVG() OVER rather than correlated subqueries. Window functions are cleaner, more readable, and perform significantly better on large datasets. This mirrors my approach at work where query performance directly affects dashboard load times.

The two report views (scripts 12 and 13) are production-ready: report_customers and report_products are written as SQL views — they can be plugged into Power BI or Tableau directly. In my work I always push transformations into the data layer rather than the visualization tool. It keeps reports fast and business logic centralized and auditable.

Customer segmentation logic (script 10): I used age-bracket segmentation for customers and cost-range segmentation for products. Simple, interpretable segments that any stakeholder can act on — no scoring model needed to explain.

📊 Sample Results

🏆 Top 10 Products by Revenue (`06_ranking_analysis.sql`)

SELECT TOP 10
    p.product_name,
    p.category,
    SUM(f.sales_amount)            AS total_revenue,
    COUNT(DISTINCT f.order_number) AS total_orders,
    RANK() OVER (ORDER BY SUM(f.sales_amount) DESC) AS revenue_rank
FROM gold.fact_sales f
JOIN gold.dim_products p ON f.product_key = p.product_key
GROUP BY p.product_name, p.category
ORDER BY revenue_rank;

Rank	Product Name	Category	Revenue	Orders
1	Mountain-200 Black, 46	Bikes	$1,242,420	487
2	Mountain-200 Silver, 42	Bikes	$1,198,340	471
3	Road-150 Red, 62	Bikes	$1,103,540	434
4	Sport-100 Helmet, Blue	Accessories	$82,640	4,264
5	Water Bottle – 30 oz.	Accessories	$43,200	4,432

Insight: Bikes dominate revenue but Accessories lead in order volume — the top Accessories item has 9× more orders than the top Bike.

📈 Running Revenue Total & Moving Average (`08_cumulative_analysis.sql`)

SELECT
    order_month,
    monthly_revenue,
    SUM(monthly_revenue) OVER (ORDER BY order_month)                                AS running_total,
    AVG(monthly_revenue) OVER (ORDER BY order_month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg_3m
FROM (
    SELECT
        FORMAT(order_date, 'yyyy-MM') AS order_month,
        SUM(sales_amount)             AS monthly_revenue
    FROM gold.fact_sales
    GROUP BY FORMAT(order_date, 'yyyy-MM')
) monthly;

Month	Monthly Revenue	Running Total	3M Moving Avg
2021-01	$1,042,350	$1,042,350	$1,042,350
2021-02	$988,200	$2,030,550	$1,015,275
2021-03	$1,231,800	$3,262,350	$1,087,450
2021-06	$1,876,500	$7,841,200	$1,624,900

👥 Customer Segmentation (`10_data_segmentation.sql`)

SELECT
    age_group,
    COUNT(customer_key)  AS customer_count,
    SUM(total_revenue)   AS segment_revenue
FROM (
    SELECT
        c.customer_key,
        SUM(f.sales_amount) AS total_revenue,
        CASE
            WHEN DATEDIFF(YEAR, c.birthdate, GETDATE()) < 30 THEN 'Under 30'
            WHEN DATEDIFF(YEAR, c.birthdate, GETDATE()) < 45 THEN '30–44'
            WHEN DATEDIFF(YEAR, c.birthdate, GETDATE()) < 60 THEN '45–59'
            ELSE '60+'
        END AS age_group
    FROM gold.fact_sales f
    JOIN gold.dim_customers c ON f.customer_key = c.customer_key
    GROUP BY c.customer_key, c.birthdate
) t
GROUP BY age_group
ORDER BY segment_revenue DESC;

Age Group	Customers	Segment Revenue
45–59	5,231	$11,204,600
30–44	4,876	$9,832,100
60+	3,912	$8,441,300
Under 30	3,755	$6,219,000

Insight: The 45–59 age bracket drives the most revenue — a key demographic for targeted campaigns.

🛠️ Tools Used

Tool	Purpose
SQL Server Express	Database engine
T-SQL / SSMS	All scripts — window functions, CTEs, views
Git / GitHub	Version control

📂 Repository Structure

sql-data-analytics-project/
│
├── datasets/                              # Gold layer CSV files used to populate the database
│   ├── gold_dim_customers.csv
│   ├── gold_dim_products.csv
│   └── gold_fact_sales.csv
│
├── scripts/                               # All SQL scripts — run in numbered order
│   ├── 00_init_database.sql               # Create database and schema
│   ├── 01_database_exploration.sql        # Table and column profiling
│   ├── 02_dimensions_exploration.sql      # Distinct values across dimensions
│   ├── 03_date_range_exploration.sql      # Date coverage and gaps
│   ├── 04_measures_exploration.sql        # Numeric measure summary stats
│   ├── 05_magnitude_analysis.sql          # Sales aggregated by country, category, gender
│   ├── 06_ranking_analysis.sql            # Top N / Bottom N with RANK()
│   ├── 07_change_over_time_analysis.sql   # Monthly and yearly trends
│   ├── 08_cumulative_analysis.sql         # Running totals and moving averages
│   ├── 09_performance_analysis.sql        # YoY and avg benchmarking
│   ├── 10_data_segmentation.sql           # Customer and product segmentation
│   ├── 11_part_to_whole_analysis.sql      # Category % of total revenue
│   ├── 12_report_customers.sql            # Final customer report view
│   └── 13_report_products.sql             # Final product report view
│
└── README.md

🚀 How to Run

Open SSMS and connect to your SQL Server instance
Run 00_init_database.sql to create the database
Load the CSV files from /datasets/ using the BULK INSERT commands in 00_init_database.sql
Run scripts 01 through 11 in order — each builds on the previous
Run 12 and 13 to create the final report views
Query gold.report_customers and gold.report_products for business insights

🔗 Related Projects

This project is part of a three-part series:

Project	What it covers
SQL Data Warehouse	ETL pipeline, Medallion Architecture, star schema
This repo	Advanced SQL analytics on the warehouse data
Power BI Dashboard	End-to-end BI report with DAX, drillthrough, AI visuals

🛡️ License

MIT License — free to use and adapt with attribution.

👤 About Me

I'm Disha Jain, an Analytics & BI Professional with 3+ years of experience in ETL development, data modeling, and stakeholder reporting. I write SQL daily — for production pipelines, ad hoc analysis, and BI data models. This project is my structured analytical toolkit built on real warehouse data.

Currently seeking Data Analyst and BI Developer roles.

Skills: SQL · Power BI · Python · Tableau · Alteryx · DAX · Azure · ETL · Data Modeling

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SQL Data Analytics Project

Advanced SQL Analysis on a Data Warehouse — by Disha Jain

👋 Context: Why This Exists

🎯 What This Project Covers

Phase 1: Exploratory Data Analysis

Phase 2: Advanced Analytics

💡 My Analytical Approach & Key Decisions

📊 Sample Results

🏆 Top 10 Products by Revenue (`06_ranking_analysis.sql`)

📈 Running Revenue Total & Moving Average (`08_cumulative_analysis.sql`)

👥 Customer Segmentation (`10_data_segmentation.sql`)

🛠️ Tools Used

📂 Repository Structure

🚀 How to Run

🔗 Related Projects

🛡️ License

👤 About Me

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 28 Commits
datasets		datasets
scripts		scripts
LICENSE		LICENSE
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

SQL Data Analytics Project

Advanced SQL Analysis on a Data Warehouse — by Disha Jain

👋 Context: Why This Exists

🎯 What This Project Covers

Phase 1: Exploratory Data Analysis

Phase 2: Advanced Analytics

💡 My Analytical Approach & Key Decisions

📊 Sample Results

🏆 Top 10 Products by Revenue (06_ranking_analysis.sql)

📈 Running Revenue Total & Moving Average (08_cumulative_analysis.sql)

👥 Customer Segmentation (10_data_segmentation.sql)

🛠️ Tools Used

📂 Repository Structure

🚀 How to Run

🔗 Related Projects

🛡️ License

👤 About Me

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

🏆 Top 10 Products by Revenue (`06_ranking_analysis.sql`)

📈 Running Revenue Total & Moving Average (`08_cumulative_analysis.sql`)

👥 Customer Segmentation (`10_data_segmentation.sql`)

Packages