- Introduction
- Problem Statement
- Requirement Scoping
- Installation Instructions
- Data Collection
- Data Cleaning and Transformation in Python Pandas
- Data Transformation in Power Query
- Data Modeling and Building Parameters using DAX
- Building an Interactive Dashboard in Power BI
- Conclusion
- Acknowledgments
This project is designed to provide meaningful insights into cricket data using web scraping, Python, Pandas, and Power BI. By following an end-to-end data analytics workflow, we will extract, clean, transform, and visualize the data, creating an interactive dashboard for better analysis and decision-making.
The objective of this project is to analyze cricket data to uncover valuable insights and trends. The process includes:
- Scraping data from ESPN Cricinfo
- Cleaning and transforming the data using Python and Pandas
- Performing additional transformations in Power Query
- Modeling the data and building parameters using DAX (Data Analysis Expressions)
- Creating an interactive dashboard in Power BI
- Python: Data extraction and preprocessing
- Pandas: Data manipulation
- Power BI: Data visualization and analysis
- Web Scraping Libraries: BeautifulSoup, Requests
- Web Scraping
- Data Cleaning & Transformation
- Data Modeling
- Data Visualization
- Python: Download and install from python.org
- Power BI: Download and install from powerbi.microsoft.com
Run the following command to install dependencies:
`pip install pandas beautifulsoup4 requests`
- Identify Data Source: ESPN Cricinfo website.
- Web Scraping: Extract cricket data using BeautifulSoup and Requests.
- Store Data: Save the scraped data into a CSV file for further processing.
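Here is a minimal sketch of the scraping and storage steps above. It assumes the records page renders its results as a plain HTML `<table>`. The live ESPN Cricinfo markup may differ or load data dynamically, so treat the selectors as placeholders. The helper names `fetch_html`, `parse_first_table`, and `save_rows_to_csv` are hypothetical and do not come from this project's scripts.

```python
from __future__ import annotations

import csv

import requests
from bs4 import BeautifulSoup


def fetch_html(url: str, timeout: float = 30.0) -> str:
    """Download a page and return its HTML, raising on HTTP errors."""
    # A browser-like User-Agent is an assumption; some sites reject the default one.
    response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=timeout)
    response.raise_for_status()
    return response.text


def parse_first_table(html: str) -> list[dict[str, str]]:
    """Convert the first <table> in the HTML into a list of row dicts."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    # Column names come from the <th> cells (placeholder selector).
    headers = [th.get_text(strip=True) for th in table.find_all("th")]
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # header rows have no <td>, so they are skipped
            rows.append(dict(zip(headers, cells)))
    return rows


def save_rows_to_csv(rows: list[dict[str, str]], path: str) -> None:
    """Write the rows to a UTF-8 CSV file with a header line."""
    if not rows:
        return
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
```

For example, `save_rows_to_csv(parse_first_table(fetch_html(url)), csv_path)` chains the three steps. Set `url` to the `t20_url` value and `csv_path` to `espn_t20_wc_2022_results_csv_file_path`, both taken from `config.json`.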
- Load Data: Read the CSV file into a Pandas DataFrame.
- Data Cleaning:
  - Handle missing values
  - Remove duplicates
  - Correct data types
- Data Transformation:
  - Perform aggregations
  - Create calculated columns
  - Implement feature engineering
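The cleaning and transformation steps can be sketched in Pandas as shown below. This is an illustration, not the project's own code. It assumes a batting summary with the hypothetical columns `batsmanName`, `dismissal`, `runs`, and `balls`. It also assumes that not-out innings are recorded as `not out` in the `dismissal` column.

```python
import pandas as pd

# Hypothetical column names; rename them to match your batting summary CSV.
NAME, DISMISSAL, RUNS, BALLS = "batsmanName", "dismissal", "runs", "balls"


def clean_batting_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Handle missing values, correct data types, and remove duplicates."""
    df = df.dropna(subset=[NAME]).copy()  # rows without a batter cannot be attributed
    df[NAME] = df[NAME].str.strip()
    for col in (RUNS, BALLS):
        # Scraped numbers arrive as text; non-numeric values such as "-" become 0.
        df[col] = pd.to_numeric(df[col], errors="coerce").fillna(0).astype(int)
    return df.drop_duplicates().reset_index(drop=True)


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add a calculated strike-rate column and a dismissed flag."""
    df = df.copy()
    # Strike rate = runs per 100 balls; innings with no balls faced report 0.
    df["strike_rate"] = (df[RUNS] * 100 / df[BALLS].where(df[BALLS] > 0)).fillna(0).round(2)
    # Assumes not-out innings are recorded as "not out" in the dismissal column.
    df["out"] = df[DISMISSAL].fillna("").str.strip().str.lower().ne("not out")
    return df


def aggregate_by_batsman(df: pd.DataFrame) -> pd.DataFrame:
    """Aggregate innings-level rows into per-batter totals."""
    agg = df.groupby(NAME, as_index=False).agg(
        innings=(RUNS, "count"),
        total_runs=(RUNS, "sum"),
        total_balls=(BALLS, "sum"),
        dismissals=("out", "sum"),
    )
    agg["strike_rate"] = (
        agg["total_runs"] * 100 / agg["total_balls"].where(agg["total_balls"] > 0)
    ).fillna(0).round(2)
    return agg
```

Apply the three functions in order: `aggregate_by_batsman(add_features(clean_batting_summary(raw_df)))`. Run cleaning before feature engineering so that the strike-rate arithmetic operates on integers rather than scraped text.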
- Import Data: Load the cleaned data into Power BI using Power Query.
- Transform Data:
  - Merge tables
  - Create calculated columns
  - Filter data for better analysis
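Power Query's Merge Queries, Add Column, and Filter Rows steps have direct Pandas equivalents. These can help you check a join before rebuilding it in Power BI. The sketch below is that Pandas analogue, not Power Query (M) code. The function name and columns are assumptions: `batsmanName`, `runs`, and `balls` in the batting table, and `name` and `team` in the player table.

```python
import pandas as pd


def merge_filter_prototype(batting: pd.DataFrame, players: pd.DataFrame,
                           min_balls: int = 10) -> pd.DataFrame:
    """Pandas analogue of Power Query's Merge Queries, Add Column and Filter Rows."""
    # Merge Queries (Left Outer): attach player details to every batting row.
    # validate="many_to_one" raises MergeError if a name repeats in `players`.
    merged = batting.merge(players, how="left", left_on="batsmanName",
                           right_on="name", validate="many_to_one")
    # Conditional column: True when the innings was scored at a strike rate of
    # at least 100, i.e. runs >= balls faced.
    merged["quick_innings"] = merged["runs"] >= merged["balls"]
    # Filter Rows: drop very short innings so they do not skew comparisons.
    return merged[merged["balls"] >= min_balls].reset_index(drop=True)
```

If the left join leaves `NaN` in the player columns, some batting names have no match in the player table. Fix those mismatches before defining relationships in the data model.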
- Data Modeling:
  - Define relationships between tables
  - Optimize the data schema
- DAX Calculations:
  - Create measures
  - Develop calculated columns
  - Build parameters for enhanced analysis
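Measures in a cricket model usually encode standard statistics such as the ones below. These are the conventional formulas, shown as examples rather than the exact measures defined in this project. Economy is written in terms of balls rather than overs because of cricket's over notation: 3.4 overs means 3 overs and 4 balls (22 balls), not 3.4 overs.

```latex
\begin{aligned}
\text{Batting average} &= \frac{\text{runs scored}}{\text{times dismissed}} \\
\text{Batting strike rate} &= 100 \times \frac{\text{runs scored}}{\text{balls faced}} \\
\text{Bowling economy} &= 6 \times \frac{\text{runs conceded}}{\text{balls bowled}} \\
\text{Bowling average} &= \frac{\text{runs conceded}}{\text{wickets taken}} \\
\text{Bowling strike rate} &= \frac{\text{balls bowled}}{\text{wickets taken}}
\end{aligned}
```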
- Dashboard Design:
  - Utilize charts, tables, slicers, and KPIs for visualization
- Enhance Interactivity:
  - Implement drill-through, tooltips, and filters
- Publish & Share:
  - Deploy the dashboard to Power BI Service for collaboration
This project showcases the complete lifecycle of a data analytics project, from data collection to visualization. By integrating Python, Pandas, Power BI, and DAX, we successfully extract meaningful insights from cricket data and present them in an interactive format.
- ESPN Cricinfo for the data source.
- Python & Power BI Communities for valuable tools and resources.
Before running the scripts, update the paths in the `\Cricket Data Analytics Project\WebScrapping\config\config.json` file to match your local setup:
{
"base_url": "https://www.espncricinfo.com",
"t20_url": "https://www.espncricinfo.com/records/tournament/team-match-results/icc-men-s-t20-world-cup-2022-23-14450",
"espn_t20_wc_2022_results_csv_file_path": "yourpath//Cricket Data Analytics Project//archive//csv//t20_wc_match_results.csv",
"batting_summary_csv_file_path": "yourpath//Cricket Data Analytics Project//archive//csv//t20_wc_batting_summary.csv",
"bowling_summary_csv_file_path": "yourpath//Cricket Data Analytics Project//archive//csv//t20_wc_bowling_summary.csv",
"player_details_csv_file_path": "yourpath//Cricket Data Analytics Project//archive//csv//t20_wc_player_info.csv",
"player_details_with_imageurl_csv_file_path": "yourpath//Cricket Data Analytics Project//archive//csv//t20_wc_player_info_with_imageurl.csv",
"espn_t20_wc_2022_results_json_filepath": "yourpath//Cricket Data Analytics Project//archive//json//t20_wc_match_results.json",
"batting_summary_json_file_path": "yourpath//Cricket Data Analytics Project//archive//json//t20_wc_batting_summary.json",
"bowling_summary_json_file_path": "yourpath//Cricket Data Analytics Project//archive//json//t20_wc_bowling_summary.json",
"player_details_json_file_path": "yourpath//Cricket Data Analytics Project//archive//json//t20_wc_player_info.json",
"player_details_with_imageurl_json_file_path": "yourpath//Cricket Data Analytics Project//archive//json//t20_wc_player_info_with_imageurl.json",
"scorecard_urls_json_file_path": "yourpath//Cricket Data Analytics Project//archive//json//scorecard_urls.json"
}
Similarly, change the paths in your script files accordingly.
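The following sketch loads `config.json` and catches any paths that still contain the `yourpath` placeholder before a script runs. The function names are hypothetical. Treating every key that ends in `path` as a file location is an assumption based on the keys listed above, which end in either `_file_path` or `_filepath`.

```python
from __future__ import annotations

import json
from pathlib import Path

PLACEHOLDER = "yourpath"  # prefix used in the sample config.json


def load_config(config_file: str | Path) -> dict:
    """Read config.json into a dictionary."""
    with open(config_file, encoding="utf-8") as f:
        return json.load(f)


def unresolved_paths(config: dict) -> list[str]:
    """Return keys whose value still contains the placeholder.

    Assumption: every key ending in "path" (e.g. "..._file_path" or
    "..._filepath") holds a file location; URL keys are ignored.
    """
    return [key for key, value in config.items()
            if key.endswith("path") and PLACEHOLDER in str(value)]


def ensure_output_dirs(config: dict) -> None:
    """Create the parent folder of every configured file path, if missing."""
    for key, value in config.items():
        if key.endswith("path"):
            Path(value).parent.mkdir(parents=True, exist_ok=True)
```

A typical sequence is to call `load_config` on your copy of `config.json` and stop if `unresolved_paths` returns any keys. Then call `ensure_output_dirs` so the `archive/csv` and `archive/json` folders exist before the scrapers write to them.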
🌟 Happy Analyzing!