📊 Web Scraping and End-to-End Data Visualization & Analysis Project on combined ODI, T20 & Test Cricket Statistics
This project presents a comprehensive cricket analytics dashboard that merges ODI, T20, and Test match statistics into a unified visualization. Built using Python (for data scraping & preprocessing) and Power BI (for visual storytelling), it answers critical questions about player performance, country dominance, and trends across decades.
Key questions addressed:
- 🏆 Who are the top performers across formats?
- 🌍 Which countries dominate international cricket?
- 📈 What are the performance trends over time?
- 🎯 Which captains, bowlers, and batsmen have had the biggest impact?
| Tool | Purpose |
|---|---|
| Python | Web scraping, data wrangling |
| BeautifulSoup | Extracting data from web pages |
| Pandas, NumPy | Cleaning, transformation, aggregation |
| Jupyter Notebook | EDA and preprocessing |
| Power BI | Dashboard creation and modeling |
- ESPNcricinfo – Player and country stats
- Cricwindow.com – Historical data on captains, keepers, match awards, etc.
📄 Web Scrapping for cricket data.ipynb
| Data Type | Source | Details |
|---|---|---|
| 🏏 Most Runs | ESPNcricinfo | Player, Matches, Runs, Start–End Year |
| 🎯 Most Wickets | ESPNcricinfo | Wickets, Matches |
| 🧤 Most Catches | Cricwindow | Fielder Name, Dismissals |
| 🧢 Match Awards | Cricwindow | MoM & MoS data |
| 🧑 Captains | Cricwindow | Matches, Wins, Losses, Win % |
| 🌍 Country Performance | ESPNcricinfo | Matches, Wins, Losses, Win rate |
Tech Used: requests, BeautifulSoup, pandas
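As a rough illustration of the scraping step, the sketch below parses an HTML table into a list of records with BeautifulSoup. It is not the notebook's actual code: the sample markup, the column names, the placeholder players, and the helper name `table_to_records` are all assumptions made for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded stats page.
# The players and numbers are placeholders, not real data.
SAMPLE_HTML = """
<table>
  <thead><tr><th>Player</th><th>Mat</th><th>Runs</th></tr></thead>
  <tbody>
    <tr><td>Player A</td><td>100</td><td>5.2K</td></tr>
    <tr><td>Player B</td><td>80</td><td>4.1K</td></tr>
  </tbody>
</table>
"""


def table_to_records(html):
    """Parse the first <table> in `html` into a list of {header: cell_text} dicts."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    headers = [th.get_text(strip=True) for th in table.find_all("th")]
    records = []
    for row in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        # The header row has no <td> cells, so it is skipped here.
        if cells and len(cells) == len(headers):
            records.append(dict(zip(headers, cells)))
    return records


records = table_to_records(SAMPLE_HTML)
```

In a live run the HTML would come from something like `requests.get(url).text`, and the resulting records can be passed to `pandas.DataFrame(records)` for cleaning.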
✅ Special logic was written to clean raw scraped strings like "1.2K", "10K", "234M" into numeric values.
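A minimal sketch of what such cleaning logic could look like is shown below. The function name `parse_abbreviated_number` and the choice to return `None` for unparseable input are assumptions for illustration; the notebook's own implementation may differ.

```python
import re

# Multipliers for the suffixes mentioned above; extend if other suffixes appear.
_SUFFIX_MULTIPLIERS = {"K": 1_000, "M": 1_000_000}

# Optional thousands separators, optional decimal part, optional K/M suffix.
_ABBREV_RE = re.compile(r"^\s*([\d,]*\.?\d+)\s*([KM]?)\s*$", re.IGNORECASE)


def parse_abbreviated_number(text):
    """Convert strings such as '1.2K' or '234M' to an int; return None if unparseable."""
    match = _ABBREV_RE.match(str(text))
    if match is None:
        return None
    number, suffix = match.groups()
    value = float(number.replace(",", ""))
    multiplier = _SUFFIX_MULTIPLIERS.get(suffix.upper(), 1)
    # round() guards against floating-point noise before truncating to int.
    return int(round(value * multiplier))
```

Applied to a pandas column, this could be used as `df["Runs"].map(parse_abbreviated_number)`, where the column name `Runs` is again only an example.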
📄 Exploratory Data Analysis on Cricket data.ipynb
- Converted strings like `'1.3K'`, `'1M'` → integers
- Extracted StartYear and EndYear for all players
- Created `combined_odi_t20_test_stats.xlsx` file with merged data
- Ensured data consistency across formats and metrics
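The StartYear/EndYear extraction can be sketched as follows, assuming the scraped career span arrives as a single string such as `"1989-2013"` (hyphen or en dash). That input format and the helper name `split_span` are assumptions, not taken from the notebook.

```python
import re

# Two four-digit years separated by a hyphen or an en dash.
_SPAN_RE = re.compile(r"(\d{4})\s*[-–]\s*(\d{4})")


def split_span(span):
    """Return (start_year, end_year) from a span string, or (None, None) if absent."""
    match = _SPAN_RE.search(str(span))
    if match is None:
        return (None, None)
    start, end = (int(group) for group in match.groups())
    return (start, end)
```

Returning a pair of `None` values instead of raising keeps rows with missing spans in the dataset, so they can be inspected or filtered later.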
📄 Cricket data Visualisation.pbix
| Visual Tile | Description |
|---|---|
| Top Batsmen | Most international runs |
| Top Bowlers | Leading wicket-takers |
| Best Keepers | Dismissals by wicket-keepers |
| Best Fielders | Most catches |
| Top Captains | Matches led, wins |
| Man of the Match | Top award winners |
| Man of the Series | Series performance highlights |
| Top Countries | Country-level wins, losses, win % |
- ✅ Country Filter: Slice stats by selected country
- 🕒 (Coming Soon): A unified Year Slicer via `YearTable = GENERATESERIES(1980, 2025, 1)`
  - You’ll be able to filter all visuals by specific year or range
Cricket-Analysis/
│
├── Dataset/
│ └── combined_odi_t20_test_stats.xlsx # Final unified dataset
│
├── Python Scripting/
│ ├── Web Scrapping for cricket data.ipynb # Web scraping logic
│ └── Exploratory Data Analysis on Cricket data.ipynb
│
├── Visualisation/
│ ├── Cricket data Visualisation.pbix # Power BI dashboard
│ └── dashboard_screenshot.png
│
└── README.md
- Clone the repo
- Run `Web Scrapping for cricket data.ipynb` to refresh data (optional)
- Open `Exploratory Data Analysis...ipynb` to preprocess
- Load `Cricket data Visualisation.pbix` in Power BI Desktop
- Interact with the dashboard
- 🔗 Complex relationship modeling in Power BI
- 🧹 Transforming scraped text like "1.2K" or "1M"
- 📊 Using DAX to normalize metrics like Runs/Year
- 📈 Creating meaningful dashboards for storytelling
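The Runs/Year normalization is done in DAX in the dashboard, and the measure itself is not shown here. The arithmetic behind such a metric might look like the Python sketch below, which assumes a career is counted inclusively from start year to end year; the function name `runs_per_year` is hypothetical.

```python
def runs_per_year(runs, start_year, end_year):
    """Average runs per calendar year of a career, counting both end years."""
    years_active = end_year - start_year + 1
    if years_active <= 0:
        raise ValueError("end_year must not be earlier than start_year")
    return runs / years_active
```

Counting inclusively means a player whose career starts and ends in the same year is treated as active for one year rather than zero, which avoids a division by zero.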

