A MongoDB-based data science project that tracks and analyses CS:GO championship matches using web scraping and aggregation pipelines.
During the COVID-19 lockdown, Counter-Strike: Global Offensive (CS:GO) was more than a game; it was a way to stay connected with friends. Years later, at university, inspired by those shared experiences and a memorable esports tournament (PGL Major Antwerp 2022), this project explores CS:GO match data by sourcing, storing, and analysing statistics in a structured MongoDB environment.
- Source: Scraped from bo3.gg
- Matches Analysed: 2022 PGL Major matches
- Data Type: Match metadata, teams, players, maps, stats, and simulated commentary
The project uses six MongoDB collections:
- Matches: match_id, title, tournament, date, teams (list of team IDs), score, map_ids
- Teams: team_id, team_name, players (list of player IDs), tournament
- Players: player_id, name, tournament, team_id
- Maps: map_id, name, times_played
- Player stats: match_id, player_id, kills, deaths, assists
- Commentary: commentary_id, match_id, timestamp, text, tags
All keys are uniquely generated using UUIDs for consistency and ease of reference.
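As an illustration of how UUID keys can link these documents together, here is a minimal sketch. The helper names (`new_id`, `build_team`, `build_player`, `build_match`) are hypothetical, the score is assumed to be a string such as "2-0", and the date is stored as an ISO string because pymongo cannot encode a plain `datetime.date`.

```python
import uuid
from datetime import date


def new_id() -> str:
    """Return a random UUID4 as a string, usable as a document key."""
    return str(uuid.uuid4())


def build_team(team_name: str, tournament: str) -> dict:
    # Player IDs are appended later by build_player.
    return {"team_id": new_id(), "team_name": team_name,
            "players": [], "tournament": tournament}


def build_player(name: str, team: dict) -> dict:
    player = {"player_id": new_id(), "name": name,
              "tournament": team["tournament"], "team_id": team["team_id"]}
    # Keep the team's player list in sync with the player's team_id.
    team["players"].append(player["player_id"])
    return player


def build_match(title: str, tournament: str, match_date: date,
                teams: list, score: str, map_ids: list) -> dict:
    return {"match_id": new_id(), "title": title, "tournament": tournament,
            "date": match_date.isoformat(),  # ISO string, e.g. "2022-05-22"
            "teams": [t["team_id"] for t in teams],
            "score": score, "map_ids": map_ids}
```

Each returned dict can then be written with pymongo's `insert_one` into the matching collection.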
- Web scraper built with BeautifulSoup4, requests, uuid, and pymongo
- Error handling and duplicate-prevention mechanisms
- HTML parsing logic refined over 30+ iterations
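The error handling and duplicate prevention could look roughly like the sketch below; it is not the project's actual scraper. `fetch_with_retries` and `dedupe` are hypothetical helpers, and the fetch function is injected so the logic can run without network access. Deduplication should use a natural key from the source page (the hypothetical `source_url` field in the test), since freshly generated UUIDs differ on every scrape.

```python
import time
from typing import Callable, Iterable


def fetch_with_retries(fetch: Callable[[str], str], url: str,
                       retries: int = 3, backoff: float = 1.0) -> str:
    """Call fetch(url), retrying on any exception with linear backoff."""
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return fetch(url)
        except Exception as exc:  # e.g. requests.RequestException in practice
            last_error = exc
            if attempt < retries:
                time.sleep(backoff * attempt)
    raise RuntimeError(f"giving up on {url} after {retries} attempts") from last_error


def dedupe(records: Iterable[dict], key: str) -> list:
    """Keep the first record for each value of `key`, preserving order."""
    seen = set()
    unique = []
    for record in records:
        if record[key] not in seen:
            seen.add(record[key])
            unique.append(record)
    return unique
```

With requests, `fetch` could call `requests.get(url, timeout=10)`, then `raise_for_status()`, and return `response.text`. On the database side, `collection.create_index(<natural key>, unique=True)` makes MongoDB reject duplicates as a second line of defence.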
Because timestamped highlights were unavailable, a 2-minute simulation of live commentary was scripted. See match_tracker.ipynb for instructions on running it.
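A minimal sketch of how such mock commentary documents might be generated, following the commentary schema above. The `mock_commentary` generator, the 15-second interval, and the "MM:SS" timestamp format are assumptions, not taken from the notebook.

```python
import uuid
from typing import Iterator


def mock_commentary(match_id: str, lines: list, duration_s: int = 120,
                    interval_s: int = 15) -> Iterator[dict]:
    """Yield commentary documents spaced interval_s apart, from 0 to duration_s.

    `lines` is a list of (text, tags) pairs, cycled if there are fewer
    lines than time slots.
    """
    for i, offset in enumerate(range(0, duration_s + 1, interval_s)):
        text, tags = lines[i % len(lines)]
        minutes, seconds = divmod(offset, 60)
        yield {
            "commentary_id": str(uuid.uuid4()),
            "match_id": match_id,
            "timestamp": f"{minutes:02d}:{seconds:02d}",  # e.g. "01:45"
            "text": text,
            "tags": tags,
        }
```

To make it feel live, a loop could `time.sleep(interval_s)` between documents and `insert_one` each one as it is produced.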
The following validation functions are implemented in Python with pymongo:
- verify_single_team_per_player()
- verify_teams_in_matches()
- verify_player_stats_references()
- verify_maps_in_matches()
- verify_commentary_matches()
These ensure referential integrity across the dataset.
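The project's own verify_* implementations are in the notebook. As an illustration only, a generic referential-integrity check could be sketched like this; `find_dangling_references` is a hypothetical helper that works for both single-ID fields (such as `player_id` in player stats) and list fields (such as `teams` or `map_ids` in matches).

```python
from typing import Iterable


def find_dangling_references(children: Iterable[dict], field: str,
                             parent_ids: set) -> list:
    """Return (child, missing_ids) pairs where `field` points at unknown IDs.

    `field` may hold a single ID or a list of IDs. A missing field is
    reported as a missing reference (None).
    """
    problems = []
    for child in children:
        value = child.get(field)
        ids = value if isinstance(value, list) else [value]
        missing = [i for i in ids if i not in parent_ids]
        if missing:
            problems.append((child, missing))
    return problems
```

With pymongo, the parent IDs could come from `set(db.players.distinct("player_id"))` and the children from `db.player_stats.find()`, where the collection names are assumptions.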
Aggregation pipelines:
- Full Match Summary: combines match data with the participating teams, players, and commentary using nested lookups and array queries.
- Top KDA Players: ranks players by KDA = (kills + assists) / deaths.
- Top Performer per Team/Match: lists the highest-KDA player from each team in every match.
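A possible shape for the Top KDA Players pipeline is sketched below. It assumes stats are summed across all matches per player and that the collections are named `player_stats` and `players`. Because MongoDB's `$divide` raises an error on a zero divisor, deaths are clamped to at least 1, which slightly departs from the plain formula for players with no deaths.

```python
def kda(kills: int, deaths: int, assists: int) -> float:
    """(kills + assists) / deaths, treating 0 deaths as 1 to avoid dividing by zero."""
    return (kills + assists) / max(deaths, 1)


def top_kda_pipeline(limit: int = 10) -> list:
    """Aggregation pipeline for the player-stats collection (name assumed)."""
    return [
        # Sum each player's stats across every match they appear in.
        {"$group": {"_id": "$player_id",
                    "kills": {"$sum": "$kills"},
                    "deaths": {"$sum": "$deaths"},
                    "assists": {"$sum": "$assists"}}},
        # KDA = (kills + assists) / deaths, with deaths clamped to at least 1.
        {"$addFields": {"kda": {"$divide": [{"$add": ["$kills", "$assists"]},
                                            {"$max": ["$deaths", 1]}]}}},
        {"$sort": {"kda": -1}},
        {"$limit": limit},
        # Attach the player's name from the players collection.
        {"$lookup": {"from": "players", "localField": "_id",
                     "foreignField": "player_id", "as": "player"}},
        {"$unwind": "$player"},
        {"$project": {"_id": 0, "player_id": "$_id", "name": "$player.name",
                      "kills": 1, "deaths": 1, "assists": 1, "kda": 1}},
    ]
```

Usage would be along the lines of `list(db.player_stats.aggregate(top_kda_pipeline(10)))`; the plain `kda` function mirrors the pipeline's arithmetic for spot checks in Python.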
Additional explanation is available in the Jupyter Notebook.
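For the Top Performer per Team/Match query, one common MongoDB pattern is to sort by KDA and then keep the `$first` document per group. The sketch below assumes each stats row gets its team from the player's `team_id` via a `$lookup` on a `players` collection, and clamps deaths to 1 as before. `top_per_team_match` is a hypothetical pure-Python equivalent for checking results.

```python
def top_performer_pipeline() -> list:
    """Highest-KDA player per (match, team), run on the player-stats collection."""
    return [
        # Bring in each player's team_id and name.
        {"$lookup": {"from": "players", "localField": "player_id",
                     "foreignField": "player_id", "as": "player"}},
        {"$unwind": "$player"},
        {"$addFields": {"team_id": "$player.team_id",
                        "kda": {"$divide": [{"$add": ["$kills", "$assists"]},
                                            {"$max": ["$deaths", 1]}]}}},
        # Best KDA first within each match/team, so $first picks the top player.
        {"$sort": {"match_id": 1, "team_id": 1, "kda": -1}},
        {"$group": {"_id": {"match_id": "$match_id", "team_id": "$team_id"},
                    "player_id": {"$first": "$player_id"},
                    "name": {"$first": "$player.name"},
                    "kda": {"$first": "$kda"}}},
        {"$sort": {"_id.match_id": 1, "_id.team_id": 1}},
    ]


def top_per_team_match(stats: list) -> dict:
    """Map (match_id, team_id) to the player_id with the highest KDA."""
    best = {}
    for row in stats:
        key = (row["match_id"], row["team_id"])
        score = (row["kills"] + row["assists"]) / max(row["deaths"], 1)
        if key not in best or score > best[key][1]:
            best[key] = (row["player_id"], score)
    return {k: v[0] for k, v in best.items()}
```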
Repository structure:
- match_tracker.ipynb: code for scraping, ingestion, and aggregation
- README.md: this file
- data/: (optional) manually backed-up datasets