PlaySmart is an end-to-end data engineering + analytics solution for tracking video game prices and finding the best deals across multiple digital storefronts. Built with real APIs and production-grade code, this project demonstrates complete data pipeline expertise: from API integration through data transformation to interactive dashboards.
- Fetches real-time game deal data from the CheapShark API (aggregates deals from 90+ retailers)
- Processes and cleans data through a robust Python ETL pipeline with validation and error handling
- Enriches datasets with deal metrics: discount percentages, deal quality ratings, savings calculations
- Powers an interactive Streamlit dashboard with multiple pages for deal analysis and store comparison
- Generates insights for smart game purchasing decisions
- Real-world relevance: Solves an actual problem, telling genuinely cheap games apart from misleading sales
- Production-ready code: Includes error handling, logging, configuration management, and modular architecture
- Demonstrates key skills: Data engineering (ETL), API integration, data analysis, data visualization, Python expertise
- Portfolio-worthy: Shows ability to build complete data solutions from raw API data to interactive dashboards
┌─────────────────────────────────────────────────────────────────┐
│                     PlaySmart Architecture                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  CheapShark API                                           │  │
│  │  (90+ Game Retailers - Steam, Epic, GOG, Fanatical)       │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                ▼                                │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  Data Fetching Layer (fetch_data.py)                      │  │
│  │  • GamePriceFetcher class                                 │  │
│  │  • Handles API requests & error handling                  │  │
│  │  • Returns pandas DataFrames                              │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                ▼                                │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  Data Processing Layer (transform.py)                     │  │
│  │  • Data validation & cleaning                             │  │
│  │  • Discount percentage calculation                        │  │
│  │  • Deal quality categorization                            │  │
│  │  • GameDataTransformer class                              │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                ▼                                │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  Processed Data Storage (processed_data/)                 │  │
│  │  • Clean, enriched datasets as CSV files                  │  │
│  │  • Ready for analysis and visualization                   │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                ▼                                │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  Streamlit Dashboard (dashboard/app.py)                   │  │
│  │  • 3 interactive pages                                    │  │
│  │  • Deal analysis & filtering                              │  │
│  │  • Store comparison analytics                             │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
PlaySmart/
├── pipeline/                 # Data ETL Pipeline
│   ├── api_config.py         # API configuration & endpoints
│   ├── fetch_data.py         # Data fetching from CheapShark
│   ├── transform.py          # Data cleaning & enrichment
│   ├── pipeline.py           # Master orchestration script
│   └── __init__.py
│
├── dashboard/                # Streamlit Application
│   ├── app.py                # Main dashboard (3 pages)
│   └── __init__.py
│
├── raw_data/                 # Raw data from API (generated)
│   └── deals_raw_*.csv
│
├── processed_data/           # Processed & enriched data (generated)
│   ├── deals_processed_*.csv
│   └── pipeline_summary.txt
│
├── logs/                     # Pipeline execution logs (generated)
│   └── pipeline_*.log
│
├── requirements.txt          # Python dependencies
├── .env.example              # Environment variables template
├── .gitignore                # Git ignore rules
└── README.md                 # This file
PlaySmart uses the CheapShark API to fetch real-time game deal data.
- No API key required - CheapShark is a public API
- No authentication needed
- Rate limiting: the pipeline spaces requests about 0.5 seconds apart
- Deal data: Title, current price, retail price, discount %, store
- Deal ratings: Quality scores from CheapShark community
- 90+ retailers: Steam, Epic Games Store, GOG, Fanatical, Humble Bundle, Green Man Gaming, and more
- Real-time pricing: Updated approximately every 20 minutes
- Pipeline includes automatic 0.5s delays between requests
- CheapShark has generous rate limits for non-commercial use
- Suitable for personal projects and hobby use
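
The 0.5-second spacing between requests can be sketched as follows. This is a minimal illustration, not the project's `GamePriceFetcher`: `ThrottledClient` and `http_get_json` are hypothetical names, and the deals URL is an assumption that should be checked against the CheapShark API docs. The HTTP call, clock, and sleep function are injectable so the delay logic can be exercised without network access.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable, Optional

# Assumed endpoint; verify against the CheapShark API docs before relying on it.
DEALS_URL = "https://www.cheapshark.com/api/1.0/deals"


def http_get_json(url: str) -> list:
    """Default transport: GET the URL and decode the JSON body."""
    request = urllib.request.Request(url, headers={"User-Agent": "PlaySmart-example"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


class ThrottledClient:
    """Hypothetical helper that keeps requests at least `min_interval` seconds apart."""

    def __init__(self, min_interval: float = 0.5,
                 get_json: Callable[[str], list] = http_get_json,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.min_interval = min_interval
        self._get_json = get_json
        self._clock = clock
        self._sleep = sleep
        self._last_request: Optional[float] = None

    def fetch_deals(self, **params) -> list:
        """Fetch one page of deals, sleeping first if the last call was too recent."""
        if self._last_request is not None:
            elapsed = self._clock() - self._last_request
            if elapsed < self.min_interval:
                self._sleep(self.min_interval - elapsed)
        self._last_request = self._clock()
        query = urllib.parse.urlencode(params)
        url = f"{DEALS_URL}?{query}" if query else DEALS_URL
        return self._get_json(url)
```

A real call would look like `ThrottledClient().fetch_deals(pageSize=60)`; consecutive calls on the same client are then delayed automatically.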
- Python 3.8+
- pip (Python package manager)
- Git (for version control)
- Clone the repository: `git clone https://github.com/yourusername/PlaySmart.git`, then `cd PlaySmart`
- Create a virtual environment (recommended): `python -m venv venv`
  - On Windows: `venv\Scripts\activate`
  - On macOS/Linux: `source venv/bin/activate`
- Install dependencies: `pip install -r requirements.txt`
- Set up environment variables (optional; CheapShark doesn't require auth): `cp .env.example .env`
The pipeline fetches fresh game deal data from CheapShark, cleans it, and enriches it with deal metrics.
`cd pipeline`
`python pipeline.py`
Output:
- Raw data saved to `raw_data/deals_raw_*.csv`
- Processed data saved to `processed_data/deals_processed_*.csv`
- Execution logs saved to `logs/pipeline_*.log`
- Summary report in `processed_data/pipeline_summary.txt`
- Fetches deal data - Retrieves top-rated game deals from CheapShark
- Transforms data:
  - Cleans and standardizes column names
  - Converts prices to numeric values
  - Calculates discount percentages
  - Categorizes deals by quality (Exceptional/Excellent/Good/Moderate/Minimal)
  - Adds timestamp metadata
- Sorts deals - Ranks by deal rating and discount depth
- Saves output - Stores both raw and processed data
- Generates report - Creates summary statistics
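
The clean, enrich, and sort steps above can be sketched in plain Python. The project itself works on pandas DataFrames; this version uses lists of dicts so it runs without dependencies. The input keys `salePrice`, `normalPrice`, and `dealRating` are assumed to match what CheapShark returns, and the output column names and the `transform_deals` helper are hypothetical.

```python
from datetime import datetime, timezone


def to_float(value, default=None):
    """Convert a price-like value to float, returning `default` on bad input."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return default


def transform_deals(raw_deals):
    """Clean, enrich, and rank raw deal records (a list of dicts)."""
    fetched_at = datetime.now(timezone.utc).isoformat()
    processed = []
    for deal in raw_deals:
        sale = to_float(deal.get("salePrice"))      # assumed input key
        retail = to_float(deal.get("normalPrice"))  # assumed input key
        # Skip rows where a discount cannot be computed.
        if sale is None or retail is None or retail <= 0:
            continue
        processed.append({
            "title": (deal.get("title") or "").strip(),
            "sale_price": sale,
            "retail_price": retail,
            "discount_pct": round((retail - sale) / retail * 100, 2),
            "savings": round(retail - sale, 2),
            "deal_rating": to_float(deal.get("dealRating"), default=0.0),
            "fetched_at": fetched_at,
        })
    # Rank by deal rating first, then by discount depth.
    processed.sort(key=lambda d: (d["deal_rating"], d["discount_pct"]), reverse=True)
    return processed
```

Rows with unparseable prices are dropped rather than guessed at, which keeps the discount column free of placeholder values.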
INFO:__main__:Starting PlaySmart Game Deal Pipeline
INFO:fetch_data:Fetching game deals from CheapShark...
INFO:fetch_data:Successfully fetched 60 deals
INFO:transform:Starting game deals transformation...
INFO:transform:Calculated discount percentages
INFO:transform:Categorized deal qualities
INFO:__main__:Saved processed deals data to C:\...\processed_data\deals_processed_*.csv
INFO:__main__:PlaySmart Pipeline Completed Successfully!
INFO:__main__:Processed 60 game deals
The dashboard is a web application that visualizes game deals with interactive charts and filtering.
`streamlit run dashboard/app.py`
The dashboard will open in your browser at http://localhost:8501
- Summary metrics (total deals, average discount, best discount)
- Deal quality distribution (pie chart)
- Top stores by deal count (bar chart)
- Filterable deals table with controls:
  - Minimum discount percentage
  - Maximum price limit
  - Deal quality filter
- Discount distribution histogram
Use case: Find current deals matching your budget and quality requirements
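
The three filter controls combine as a simple conjunction. The sketch below shows that logic on plain dicts; the `filter_deals` helper and the keys `sale_price`, `discount_pct`, and `quality` are hypothetical stand-ins for whatever the dashboard actually uses.

```python
def filter_deals(deals, min_discount=0.0, max_price=None, qualities=None):
    """Return the deals that pass every active filter.

    `max_price=None` and `qualities=None` mean "no limit" for that control.
    """
    allowed = set(qualities) if qualities else None
    selected = []
    for deal in deals:
        if deal["discount_pct"] < min_discount:
            continue
        if max_price is not None and deal["sale_price"] > max_price:
            continue
        if allowed is not None and deal["quality"] not in allowed:
            continue
        selected.append(deal)
    return selected


deals = [
    {"title": "Game A", "sale_price": 4.99, "discount_pct": 80.0, "quality": "Exceptional"},
    {"title": "Game B", "sale_price": 29.99, "discount_pct": 50.0, "quality": "Excellent"},
    {"title": "Game C", "sale_price": 9.99, "discount_pct": 15.0, "quality": "Moderate"},
]
budget_picks = filter_deals(deals, min_discount=25, max_price=20)
print([d["title"] for d in budget_picks])  # prints ['Game A']
```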
- Top 20 deals ranked by discount percentage
- Shows current price, retail price, discount %, deal rating, and quality
- Organized card layout for easy browsing
Use case: Quickly see the absolute deepest discounts available right now
- Compare deals across different retailers
- Metrics table showing:
  - Number of deals per store
  - Average discount percentage
  - Maximum discount available
  - Average game price
- Side-by-side charts for discount comparison and deal counts
Use case: Decide which stores have the best deals for your wishlist
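
The per-store metrics table amounts to a group-by over the processed deals. A dependency-free sketch is shown here; the `compare_stores` helper and the keys `store`, `sale_price`, and `discount_pct` are assumptions, and the real dashboard presumably does the equivalent with pandas.

```python
from collections import defaultdict
from statistics import mean


def compare_stores(deals):
    """Aggregate deal count, average/max discount, and average price per store."""
    by_store = defaultdict(list)
    for deal in deals:
        by_store[deal["store"]].append(deal)

    rows = []
    for store, items in by_store.items():
        discounts = [d["discount_pct"] for d in items]
        prices = [d["sale_price"] for d in items]
        rows.append({
            "store": store,
            "deal_count": len(items),
            "avg_discount": round(mean(discounts), 2),
            "max_discount": max(discounts),
            "avg_price": round(mean(prices), 2),
        })
    # Stores with the deepest average discount come first.
    rows.sort(key=lambda row: row["avg_discount"], reverse=True)
    return rows
```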
- APIConfig class: Centralized configuration management
- CheapShark endpoints: Base URL and API parameters
- Store IDs: Mapping of store names to CheapShark IDs
- Validation: Basic configuration checks
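
One way such a configuration class could look is sketched below. Everything here is illustrative rather than the contents of `api_config.py`: the field names, the `validate()` and `endpoint()` methods, the base URL, and the two store IDs are assumptions and should be verified against the CheapShark API docs.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class APIConfig:
    """Illustrative configuration holder; all defaults are assumptions."""
    base_url: str = "https://www.cheapshark.com/api/1.0"  # assumed base URL
    request_delay: float = 0.5   # seconds between requests
    timeout: float = 10.0        # seconds before a request is abandoned
    # Assumed store IDs for illustration only; confirm them via the API.
    store_ids: dict = field(default_factory=lambda: {"Steam": "1", "GOG": "7"})

    def endpoint(self, name: str) -> str:
        """Build a full URL for an endpoint name such as 'deals'."""
        return f"{self.base_url.rstrip('/')}/{name.lstrip('/')}"

    def validate(self) -> list:
        """Return a list of human-readable problems (empty when the config is sane)."""
        problems = []
        if not self.base_url.startswith("https://"):
            problems.append("base_url must use https")
        if self.request_delay < 0:
            problems.append("request_delay must not be negative")
        if self.timeout <= 0:
            problems.append("timeout must be positive")
        if not self.store_ids:
            problems.append("store_ids must not be empty")
        return problems
```

Returning a list of problems instead of raising on the first one lets the pipeline log every misconfiguration in a single run.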
- GamePriceFetcher class: Handles all API requests
- Methods:
  - `fetch_deals()` - Fetch current game deals
  - `fetch_game_detail()` - Get detailed info for a specific game
  - `fetch_price_history()` - Extract price history from deal data
  - `fetch_multiple_game_details()` - Batch fetch game info
- Features:
- HTTP error handling with retries
- Rate limiting (0.5s between requests)
- Response validation
- Comprehensive logging
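
Retrying a failed request can be sketched generically, as below. The `with_retries` helper is hypothetical, and the exponential backoff (0.5s, 1s, 2s, ...) is a choice made for this example; the source only states that retries exist, not how they are spaced.

```python
import time


def with_retries(operation, attempts=3, base_delay=0.5,
                 retry_on=(Exception,), sleep=time.sleep):
    """Call `operation()` until it succeeds or `attempts` is exhausted.

    Waits base_delay * 2**(n-1) seconds after the n-th failure.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay * 2 ** (attempt - 1))
```

With the `requests` library, passing `retry_on=(requests.exceptions.RequestException,)` would limit retries to network and HTTP errors instead of every exception.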
- GameDataTransformer class: Data cleaning and feature engineering
- Methods:
  - `clean_deal_data()` - Standardize column names and types
  - `calculate_discount_percentage()` - Compute discount %
  - `categorize_deal_quality()` - Rate deal quality (1-5)
  - `add_time_metadata()` - Add timestamp
  - `transform_deals_data()` - Complete pipeline
  - `filter_by_discount()` - Filter deals by minimum discount
  - `sort_by_deal_quality()` - Sort by quality ratings
- Data quality: Type conversion, null handling, validation
- GameDealPipeline class: Master orchestration
- Methods:
  - `fetch_deals()` - Fetch and save raw data
  - `transform_and_save_deals()` - Transform and save processed data
  - `create_summary_report()` - Generate execution report
  - `run()` - Execute complete end-to-end pipeline
- Logging: Comprehensive logging to file and console
- Error handling: Graceful error handling with detailed logging
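
Logging to both file and console, combined with a run that fails gracefully, might look roughly like this. The `build_logger` and `run_pipeline` functions are hypothetical, not the methods of `GameDealPipeline`; the log file name follows the `pipeline_*.log` pattern used elsewhere in this README.

```python
import logging
import sys
from datetime import datetime
from pathlib import Path


def build_logger(log_dir, name="playsmart.pipeline"):
    """Create a logger that writes to the console and to a timestamped file.

    Call this once per logger name; repeated calls would add duplicate handlers.
    """
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    log_path = log_dir / f"pipeline_{datetime.now():%Y%m%d_%H%M%S}.log"

    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    formatter = logging.Formatter("%(asctime)s %(levelname)s:%(name)s:%(message)s")
    handlers = (logging.StreamHandler(sys.stdout),
                logging.FileHandler(log_path, encoding="utf-8"))
    for handler in handlers:
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger, log_path


def run_pipeline(fetch, transform, save, logger):
    """Run fetch -> transform -> save; log and return False instead of raising."""
    try:
        logger.info("Starting pipeline")
        raw = fetch()
        logger.info("Fetched %d deals", len(raw))
        processed = transform(raw)
        save(processed)
        logger.info("Processed %d deals", len(processed))
        return True
    except Exception:
        # logger.exception records the traceback alongside the message.
        logger.exception("Pipeline failed")
        return False
```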
- Page functions: `page_deals_overview()`, `page_best_deals()`, `page_store_comparison()`
- Chart functions: Using Plotly for interactive visualizations
- Caching: `@st.cache_data` for efficient data loading
- Interactivity: Sliders, multiselect dropdowns, responsive layout
- Theming: Gaming-inspired dark theme with neon accents
- Interactive pie charts for deal distribution
- Bar charts for store comparison
- Histograms for discount distribution
- Hover information on all charts
| Category | Discount | Use Case |
|---|---|---|
| Exceptional | ≥75% off | Once-in-a-lifetime deals, collector's items on deep sale |
| Excellent | 50-75% off | Major sales, best time to buy |
| Good | 25-50% off | Standard sales, reasonable savings |
| Moderate | 10-25% off | Minor discounts, seasonal sales |
| Minimal | <10% off | Barely discounted, probably not worth waiting for |
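
The thresholds in the table translate directly into a lookup function. In this sketch the lower bound of each band is treated as inclusive (exactly 50% counts as Excellent); the table does not say which side the boundaries fall on, so that is an assumption.

```python
def categorize_deal_quality(discount_pct: float) -> str:
    """Map a discount percentage to one of the five categories in the table."""
    if discount_pct >= 75:
        return "Exceptional"
    if discount_pct >= 50:
        return "Excellent"
    if discount_pct >= 25:
        return "Good"
    if discount_pct >= 10:
        return "Moderate"
    return "Minimal"


for pct in (80, 50, 24.9, 5):
    print(pct, categorize_deal_quality(pct))
# prints:
# 80 Exceptional
# 50 Excellent
# 24.9 Moderate
# 5 Minimal
```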
- Track game prices for your wishlist
- Get alerts when games hit good discount points
- Avoid impulse purchases at bad prices
- Find cheap games for streaming content
- Discover trending titles at deals
- Track pricing trends for video thumbnails
- Build a library of games at optimal prices
- Identify seasonal sale patterns
- Maximize gaming budget
- Understand game pricing dynamics
- Track retailer pricing strategies
- Analyze discount patterns across publishers
Strong Answer: "PlaySmart is an end-to-end ETL pipeline that fetches game deal data from the CheapShark API, cleans and enriches it with deal metrics, and powers an interactive dashboard. The pipeline uses Python to fetch real-time data from 90+ game retailers, validates data quality, calculates discount percentages and deal quality ratings, then saves cleaned data as CSV files. I built modular code with separate concerns (API configuration, data fetching, transformation, and orchestration) so each component is testable and reusable. The dashboard uses Streamlit for interactive visualization with filtering and comparison features."
Strong Answer: "CheapShark was the best fit for this project because it provides free, real-time deal data from 90+ retailers without requiring authentication. It directly solves the problem of finding the cheapest games across multiple storefronts. The trade-off is that it focuses on PC games (Steam, Epic, GOG), but that's where most digital game sales happen anyway. I could have built web scrapers for individual stores, but CheapShark's centralized approach is cleaner and more reliable. The API has generous rate limits for hobby projects like mine."
Strong Answer: "I implemented error handling at multiple levels: (1) API layer: catch request timeouts and validate JSON responses; (2) Data layer: null checking, type conversion with error handling, validation of critical fields; (3) Pipeline layer: try/except blocks around each major operation; (4) Monitoring: comprehensive logs to file with timestamps and error details. If one deal fetch fails, the pipeline continues and processes available data instead of crashing. The dashboard gracefully handles missing data with informative messages."
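
The "continue on failure" behaviour described in this answer can be illustrated with a small batch helper. `fetch_many` and the logger name are hypothetical; the point is only that a failure for one item is recorded and skipped rather than aborting the batch.

```python
import logging

logger = logging.getLogger("playsmart.example")


def fetch_many(game_ids, fetch_one):
    """Fetch details for each ID, collecting failures instead of aborting."""
    results, failures = {}, {}
    for game_id in game_ids:
        try:
            results[game_id] = fetch_one(game_id)
        except Exception as exc:  # broad on purpose: one bad item must not stop the batch
            logger.warning("Skipping game %s: %s", game_id, exc)
            failures[game_id] = str(exc)
    return results, failures
```

Returning the failures alongside the results lets the caller decide whether a partial batch is acceptable or worth retrying.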
Strong Answer: "For scale, I'd make several improvements: (1) Use a database (PostgreSQL) instead of CSV files for faster queries and historical analysis; (2) Implement incremental fetching: only fetch deals updated since last run instead of full dataset; (3) Add caching layer (Redis) for frequently accessed data; (4) Schedule with Airflow or Cron for automated daily runs; (5) Parallelize API requests with async/await; (6) Move dashboard to cloud (Streamlit Cloud, AWS); (7) Add data versioning. The modular design makes these changes straightforward: I could swap out the CSV storage layer without touching the fetch or transform logic."
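
The claim that storage can be swapped without touching fetch or transform logic rests on a narrow storage interface. A rough sketch of that idea follows; `DealStore`, `CsvDealStore`, `SqliteDealStore`, `persist`, and the three column names are all hypothetical, and SQLite stands in for PostgreSQL only because it ships with Python.

```python
import csv
import sqlite3
from pathlib import Path
from typing import Iterable, Protocol

FIELDS = ("title", "sale_price", "discount_pct")  # assumed column names


class DealStore(Protocol):
    """Anything with a save() method can act as the storage layer."""
    def save(self, deals: Iterable[dict]) -> int: ...


class CsvDealStore:
    def __init__(self, path):
        self.path = Path(path)

    def save(self, deals):
        rows = list(deals)
        with self.path.open("w", newline="", encoding="utf-8") as handle:
            # extrasaction="ignore" drops keys that are not in FIELDS.
            writer = csv.DictWriter(handle, fieldnames=FIELDS, extrasaction="ignore")
            writer.writeheader()
            writer.writerows(rows)
        return len(rows)


class SqliteDealStore:
    """Stand-in for a PostgreSQL-backed store; same interface, different backend."""
    def __init__(self, path):
        self.connection = sqlite3.connect(path)
        self.connection.execute(
            "CREATE TABLE IF NOT EXISTS deals "
            "(title TEXT, sale_price REAL, discount_pct REAL)"
        )

    def save(self, deals):
        rows = [tuple(deal.get(name) for name in FIELDS) for deal in deals]
        with self.connection:  # commits on success, rolls back on error
            self.connection.executemany("INSERT INTO deals VALUES (?, ?, ?)", rows)
        return len(rows)


def persist(deals, store: DealStore) -> int:
    """Pipeline code depends only on the interface, not on the backend."""
    return store.save(deals)
```

Switching backends is then a one-line change at the call site, for example `persist(deals, SqliteDealStore("deals.db"))` instead of `persist(deals, CsvDealStore("deals.csv"))`.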
Strong Answer: "PlaySmart calculates several key deal metrics: (1) Discount percentage: how much off retail price; (2) Deal quality rating: categorizes deals (Exceptional/Excellent/Good) based on discount depth; (3) Savings amount: actual dollar amount saved; (4) CheapShark deal rating: community feedback on deal quality; (5) Timestamp: when deal was found. The dashboard visualizes these metrics to help users identify the best times to buy. The categorization system helps distinguish between 'good deals' and 'genuine bargains.'"
Strong Answer: "PlaySmart demonstrates real data engineering skills valued by fintech and tech companies: (1) API integration: working with external data sources; (2) ETL design: extract, transform, load pipeline; (3) Data validation: ensuring quality before analysis; (4) Logging and monitoring: understanding pipeline health; (5) Modular code: separating concerns for maintainability; (6) Error handling: graceful degradation; (7) Cloud-ready: uses standard tools (Python, Pandas, Streamlit) deployable anywhere. In fintech context, the same pattern applies to market data, pricing APIs, and risk analytics. The skills are transferable."
| Package | Version | Purpose |
|---|---|---|
| pandas | >=2.0.0 | Data manipulation and analysis |
| numpy | >=1.24.0 | Numerical computations |
| requests | >=2.31.0 | HTTP requests to API |
| python-dotenv | >=1.0.0 | Environment variable management |
| streamlit | >=1.28.0 | Web dashboard framework |
| plotly | >=5.17.0 | Interactive visualizations |
- API Key Management
  - CheapShark doesn't require authentication
  - Store any future API keys in a `.env` file (never commit it to git)
  - Use `.gitignore` to exclude `.env` from version control
- Error Messages
  - Don't expose sensitive data in logs
  - Log errors for debugging without exposing credentials
- Data Privacy
  - Game deal data is public
  - No user data is collected or stored
  - Dashboard is local-only by default
| File | Purpose |
|---|---|
| `pipeline/api_config.py` | API configuration & parameters |
| `pipeline/fetch_data.py` | Data fetching from CheapShark |
| `pipeline/transform.py` | Data cleaning & enrichment |
| `pipeline/pipeline.py` | Master orchestration script |
| `dashboard/app.py` | Streamlit dashboard (3 pages) |
| `requirements.txt` | Python dependencies |
| `.env.example` | Environment template |
| `.gitignore` | Git ignore rules |
| `README.md` | This documentation |
Total: 1,500+ lines of code + 1,000+ lines of documentation
- Data Engineering - ETL pipeline design, data validation, quality checks
- API Integration - HTTP requests, error handling, rate limiting
- Data Analysis - Deal metrics, discount calculations, quality categorization
- Python Proficiency - Pandas, OOP, error handling, logging
- Web Development - Streamlit, interactivity, responsive UI
- Data Visualization - Plotly charts, dashboard design, UX
- Software Engineering - Modular code, configuration management
- Problem Solving - Finding cheap games (real-world relevance!)
- Production Code - Logging, error handling, comprehensive documentation
- Install dependencies: `pip install -r requirements.txt`
- Run pipeline: `python pipeline/pipeline.py`
- Start dashboard: `streamlit run dashboard/app.py`
- Explore all 3 dashboard pages
- Customize with different game categories/tags
- Experiment with different discount thresholds
- Deploy dashboard to Streamlit Cloud (free tier available)
- Share with gaming friends
- Add database backend (PostgreSQL)
- Implement scheduled runs (Cron/Airflow)
- Add price history tracking over time
- Implement email/Discord notifications for specific games
- Add ML-based deal prediction
- Support for console games (Xbox, PlayStation)
- Check the code comments and docstrings
- Review logs in the `logs/` directory for pipeline execution details
- See CheapShark API docs for data details
This project is provided as-is for educational and personal use.
Version: 1.0.0 | Last Updated: November 2024 | Status: Production-Ready
Made by someone who wants to save money on games