Transform coding problem ideas into ready-to-test bundles with a single click
Features • Quick Start • Usage • Contributing • License
A powerful FastAPI web application that scrapes coding problems from top competitive programming platforms, generates test cases with .in/.out files, and packages everything into downloadable bundles. Perfect for educators, competitive programmers, and coding interview preparation.
- Multi-Platform Support: Scrapes from Codeforces, LeetCode, CodeChef, GeeksforGeeks, and AtCoder
- Modern Web UI: Clean, single-page interface with real-time progress tracking
- Live Updates: Watch your scraping job progress with streaming logs
- Smart Filtering: Select specific platforms and difficulty levels
- Problem Curation: Review, accept, or reject problems before downloading
- Instant Downloads: Get filtered ZIP bundles with organized test cases
- Placeholder Mode: Generate synthetic problems instantly for testing
- Job Queue: Background processing with concurrent job support
Note: Screenshots and demo video coming soon! In the meantime, try the app yourself following the setup below.
- Enter: "Give me 5 medium difficulty sorting problems"
- Select platforms: Codeforces, LeetCode
- Watch real-time scraping progress
- Review and curate problems
- Download filtered bundle with organized test cases
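How the app interprets a prompt is not documented here. Purely as an illustration, a prompt like the one in the steps above could be reduced to a count, a difficulty, and a topic with a small parser. The names `parse_prompt`, `ProblemRequest`, and `FILLER` are hypothetical and are not taken from the project.

```python
import re
from dataclasses import dataclass
from typing import Optional

DIFFICULTIES = ("easy", "medium", "hard")
# Words that carry no topic information (an illustrative, incomplete list).
FILLER = {"give", "me", "difficulty", "problem", "problems", *DIFFICULTIES}


@dataclass
class ProblemRequest:
    count: int
    difficulty: Optional[str]
    topic: str


def parse_prompt(prompt: str, default_count: int = 1) -> ProblemRequest:
    """Extract quantity, difficulty, and topic from a free-text prompt."""
    text = prompt.lower()
    # The first integer in the prompt is treated as the requested quantity.
    number = re.search(r"\d+", text)
    count = int(number.group()) if number else default_count
    words = re.findall(r"[a-z]+", text)
    difficulty = next((w for w in words if w in DIFFICULTIES), None)
    # Whatever remains after dropping filler words is taken as the topic.
    topic = " ".join(w for w in words if w not in FILLER)
    return ProblemRequest(count, difficulty, topic)


print(parse_prompt("Give me 5 medium difficulty sorting problems"))
# ProblemRequest(count=5, difficulty='medium', topic='sorting')
```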
# 1. Clone the repository
git clone https://github.com/AVPthegreat/codebase-problem-scrapper.git
cd codebase-problem-scrapper
# 2. Create virtual environment
python3 -m venv .venv
# 3. Activate virtual environment
# On macOS/Linux:
source .venv/bin/activate
# On Windows:
.venv\Scripts\activate
# 4. Install dependencies
pip install -e '.[dev]'
# 5. Launch the app
python scripts/run_web.py
Open your browser and navigate to http://127.0.0.1:8000
Try this example:
- Prompt: "Give me 3 easy array problems"
- Platforms: Select Codeforces
- Difficulty: Easy
- Click: "Generate Problems"
- Enter Your Prompt
  - Describe what problems you want (e.g., "5 dynamic programming problems")
  - Be specific about topics, difficulty, or quantity
- Configure Options
  - Platforms: Choose one or more (Codeforces, LeetCode, etc.)
  - Difficulty: Easy, Medium, Hard, or Mixed
  - Placeholder Mode: Enable for instant synthetic test problems
- Monitor Progress
  - Real-time progress bar and live logs
  - See which platforms are being scraped
  - Track problem discovery in real-time
- Curate Results
  - Review all discovered problems
  - Accept or reject individual problems
  - See problem descriptions and metadata
- Download Bundle
  - Click "Download Selected Problems"
  - Get a ZIP file with organized folders
  - Each problem includes .in and .out test files
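The exact layout of the downloaded bundle is not specified here. As a minimal sketch, assuming one folder per problem with numbered `.in`/`.out` pairs, such a bundle could be assembled with the standard `zipfile` module. The function `build_bundle` and the shape of its input are hypothetical.

```python
import io
import zipfile


def build_bundle(problems: dict[str, list[tuple[str, str]]]) -> bytes:
    """Return ZIP bytes containing <problem>/<n>.in and <problem>/<n>.out."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, cases in problems.items():
            for index, (stdin_text, expected) in enumerate(cases, start=1):
                archive.writestr(f"{name}/{index}.in", stdin_text)
                archive.writestr(f"{name}/{index}.out", expected)
    return buffer.getvalue()


data = build_bundle({"two-sum": [("4\n2 7 11 15\n9\n", "0 1\n")]})
with zipfile.ZipFile(io.BytesIO(data)) as archive:
    print(archive.namelist())  # ['two-sum/1.in', 'two-sum/1.out']
```

Building the archive in memory avoids temporary files, which suits a web handler that streams the result back to the browser.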
Placeholder Mode
- Instant synthetic problems for UI testing
- No network requests or rate limiting
- Perfect for demos and development
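To illustrate what "synthetic problems" without network access might look like, here is a rough sketch that generates trivial sum-of-integers problems locally. The function name, the dictionary keys, and the choice of problem are assumptions made for the example, not the app's actual placeholder generator.

```python
import random


def make_placeholder_problems(count: int, seed: int = 0) -> list[dict]:
    """Generate `count` synthetic problems without touching the network."""
    rng = random.Random(seed)  # seeded so repeated runs give the same data
    problems = []
    for i in range(1, count + 1):
        numbers = [rng.randint(1, 100) for _ in range(5)]
        problems.append(
            {
                "title": f"Placeholder Problem {i}",
                "statement": "Print the sum of the given integers.",
                "input": f"{len(numbers)}\n{' '.join(map(str, numbers))}\n",
                "output": f"{sum(numbers)}\n",
            }
        )
    return problems


for problem in make_placeholder_problems(2):
    print(problem["title"])
```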
Platform Filtering
- Select specific platforms for targeted scraping
- Combine multiple sources in one bundle
- Leave all unchecked to search everywhere
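The "leave all unchecked to search everywhere" rule can be expressed in a few lines. This is a sketch under the assumption that platforms are identified by lowercase names. `resolve_platforms` and `ALL_PLATFORMS` are hypothetical names.

```python
ALL_PLATFORMS = ("codeforces", "leetcode", "codechef", "geeksforgeeks", "atcoder")


def resolve_platforms(selected: list[str]) -> list[str]:
    """Map the user's selection to the platforms that should be scraped."""
    chosen = {name.lower() for name in selected}
    unknown = sorted(chosen - set(ALL_PLATFORMS))
    if unknown:
        raise ValueError(f"Unknown platform(s): {', '.join(unknown)}")
    # An empty selection means "search everywhere".
    if not chosen:
        return list(ALL_PLATFORMS)
    return [name for name in ALL_PLATFORMS if name in chosen]


print(resolve_platforms(["LeetCode", "Codeforces"]))  # ['codeforces', 'leetcode']
print(len(resolve_platforms([])))  # 5
```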
Job Queue
- Multiple jobs can be queued
- Background processing doesn't block UI
- Check /recent for job history
codebase-problem-scrapper/
├── src/
│   ├── app/
│   │   └── services/
│   │       ├── orchestrator.py      # Core bundle generation logic
│   │       └── scrapers/            # Platform-specific scrapers
│   │           ├── codeforces.py
│   │           ├── leetcode.py
│   │           ├── codechef.py
│   │           ├── geeksforgeeks.py
│   │           └── atcoder.py
│   └── webapp/
│       ├── main.py                  # FastAPI application
│       └── templates/               # Jinja2 HTML templates
│           ├── index.html           # Main UI
│           ├── job.html             # Job details
│           └── recent.html          # Job history
├── scripts/
│   └── run_web.py                   # Server launcher
├── tests/
│   └── test_orchestrator.py         # Unit tests
├── .github/
│   ├── ISSUE_TEMPLATE/              # Bug & feature templates
│   └── pull_request_template.md     # PR template
├── pyproject.toml                   # Dependencies & config
├── LICENSE                          # MIT License
├── CODE_OF_CONDUCT.md               # Community guidelines
├── SECURITY.md                      # Security policy
└── README.md                        # You are here!
Currently, the app runs without external API keys. Future enhancements may include:
- OpenAI integration for intelligent test case generation
- OAuth for platform authentication
- Custom scraping rate limits
Edit scripts/run_web.py to customize:
- Port: Change from the default 8000
- Host: Bind to 0.0.0.0 for network access (add auth first!)
- Workers: Adjust concurrent scraping jobs
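The contents of scripts/run_web.py are not shown here. As a sketch of how a launcher could expose the host and port settings, the example below passes command-line options to `uvicorn.run`. The import string `"webapp.main:app"` is an assumption about the package layout and may need adjusting, and `parse_args`/`main` are hypothetical names.

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    """Read host and port from the command line, with local-only defaults."""
    parser = argparse.ArgumentParser(description="Launch the web app")
    parser.add_argument("--host", default="127.0.0.1")
    parser.add_argument("--port", type=int, default=8000)
    return parser.parse_args(argv)


def main(argv=None) -> None:
    args = parse_args(argv)
    # Imported lazily so parse_args stays usable without uvicorn installed.
    import uvicorn

    # "webapp.main:app" is an assumed import string, not confirmed by the project.
    uvicorn.run("webapp.main:app", host=args.host, port=args.port)


print(parse_args(["--port", "8080"]))  # Namespace(host='127.0.0.1', port=8080)
```

Calling `main(["--host", "0.0.0.0"])` would bind to all interfaces, so it should only be done once authentication is in place.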
# Run all tests
pytest tests/
# Run with coverage
pytest tests/ --cov=src --cov-report=html
# Run specific test file
pytest tests/test_orchestrator.py -v
# Check code style
ruff check src/ tests/
- Never commit .env files or virtual environments
- Respect rate limits when scraping live platforms
- Use placeholder mode for demos and testing
- Add authentication before exposing beyond localhost
- Review the Security Policy before reporting vulnerabilities
Port 8000 already in use
# Kill process on port 8000
lsof -ti:8000 | xargs kill -9
# Or run on different port
# Edit scripts/run_web.py and change port number
Missing dependencies error
# Reinstall all dependencies
pip install -e '.[dev]'
# Or install specific package
pip install <package-name>
Virtual environment not activating
# On macOS/Linux
source .venv/bin/activate
# On Windows
.venv\Scripts\activate
# On Windows PowerShell
.venv\Scripts\Activate.ps1
Tests failing
# Ensure you're in virtual environment
source .venv/bin/activate # or .venv\Scripts\activate
# Reinstall dependencies
pip install -e '.[dev]'
# Run tests with verbose output
pytest tests/ -v
Scraping returns no results
- Check your internet connection
- Some platforms may have rate limits or anti-scraping measures
- Try placeholder mode for testing
- Check platform availability (some may be down temporarily)
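When a platform rate-limits requests or is briefly unavailable, retrying with increasing delays is a common mitigation. The sketch below shows exponential backoff around an arbitrary fetch callable. It is not the project's scraper code, and `fetch_with_backoff` is a hypothetical helper.

```python
import time


def fetch_with_backoff(fetch, url, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying on OSError with delays of 1x, 2x, 4x, ..."""
    for attempt in range(retries):
        try:
            return fetch(url)
        except OSError:
            # Network errors such as urllib.error.URLError subclass OSError.
            if attempt == retries - 1:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2**attempt)


# Demo with a fake fetcher that fails twice before succeeding.
attempts = []


def flaky_fetch(url):
    attempts.append(url)
    if len(attempts) < 3:
        raise OSError("temporary failure")
    return "<html>ok</html>"


print(fetch_with_backoff(flaky_fetch, "https://example.com", sleep=lambda s: None))
# <html>ok</html>
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting.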
We love contributions! Whether it's bug fixes, new features, or documentation improvements.
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'feat: add amazing feature')
- Push to your branch (git push origin feature/amazing-feature)
- Open a Pull Request
See CONTRIBUTING.md for detailed guidelines.
- Add more platform scrapers
- Improve scraping accuracy and speed
- Create mobile-responsive UI
- Integrate AI for test case generation
- Add internationalization support
- Add analytics and statistics
This project is licensed under the MIT License - see the LICENSE file for details.
- FastAPI - Modern Python web framework
- Uvicorn - Lightning-fast ASGI server
- Jinja2 - Powerful templating engine
- All the competitive programming platforms for providing great problems
- Author: Anant Vardhan Pandey
- GitHub: @AVPthegreat
- Issues: Report a bug or request a feature
Built with ❤️ by AVPTHEGREAT
Star this repo if you find it helpful!