
AVPthegreat/codebase-problem-scrapper

🎯 Competitive Programming Test Case Generator

Transform coding problem ideas into ready-to-test bundles with a single click

License: MIT • Python 3.11+ • FastAPI • Code style: ruff

Features • Quick Start • Usage • Contributing • License


🚀 What is This?

A powerful FastAPI web application that scrapes coding problems from top competitive programming platforms, generates test cases with .in/.out files, and packages everything into downloadable bundles. Perfect for educators, competitive programmers, and coding interview preparation.

✨ Features

  • 🌐 Multi-Platform Support: Scrapes from Codeforces, LeetCode, CodeChef, GeeksforGeeks, and AtCoder
  • 🎨 Modern Web UI: Clean, single-page interface with real-time progress tracking
  • 📊 Live Updates: Watch your scraping job progress with streaming logs
  • 🎯 Smart Filtering: Select specific platforms and difficulty levels
  • ✅ Problem Curation: Review, accept, or reject problems before downloading
  • 📦 Instant Downloads: Get filtered ZIP bundles with organized test cases
  • ⚡ Placeholder Mode: Generate synthetic problems instantly for testing
  • 🔄 Job Queue: Background processing with concurrent job support

📸 Demo

Note: Screenshots and demo video coming soon! In the meantime, try the app yourself following the setup below.

💡 Example Workflow

  1. Enter: "Give me 5 medium difficulty sorting problems"
  2. Select platforms: Codeforces, LeetCode
  3. Watch real-time scraping progress
  4. Review and curate problems
  5. Download filtered bundle with organized test cases
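
How the app interprets a prompt is not documented in this README. As a rough illustration only, a prompt like the one in step 1 could be reduced to a count, a difficulty, and a topic with a few regular expressions. The function name `parse_prompt` and the returned keys are hypothetical and are not taken from the project's code.

```python
import re

DIFFICULTIES = ("easy", "medium", "hard")


def parse_prompt(prompt: str) -> dict:
    """Extract count, difficulty and topic from a free-text prompt (hypothetical helper)."""
    text = prompt.lower()

    # The first integer in the prompt is treated as the requested count; default to 1.
    count_match = re.search(r"\d+", text)
    count = int(count_match.group()) if count_match else 1

    # Use the first difficulty (checked in easy/medium/hard order) that appears
    # as a whole word; fall back to "mixed" when none is mentioned.
    difficulty = next(
        (d for d in DIFFICULTIES if re.search(rf"\b{d}\b", text)), "mixed"
    )

    # Topic: up to three words directly before "problem(s)", minus filler words.
    filler = {"give", "me", "difficulty", "some", "of", *DIFFICULTIES}
    topic_match = re.search(r"((?:[a-z]+\s+){1,3})problems?\b", text)
    topic = ""
    if topic_match:
        words = [w for w in topic_match.group(1).split() if w not in filler]
        topic = " ".join(words)

    return {"count": count, "difficulty": difficulty, "topic": topic}
```

A real implementation would need to handle far more phrasing variety; this sketch only covers prompts shaped like the examples in this README.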

🏃 Quick Start

Prerequisites

  • Python 3.11 or newer (with pip and venv)
  • Git

Installation

# 1. Clone the repository
git clone https://github.com/AVPthegreat/codebase-problem-scrapper.git
cd codebase-problem-scrapper

# 2. Create virtual environment
python3 -m venv .venv

# 3. Activate virtual environment
# On macOS/Linux:
source .venv/bin/activate
# On Windows:
.venv\Scripts\activate

# 4. Install dependencies
pip install -e '.[dev]'

# 5. Launch the app
python scripts/run_web.py

First Run

Open your browser and navigate to http://127.0.0.1:8000

Try this example:

  • Prompt: "Give me 3 easy array problems"
  • Platforms: Select Codeforces
  • Difficulty: Easy
  • Click: "Generate Problems"

📖 Usage

Basic Workflow

  1. Enter Your Prompt

    • Describe what problems you want (e.g., "5 dynamic programming problems")
    • Be specific about topics, difficulty, or quantity
  2. Configure Options

    • Platforms: Choose one or more (Codeforces, LeetCode, etc.)
    • Difficulty: Easy, Medium, Hard, or Mixed
    • Placeholder Mode: Enable for instant synthetic test problems
  3. Monitor Progress

    • Real-time progress bar and live logs
    • See which platforms are being scraped
    • Track problem discovery in real-time
  4. Curate Results

    • Review all discovered problems
    • Accept ✅ or reject ❌ individual problems
    • See problem descriptions and metadata
  5. Download Bundle

    • Click "Download Selected Problems"
    • Get a ZIP file with organized folders
    • Each problem includes .in and .out test files
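
The curation and download steps amount to filtering the accepted problems and writing each one's test files into a ZIP. The sketch below shows one way to do that with the standard library. The `Problem` dataclass, the `build_bundle` function, and the folder layout (`<slug>/tests/1.in`) are illustrative assumptions, not the project's actual bundle format.

```python
import io
import zipfile
from dataclasses import dataclass, field


@dataclass
class Problem:
    """Hypothetical problem record; the field names are assumptions."""
    slug: str
    statement: str
    tests: list[tuple[str, str]] = field(default_factory=list)  # (input, expected output)
    accepted: bool = True


def build_bundle(problems: list[Problem]) -> bytes:
    """Return the bytes of a ZIP archive containing only accepted problems."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for problem in problems:
            if not problem.accepted:
                continue  # rejected during curation
            archive.writestr(f"{problem.slug}/statement.txt", problem.statement)
            # One numbered .in/.out pair per test case.
            for index, (test_in, test_out) in enumerate(problem.tests, start=1):
                archive.writestr(f"{problem.slug}/tests/{index}.in", test_in)
                archive.writestr(f"{problem.slug}/tests/{index}.out", test_out)
    return buffer.getvalue()
```

Building the archive in memory keeps the example free of temporary files; a web handler could return these bytes directly as the download.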

Advanced Features

Placeholder Mode 🎭

  • Instant synthetic problems for UI testing
  • No network requests or rate limiting
  • Perfect for demos and development
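
To make the idea concrete, here is a minimal sketch of what a synthetic problem generator could look like. It assumes a trivial "sum the array" task, and the names `PlaceholderProblem` and `make_placeholder_problems` are hypothetical; the app's real placeholder mode may produce something quite different.

```python
import random
from dataclasses import dataclass


@dataclass
class PlaceholderProblem:
    """Hypothetical synthetic problem with a single test case."""
    title: str
    difficulty: str
    test_input: str
    test_output: str


def make_placeholder_problems(
    count: int, difficulty: str = "easy", seed: int = 0
) -> list[PlaceholderProblem]:
    """Generate 'sum the array' problems without any network access."""
    rng = random.Random(seed)  # seeded so repeated runs yield the same problems
    problems = []
    for number in range(1, count + 1):
        values = [rng.randint(1, 100) for _ in range(5)]
        problems.append(
            PlaceholderProblem(
                title=f"Placeholder Problem {number}",
                difficulty=difficulty,
                # Input: the array length, then the values on one line.
                test_input=f"{len(values)}\n{' '.join(map(str, values))}\n",
                test_output=f"{sum(values)}\n",
            )
        )
    return problems
```

Seeding the generator keeps demos reproducible, which is useful when the output feeds UI tests.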

Platform Filtering 🔍

  • Select specific platforms for targeted scraping
  • Combine multiple sources in one bundle
  • Leave all unchecked to search everywhere
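
The "leave all unchecked to search everywhere" rule can be sketched as a small registry lookup. Everything below is an assumption for illustration: the `SCRAPERS` registry, the stand-in scraper functions, and `select_scrapers` are hypothetical, and the real scrapers in `src/app/services/scrapers/` may expose a different interface.

```python
from typing import Callable

PLATFORMS = ("codeforces", "leetcode", "codechef", "geeksforgeeks", "atcoder")


def _fake_scraper(platform: str) -> Callable[[str], list[str]]:
    """Build a stand-in scraper; a real one would fetch from the platform."""
    def scrape(topic: str) -> list[str]:
        return [f"{platform}: {topic} problem"]
    return scrape


# Hypothetical registry mapping platform names to scraper callables.
SCRAPERS = {name: _fake_scraper(name) for name in PLATFORMS}


def select_scrapers(selected: list[str]) -> dict[str, Callable[[str], list[str]]]:
    """Return the chosen scrapers; an empty selection means every platform."""
    if not selected:
        return dict(SCRAPERS)
    unknown = set(selected) - set(SCRAPERS)
    if unknown:
        raise ValueError(f"Unknown platforms: {sorted(unknown)}")
    return {name: SCRAPERS[name] for name in selected}


def scrape_all(topic: str, selected: list[str]) -> list[str]:
    """Combine results from every selected platform into one list."""
    results: list[str] = []
    for scrape in select_scrapers(selected).values():
        results.extend(scrape(topic))
    return results
```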

Job Queue 📋

  • Multiple jobs can be queued
  • Background processing doesn't block UI
  • Check /recent for job history

🏗️ Project Structure

codebase-problem-scrapper/
├── src/
│   ├── app/
│   │   └── services/
│   │       ├── orchestrator.py      # Core bundle generation logic
│   │       └── scrapers/            # Platform-specific scrapers
│   │           ├── codeforces.py
│   │           ├── leetcode.py
│   │           ├── codechef.py
│   │           ├── geeksforgeeks.py
│   │           └── atcoder.py
│   └── webapp/
│       ├── main.py                   # FastAPI application
│       └── templates/                # Jinja2 HTML templates
│           ├── index.html            # Main UI
│           ├── job.html              # Job details
│           └── recent.html           # Job history
├── scripts/
│   └── run_web.py                    # Server launcher
├── tests/
│   └── test_orchestrator.py         # Unit tests
├── .github/
│   ├── ISSUE_TEMPLATE/               # Bug & feature templates
│   └── pull_request_template.md     # PR template
├── pyproject.toml                    # Dependencies & config
├── LICENSE                           # MIT License
├── CODE_OF_CONDUCT.md               # Community guidelines
├── SECURITY.md                       # Security policy
└── README.md                         # You are here!

⚙️ Configuration

Environment Variables

Currently, the app runs without external API keys. Future enhancements may include:

  • OpenAI integration for intelligent test case generation
  • OAuth for platform authentication
  • Custom scraping rate limits

Custom Settings

Edit scripts/run_web.py to customize:

  • Port: Change from default 8000
  • Host: Bind to 0.0.0.0 for network access (add auth first!)
  • Workers: Adjust concurrent scraping jobs

🧪 Testing

# Run all tests
pytest tests/

# Run with coverage
pytest tests/ --cov=src --cov-report=html

# Run specific test file
pytest tests/test_orchestrator.py -v

# Check code style
ruff check src/ tests/

🛡️ Security & Best Practices

  • ✅ Never commit .env files or virtual environments
  • ✅ Respect rate limits when scraping live platforms
  • ✅ Use placeholder mode for demos and testing
  • ✅ Add authentication before exposing beyond localhost
  • ✅ Review the Security Policy before reporting vulnerabilities

🐛 Troubleshooting

Port 8000 already in use

# Kill the process listening on port 8000 (macOS/Linux)
lsof -ti:8000 | xargs kill -9

# Or run on a different port
# Edit scripts/run_web.py and change the port number

Missing dependencies error

# Reinstall all dependencies
pip install -e '.[dev]'

# Or install a specific package
pip install <package-name>

Virtual environment not activating

# On macOS/Linux
source .venv/bin/activate

# On Windows (Command Prompt)
.venv\Scripts\activate

# On Windows PowerShell
.venv\Scripts\Activate.ps1

Tests failing

# Ensure you're in the virtual environment
source .venv/bin/activate  # or .venv\Scripts\activate

# Reinstall dependencies
pip install -e '.[dev]'

# Run tests with verbose output
pytest tests/ -v

Scraping returns no results

  • Check your internet connection
  • Some platforms may have rate limits or anti-scraping measures
  • Try placeholder mode for testing
  • Check platform availability (some may be down temporarily)
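
For the "Port 8000 already in use" case, a cross-platform alternative to `lsof` is to test the port from Python before launching. This is a small standard-library sketch; the helper names `port_is_free` and `first_free_port` are hypothetical and not part of the project.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if the port can be bound on the given host."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False  # typically "address already in use"
    return True


def first_free_port(start: int = 8000, attempts: int = 20) -> int:
    """Return the first bindable port in [start, start + attempts)."""
    for port in range(start, start + attempts):
        if port_is_free(port):
            return port
    raise RuntimeError(f"No free port found in {start}-{start + attempts - 1}")
```

The check is advisory: another process could still claim the port between the check and the server start, so the launcher should be prepared for the bind to fail anyway.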

🤝 Contributing

We love contributions! Whether it's bug fixes, new features, or documentation improvements.

How to Contribute

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'feat: add amazing feature')
  4. Push to your branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

See CONTRIBUTING.md for detailed guidelines.

Contribution Ideas

  • 🎨 Add more platform scrapers
  • 🚀 Improve scraping accuracy and speed
  • 📱 Create mobile-responsive UI
  • 🤖 Integrate AI for test case generation
  • 🌐 Add internationalization support
  • 📊 Add analytics and statistics

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.


🙏 Acknowledgments

  • FastAPI - Modern Python web framework
  • Uvicorn - Lightning-fast ASGI server
  • Jinja2 - Powerful templating engine
  • All the competitive programming platforms for providing great problems

📬 Contact & Support


Built with ❤️ by AVPTHEGREAT

⭐ Star this repo if you find it helpful!

Report Bug • Request Feature

About

A mini project to help my startup CODEBASE
