This is a tool for generating Python programming exercise solutions and distilling the writing style of Python exercises using large language models (LLMs). It supports few-shot learning from example exercises and can fine-tune models for improved performance. This is primarily an exploratory project to investigate LLM capabilities and the limitations of fine-tuning.
Attempts at getting LLMs to generate high-quality programming exercise solutions should start with good prompt engineering and few-shot learning from example exercises. To do that, we can first attempt to have an LLM distill the writing style from a large set of example exercises (using `prompts/distillation/default.md` by default):
```bash
uv run python-exercises-generator distill --examples "example1,example2,example3,example4,example5,example6"
```
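Assuming the distilled style is printed to stdout (the `--pretty` flag described below suggests results are printed), you can capture it for later reference:

```bash
# Capture the distilled style for later reference (output path is arbitrary)
uv run python-exercises-generator distill > distilled_style.md
```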
That distilled style can then be used as part of a prompt such as `prompts/generation/default.md` to generate new solutions for other exercises (as part of generation we also include few-shot examples):
```bash
uv run python-exercises-generator generate --model google/gemma-3-27b-it:free --exercise new_exercise --examples "example1,example2"
```

We can also generate solutions for a batch of exercises in one go using `batch-generate`. Here is an example generating solutions for the default set of exercises and examples defined in `.env` for some common models:
```bash
uv run python-exercises-generator batch-generate --model google/gemma-3-27b-it:free
uv run python-exercises-generator batch-generate --model qwen/qwen3-coder-30b-a3b-instruct
uv run python-exercises-generator batch-generate --model openai/gpt-4o-mini-2024-07-18
uv run python-exercises-generator batch-generate --model google/gemini-3-flash-preview
uv run python-exercises-generator batch-generate --model anthropic/claude-sonnet-4.5
```

We can also attempt to fine-tune a model on a set of example exercises to improve generation quality further (NVIDIA GPU with CUDA v12.8+ required) with the defined presets in `finetuning.py`:
```bash
uv run python-exercises-generator fine-tune qwen3-coder-30b-a3b-instruct
uv run python-exercises-generator fine-tune gemma-3-27b-it
```

These will save LoRA adapters to `output/finetuned_models/<model>-finetuned-python-exercises`. We can then use these fine-tuned models for generation or batch generation via the `--finetuned-model` option, which runs direct inference with unsloth on the fine-tuned model and its LoRA adapter:
```bash
uv run python-exercises-generator batch-generate --finetuned-model qwen3-coder-30b-a3b-instruct
uv run python-exercises-generator batch-generate --finetuned-model gemma-3-27b-it
```

It is also possible to serve the fine-tuned models via vLLM if you fine-tune with `--save-merged` to save a merged full model. See the vLLM documentation for details on serving models. Once the model is available at an HTTP endpoint, you can use the `--base-url` and `--api-key` options to run generation/batch generation against the served model.
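As a minimal sketch, assuming vLLM's OpenAI-compatible server and the default merged-output path (the port and dummy API key are illustrative):

```bash
# Serve the merged model with vLLM's OpenAI-compatible server
vllm serve output/finetuned_models/gemma-3-27b-it-finetuned-python-exercises-merged --port 8000

# Point batch generation at the local endpoint
# (vLLM uses the model path as the served model name by default)
uv run python-exercises-generator batch-generate \
  --model output/finetuned_models/gemma-3-27b-it-finetuned-python-exercises-merged \
  --base-url http://localhost:8000/v1 --api-key dummy
```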
Finally, we can attempt to fine-tune OpenAI models using their fine-tuning API:
```bash
uv run python-exercises-generator openai-finetune --model gpt-4o-mini-2024-07-18 --wait
```

Once the fine-tuning job is complete, we can use the fine-tuned model ID (e.g. `ft:...`) with the `generate` or `batch-generate` subcommands by passing the model ID via the `--model` option along with the OpenAI base URL and API key:
```bash
uv run python-exercises-generator batch-generate --model ft:your-model-id --base-url https://api.openai.com/v1 --api-key $OPENAI_API_KEY
```

Requires Python 3.12+.
Install the base dependencies:
```bash
uv sync
```

If fine-tuning (NVIDIA GPU with CUDA v12.8+ required), install the additional fine-tuning dependencies:
```bash
uv sync --extra finetune
```

Then set up a default set of examples/exercises to use when generating/distilling by editing the `.env` file:
```bash
cp .env.sample .env
```

Finally, place any sample exercises you want to use in the `data/exercises` directory. Each exercise should be in its own subdirectory with:
- `problem.md` (optional): problem statement used for generation.
- `solution.md` (optional): solution used for distillation or fine-tuning.
Exercise IDs used by CLI flags and `.env` defaults correspond to the subdirectory names under `data/exercises`.
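For example, a layout with two exercises (the exercise names are illustrative):

```text
data/exercises/
├── countdown/
│   ├── problem.md
│   └── solution.md
└── flatten/
    ├── problem.md
    └── solution.md
```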
The CLI reads defaults from `.env` and environment variables:
- `DEFAULT_GENERATION_EXAMPLES`: comma-separated example exercise IDs used for few-shot generation.
- `DEFAULT_GENERATION_EXERCISES`: comma-separated exercise IDs used by `batch-generate`.
- `DEFAULT_DISTILLATION_EXERCISES`: comma-separated exercise IDs used for style distillation.
- Prompt templates live in `prompts/generation`, `prompts/distillation`, and `prompts/finetune` (pass the filename stem via `--prompt`).
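Putting those defaults together, a `.env` might look like the following (the exercise IDs are illustrative):

```bash
DEFAULT_GENERATION_EXAMPLES=ages,compact,easydict
DEFAULT_GENERATION_EXERCISES=countdown,flatten
DEFAULT_DISTILLATION_EXERCISES=ages,compact,easydict,flatten
```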
Set your API keys for LLM integration (defaults to OpenRouter via `OPENROUTER_API_KEY`; `LLM_API_KEY` is also supported):
```bash
export OPENROUTER_API_KEY="your-key-here"
```

Alternatively, use a custom LLM endpoint:
```bash
export LLM_BASE_URL="https://your-api-endpoint.com"
export LLM_API_KEY="your-key-here"
```

If neither `LLM_API_KEY` nor `OPENROUTER_API_KEY` is set, `OPENAI_API_KEY` is used as a fallback for non-`ft:` models.
For OpenAI fine-tuning and fine-tuned inference (also used when you pass an `ft:` model ID):
```bash
export OPENAI_API_KEY="your-openai-key"
export OPENAI_BASE_URL="https://api.openai.com/v1"
```

Generate solutions for programming exercise problems using few-shot learning from example exercises.
Generate a solution from a problem statement defined in an exercise:
```bash
uv run python-exercises-generator generate --exercise countdown --pretty
```

Generate a solution from a problem statement via stdin:
echo "Write a function that counts down from n to 0" | uv run python-exercises-generator generateGenerate with custom examples:
```bash
uv run python-exercises-generator generate --exercise flatten --examples "ages,compact,easydict"
```

- `--pretty`: Print the result with markdown formatting
- `--examples`: Comma-separated list of example exercise names (default: uses `DEFAULT_GENERATION_EXAMPLES` from `.env`)
- `--prompt`: Name of the prompt template to use (default: "default")
- `--exercise`: Name of the exercise to use as the problem statement (alternative to stdin)
- `--save`: Save the generated solution to `output/generations/<exercise>/<model>[_<prompt>].md` (the model name is sanitized)
- `--model`: Model to use for generation (default: "meta-llama/llama-3.3-70b-instruct:free")
- `--finetuned-model`: Preset name for a fine-tuned model to use (overrides `--model` if provided)
- `--base-url`: Override the LLM base URL (otherwise uses `LLM_BASE_URL` or the OpenRouter default)
- `--api-key`: Override the LLM API key (otherwise uses `LLM_API_KEY`, `OPENROUTER_API_KEY`, or `OPENAI_API_KEY`)
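For example, combining several of the options above to save a pretty-printed solution generated with a non-default model:

```bash
uv run python-exercises-generator generate --exercise countdown \
  --model google/gemma-3-27b-it:free --pretty --save
```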
Distill and analyze the writing style from a collection of example exercise solutions.
Distill style from default examples:
```bash
uv run python-exercises-generator distill
```

- `--pretty`: Print the result with markdown formatting
- `--examples`: Comma-separated list of example exercise names (default: uses `DEFAULT_DISTILLATION_EXERCISES` from `.env`)
- `--prompt`: Name of the prompt template to use (default: "default")
- `--model`: Model to use for distillation (default: "meta-llama/llama-3.3-70b-instruct:free")
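For example, distilling from a custom set of examples with a non-default model (example names are illustrative):

```bash
uv run python-exercises-generator distill --examples "ages,compact,easydict" \
  --model qwen/qwen3-coder-30b-a3b-instruct --pretty
```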
Generate solutions for a batch of exercises in one go (uses threading to parallelize unless using a fine-tuned model). Saves output to `output/generations/<exercise>/<model>[_<prompt>].md`:
```bash
uv run python-exercises-generator batch-generate --model google/gemma-3-27b-it:free
uv run python-exercises-generator batch-generate --model qwen/qwen3-coder-30b-a3b-instruct
uv run python-exercises-generator batch-generate --model openai/gpt-4o-mini-2024-07-18
```

- `--exercises`: Comma-separated list of exercises to generate for (uses `problem.md` in each). Defaults to `DEFAULT_GENERATION_EXERCISES` from `.env`
- `--examples`: Comma-separated list of example exercise names (default: uses `DEFAULT_GENERATION_EXAMPLES` from `.env`)
- `--prompt`: Name of the prompt template to use (default: "default")
- `--model`: Model to use for generation (default: "meta-llama/llama-3.3-70b-instruct:free")
- `--finetuned-model`: Preset name for a fine-tuned model to use (overrides `--model` if provided)
- `--base-url`: Override the LLM base URL
- `--api-key`: Override the LLM API key
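For example, overriding the default exercise and example sets (exercise names are illustrative):

```bash
uv run python-exercises-generator batch-generate --exercises "countdown,flatten" \
  --examples "ages,compact,easydict"
```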
Fine-tune a model on a set of example exercises (NVIDIA GPU with CUDA v12.8+ required) based on a preset name (see `src/python_exercises_generator/finetune/trainer.py` for details). Current presets include:
- `qwen3-coder-30b-a3b-instruct`
- `gemma-3-27b-it`
```bash
uv run python-exercises-generator fine-tune gemma-3-27b-it
```

- `--prompt`: Name of the prompt template to use (default: "default")
- `--save-merged`: Normally only a LoRA adapter is saved; with this flag, a merged full model is also saved (may be very large).
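For example, fine-tuning a preset and also saving a merged full model for later serving:

```bash
uv run python-exercises-generator fine-tune gemma-3-27b-it --save-merged
```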
Given a model that has already been fine-tuned, run inference on a problem statement:
```bash
uv run python-exercises-generator fine-tune-inference gemma-3-27b-it --message "Write a python program to play sudoku."
```

To run inference via the `generate` or `batch-generate` subcommands, use the `--finetuned-model` option with the preset name (see above).
Merge a LoRA adapter into the base model and save the merged weights:
```bash
uv run python-exercises-generator finetune-save-merged gemma-3-27b-it
```

- `--output-dir`: Directory to save the merged model (default: `output/finetuned_models/<model>-finetuned-python-exercises-merged`)
- `--save-method`: One of `merged_16bit`, `merged_4bit`, or `lora` (default: `merged_16bit`)
- `--push-to-hub`: Push the merged model to Hugging Face Hub after saving
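For example, saving a smaller 4-bit merge to a custom directory (the output path is illustrative):

```bash
uv run python-exercises-generator finetune-save-merged gemma-3-27b-it \
  --save-method merged_4bit --output-dir output/finetuned_models/gemma-3-27b-it-merged-4bit
```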
Once you have access to a remote GPU host with CUDA v12.8+ and have set up SSH access, rsync or SCP the project directory to the remote host, ensure `uv` is installed, then run the Installation steps above. After that, you can run the fine-tuning commands above via SSH. Due to the potentially long runtime of fine-tuning jobs, it is recommended to use tmux or screen to keep the session alive.
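A typical workflow might look like the following (the user, hostname, and remote path are placeholders):

```bash
# Copy the project to the remote host (excluding the local virtualenv)
rsync -avz --exclude .venv ./ user@gpu-host:~/python-exercises-generator/

# SSH in and run the fine-tune inside tmux so it survives disconnects
ssh user@gpu-host
tmux new -s finetune
cd ~/python-exercises-generator
uv sync --extra finetune
uv run python-exercises-generator fine-tune gemma-3-27b-it
```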
Modal.com can be used to run fine-tuning jobs on their GPU instances. First, ensure you have a Modal account and have set up the Modal CLI. Then, you can run the fine-tuning job using the provided `src/python_exercises_generator/integrations/modal_app.py` script:
```bash
uv run modal run src/python_exercises_generator/integrations/modal_app.py::app.finetune --model gemma-3-27b-it
```

You can then run a batch generation job on Modal as well to generate for all default exercises:
```bash
uv run modal run src/python_exercises_generator/integrations/modal_app.py::app.batch_generate --model gemma-3-27b-it
```

Prepare JSONL data, upload to OpenAI, and start a fine-tuning job:
```bash
uv run python-exercises-generator openai-finetune --model gpt-4o-mini-2024-07-18 --prompt default --wait
```

Export only (no upload/job):
```bash
uv run python-exercises-generator openai-finetune --model gpt-4o-mini-2024-07-18 --export-only
```

- `--prompt`: Prompt template name (default: "default")
- `--model`: Base OpenAI model to fine-tune (required)
- `--validation-split`: Fraction of examples for validation (default: 0.1)
- `--seed`: Random seed for the train/validation split (default: 3407)
- `--suffix`: Optional suffix for the fine-tuned model name
- `--output-dir`: Directory for JSONL files (default: `output/openai_finetune/<prompt>`)
- `--export-only`: Only export JSONL files without upload/job creation
- `--wait`: Wait for the fine-tuning job to complete
- `--base-url`: Override the OpenAI base URL
- `--api-key`: Override the OpenAI API key
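Each exported line follows OpenAI's chat fine-tuning format; the exact message contents depend on the prompt template, but a record looks roughly like this (contents abridged):

```json
{"messages": [{"role": "system", "content": "..."}, {"role": "user", "content": "<problem statement>"}, {"role": "assistant", "content": "<solution>"}]}
```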
Check job status or wait on a job:

```bash
uv run python-exercises-generator openai-finetune-status --job-id ftjob_... --watch
```
- `--job-id`: Fine-tuning job ID (required)
- `--watch`: Poll the job until it completes
- `--interval`: Polling interval in seconds (default: 30)
- `--timeout`: Optional timeout in seconds
- `--base-url`: Override the OpenAI base URL
- `--api-key`: Override the OpenAI API key
Once you have a fine-tuned model ID (e.g. `ft:...`), run inference using `generate` or `batch-generate` with the `--model` and `--base-url` options:
```bash
uv run python-exercises-generator generate --model ft:your-model-id --base-url https://api.openai.com/v1 --api-key $OPENAI_API_KEY
uv run python-exercises-generator batch-generate --model ft:your-model-id --base-url https://api.openai.com/v1 --api-key $OPENAI_API_KEY
```