This repository contains experiments comparing different approaches for few-shot text classification:
- Direct training on seed examples (SetFit, Encoder baselines)
- Static LLM prompting to generate synthetic training data
- Agentic LLM generation with iterative refinement
```bash
# Install dependencies with uv
uv sync

# Or with pip
pip install -e .
```

Train the SetFit or Encoder baseline directly on seed examples, without any data generation.
```bash
# Single experiment
python src/orch_setfit.py --dataset-name CR --examples-per-label 8 --baseline setfit

# Parallel experiments with multiple seeds
python src/orch_setfit.py --dataset-name CR --baseline setfit --parallel \
    --seeds 10 20 30 --examples-counts 2 4 8 --max-workers 3
```

Programmatic usage:
```python
from src.orch_setfit import run_parallel_experiments

run_parallel_experiments(
    seeds=[10, 20, 30],
    examples_counts=[2, 4, 8],
    dataset_name="CR",
    baseline="setfit",  # or "encoder"
    max_workers=3,
    output_dir="results",
)
```

Generate synthetic examples with simple LLM prompting, then train the SetFit baseline.
```python
from src.simple_example import run_parallel_experiments

run_parallel_experiments(
    seeds=[10, 20, 30],
    examples_counts=[0, 2, 4, 8],  # 0 = zero-shot
    dataset_name="sst5",
    llm_model="openrouter/openai/gpt-4o-mini",
    max_workers=3,
    output_dir="results",
)
```

Use the agentic LLM approach with iterative refinement via SetFitTool feedback.
```python
from src.agentic_simple_example import run_parallel_agentic_experiments

run_parallel_agentic_experiments(
    seeds=[10, 20, 30],
    examples_counts=[0, 2, 4, 8],
    dataset_name="sst5",
    baseline_type="setfit",  # or "encoder"
    llm_model="openrouter/openai/gpt-5",
    max_workers=2,
    output_dir="results",
)
```

| Script | Approach | Data Generation | Description |
|---|---|---|---|
| `orch_setfit.py` | Direct training | None | Baseline: train on seed examples only |
| `simple_example.py` | Static prompting | ExampleGenerator | LLM generates synthetic data |
| `agentic_simple_example.py` | Agentic | ToolCallingAgent | Agent iteratively refines data |
**SetFit**: Contrastive learning with sentence transformers. Fast and efficient for few-shot classification.
**Encoder**: Standard transformer fine-tuning with `AutoModelForSequenceClassification`. Default model: EuroBERT-210m.
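SetFit's contrastive stage trains on sentence pairs derived from the seed examples. Below is a minimal, illustrative sketch of that pairing idea — same-label pairs as positives, cross-label pairs as negatives. This is an assumption about the strategy, not the library's actual implementation.

```python
from itertools import combinations

def build_contrastive_pairs(examples):
    """Build (text_a, text_b, label) pairs from (text, class) seed examples.

    Sketch of SetFit-style pair construction (not the library's code):
    pairs with matching classes get label 1, mismatched classes get 0.
    """
    pairs = []
    for (text_a, y_a), (text_b, y_b) in combinations(examples, 2):
        pairs.append((text_a, text_b, 1 if y_a == y_b else 0))
    return pairs

# Four seed examples yield C(4, 2) = 6 training pairs.
seeds = [
    ("great movie", "pos"),
    ("loved it", "pos"),
    ("terrible plot", "neg"),
    ("boring film", "neg"),
]
pairs = build_contrastive_pairs(seeds)
```

This quadratic pairing is why SetFit gets a lot of training signal out of very few labeled examples.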
Common arguments for orch_setfit.py:
- `--dataset-name`: Dataset name (CR, sst5, emotion, ag_news, etc.)
- `--examples-per-label`: Number of seed examples per label
- `--seed`: Random seed for reproducibility
- `--baseline`: Baseline type (`"setfit"` or `"encoder"`)
- `--model-name`: Override the default model
Parallel experiment arguments (orch_setfit.py --parallel):
- `--seeds`: List of seeds (e.g., `10 20 30`)
- `--examples-counts`: List of example counts (e.g., `2 4 8`)
- `--max-workers`: Number of parallel workers
- `--output-dir`: Output directory for results
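The flags above map onto a standard `argparse` setup. A hedged sketch of how `orch_setfit.py` might declare them — flag names match the lists above, but the defaults and types here are assumptions, not the script's actual code:

```python
import argparse

def build_parser():
    # Sketch of the CLI surface described above; the real script's
    # defaults and help strings may differ.
    p = argparse.ArgumentParser(description="Few-shot baseline experiments")
    p.add_argument("--dataset-name", required=True)
    p.add_argument("--examples-per-label", type=int, default=8)
    p.add_argument("--seed", type=int, default=42)
    p.add_argument("--baseline", choices=["setfit", "encoder"], default="setfit")
    p.add_argument("--model-name", default=None)
    p.add_argument("--parallel", action="store_true")
    p.add_argument("--seeds", type=int, nargs="+", default=[42])
    p.add_argument("--examples-counts", type=int, nargs="+", default=[8])
    p.add_argument("--max-workers", type=int, default=3)
    p.add_argument("--output-dir", default="results")
    return p

# Parse the same invocation shown in the Quick Start above.
args = build_parser().parse_args(
    "--dataset-name CR --baseline setfit --parallel "
    "--seeds 10 20 30 --examples-counts 2 4 8 --max-workers 3".split()
)
```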
```python
# 1. Direct SetFit baseline
from src.orch_setfit import run_parallel_experiments

run_parallel_experiments(
    seeds=[10, 20, 30],
    examples_counts=[2, 4, 8],
    dataset_name="sst5",
    baseline="setfit",
    max_workers=3,
    output_dir="results",
)
```
```python
# 2. Static prompting + SetFit
from src.simple_example import run_parallel_experiments

run_parallel_experiments(
    seeds=[10, 20, 30],
    examples_counts=[0, 2, 4, 8],
    dataset_name="sst5",
    max_workers=3,
    output_dir="results",
)
```
```python
# 3. Agentic + SetFit
from src.agentic_simple_example import run_parallel_agentic_experiments

run_parallel_agentic_experiments(
    seeds=[10, 20, 30],
    examples_counts=[0, 2, 4, 8],
    dataset_name="sst5",
    baseline_type="setfit",
    max_workers=2,
    output_dir="results",
)
```

Results are saved to the `results/` directory as JSON files with accuracy statistics.
```
src/
├── baselines/
│   ├── setfit.py                      # SetFit baseline
│   └── encoder.py                     # Encoder baseline (EuroBERT, ModernBERT)
├── dataset_folder/
│   └── huggingface_dataset_reader.py  # Dataset loading
├── agentic/
│   └── setfit_wrapper_tool.py         # SetFitTool for agents
├── orch_setfit.py                     # Direct training (parallel experiments)
├── simple_example.py                  # Static LLM prompting (parallel experiments)
├── agentic_simple_example.py          # Agentic generation (parallel experiments)
└── llm_example_generator.py           # LLM-based data generator
```
Set these in a `.env` file:

```
OPENROUTER_API_KEY=your_key_here
```