Official website: https://rsa-llm.github.io/
Run ARC-AGI tasks against multiple model adapters (OpenAI, Anthropic, Gemini, Fireworks, Grok, OpenRouter, X.AI, custom, etc.) with built-in rate limiting, retries, and scoring. We currently recommend gemini-3-flash-preview, which achieves strong performance at low cost and was used for our evals.
- Clone this repo:
git clone https://github.com/HyperPotatoNeo/RSA-ARC.git
cd RSA-ARC
- Install (installs all adapters + SDKs):
pip install .
- Download the ARC-AGI dataset (pick one; both commands below clone into data/arc-agi):
- ARC-AGI-1 (2019):
git clone https://github.com/fchollet/ARC-AGI.git data/arc-agi
- ARC-AGI-2 (2025):
git clone https://github.com/arcprize/ARC-AGI-2.git data/arc-agi
- RSA run on ARC-AGI-2 with gemini-3-flash-preview, aggregation size K=4, population size N=16, sequential steps T=10:
python cli/rsa_eval.py \
--config "gemini-3-flash-preview-thinking-high" \
--data_dir data/arc-agi/data/evaluation \
--save_submission_dir submissions/arc_rsa \
--k 4 \
--population 16 \
--loops 10
- Score the outputs generated by the final RSA step (loop_9 for a 10-loop run) in the submission directory:
python src/arc_agi_benchmarking/scoring/scoring.py \
--task_dir data/arc-agi/data/evaluation \
--submission_dir submissions/arc_rsa/loop_9 \
--results_dir results/arc_results
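
To make the roles of K, N, and T in the RSA run above concrete, here is a minimal sketch of a population loop with those three knobs. It is not the code in cli/rsa_eval.py: the names `rsa_loop` and `majority_vote` are hypothetical, candidates are plain strings, and the aggregation step is a toy majority vote standing in for a model call. It also assumes the initial population is already available and that per-loop outputs are indexed from zero, which would match the `loop_9` directory used for a 10-loop run.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence


def majority_vote(candidates: Sequence[str]) -> str:
    """Toy aggregator (assumption): return the most common candidate.

    A real run would instead prompt a model with the K candidates.
    """
    return Counter(candidates).most_common(1)[0][0]


def rsa_loop(
    initial: Sequence[str],
    aggregate: Callable[[Sequence[str]], str],
    k: int,
    loops: int,
    seed: int = 0,
) -> List[List[str]]:
    """Return the population produced by each loop, indexed 0 .. loops - 1."""
    if k > len(initial):
        raise ValueError("aggregation size k cannot exceed the population size")
    rng = random.Random(seed)
    population = list(initial)
    history: List[List[str]] = []
    for _ in range(loops):
        # Build N new candidates; each aggregates k members sampled
        # without replacement from the current population.
        population = [
            aggregate(rng.sample(population, k))
            for _ in range(len(population))
        ]
        history.append(population)
    return history


if __name__ == "__main__":
    # N = 16 starting candidates, K = 4, T = 10, as in the command above.
    start = ["A"] * 10 + ["B"] * 6
    history = rsa_loop(start, majority_vote, k=4, loops=10)
    print(len(history), len(history[-1]))
```

In this sketch the population size stays fixed at N across loops, and `history[-1]` corresponds to the final step, which is the one that would be scored.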