
ManyICLBench

📄 Accepted to ACL 2025 (Main Conference)
🔗 Paper | 🏆 Leaderboard | 📊 Dataset


📌 Overview

ManyICLBench is a benchmark designed to evaluate long-context language models (LCLMs) via many-shot in-context learning (ICL). We investigate whether performance improves with additional demonstrations and introduce a new metric, Sample Learning Ratio (SLR), to characterize task types:

  • SSL (Similar-Sample Learning): Tasks where models benefit from retrieving similar demonstrations.
  • ASL (All-Sample Learning): Tasks where models need to understand all demonstrations.
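
As a rough intuition only (an illustrative sketch, not the paper's exact SLR definition), SLR can be pictured as the fraction of the full many-shot gain that is recovered when the model sees only a smaller set of retrieved, similar demonstrations: values near 1 suggest SSL-style tasks, values near 0 suggest ASL-style tasks. All numbers below are hypothetical.

```python
# Illustrative sketch only -- NOT the paper's exact SLR formula.
def sample_learning_ratio(acc_retrieved: float,
                          acc_full: float,
                          acc_baseline: float) -> float:
    """Fraction of the full many-shot gain recovered when the model
    sees only retrieved (similar) demonstrations instead of all of them."""
    gain_full = acc_full - acc_baseline
    if gain_full <= 0:
        return float("nan")  # no many-shot gain to attribute
    return (acc_retrieved - acc_baseline) / gain_full

# Hypothetical accuracies:
print(sample_learning_ratio(0.72, 0.75, 0.50))  # ~0.88 -> SSL-like task
print(sample_learning_ratio(0.55, 0.75, 0.50))  # ~0.20 -> ASL-like task
```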

⚙️ Getting Started

1. Install dependencies

pip install -r requirements.txt

2. Run evaluation on a model

bash start_vllm_serve.sh

Wait until the server is up before launching the evaluation.
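
To check readiness programmatically, here is a minimal sketch that polls vLLM's OpenAI-compatible /v1/models endpoint; the default localhost:8000 address is an assumption, so adjust it to match start_vllm_serve.sh:

```python
# Readiness check for a vLLM OpenAI-compatible server.
# The base URL is an assumption; match it to start_vllm_serve.sh.
import time
import requests

def wait_for_server(base_url: str = "http://localhost:8000/v1",
                    timeout_s: int = 600) -> None:
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            r = requests.get(f"{base_url}/models", timeout=5)
            if r.ok:
                ids = [m["id"] for m in r.json()["data"]]
                print("Server ready; serving:", ids)
                return
        except requests.RequestException:
            pass  # server not up yet; keep polling
        time.sleep(5)
    raise TimeoutError("vLLM server did not become ready in time")

if __name__ == "__main__":
    wait_for_server()
```

Once the server responds, run the evaluation: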

bash evaluate.sh

3. Generate your final results

python create_csv.py
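
For reference, this step reduces the per-task scores into a single table. The sketch below is hypothetical and not the actual create_csv.py logic; it assumes each file in results/ is a JSON object with task, num_shots, and score fields:

```python
# Hypothetical aggregation sketch -- not the actual create_csv.py.
# Assumes results/*.json each hold {"task": ..., "num_shots": ..., "score": ...}.
import csv
import glob
import json

rows = [json.load(open(p)) for p in glob.glob("results/*.json")]
rows.sort(key=lambda r: (r["task"], r["num_shots"]))

with open("final_results.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["task", "num_shots", "score"])
    writer.writeheader()
    writer.writerows(rows)
```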

🚀 Leaderboard

Submit your model results at:
📍 https://huggingface.co/spaces/launch/ManyICLBench_Leaderboard

📑 Citation

If you use our benchmark or results in your work, please cite us:

@article{zou2025manyshotincontextlearninglongcontext,
  title={On Many-Shot In-Context Learning for Long-Context Evaluation}, 
  author={Kaijian Zou and Muhammad Khalifa and Lu Wang},
  journal={arXiv preprint arXiv:2411.07130},
  year={2025}
}

📬 Contact

🙏 Acknowledgements

Part of the codebase is based on RULER.

We thank the reviewers at ICLR 2025 and ACL 2025 for their insightful feedback. We also appreciate the Hugging Face and vLLM communities for their tools and infrastructure, which greatly supported this project.
