@Ki-Seki commented Jan 20, 2026

Fixes #31

Copilot AI review requested due to automatic review settings January 20, 2026 07:06

Copilot AI left a comment

Pull request overview

This PR adds an --auto_budget flag that enables automatic determination of reasoning steps needed for each question during MCQA evaluation. When enabled, the system uses a model call with a single example to predict the appropriate reasoning budget for each question, overriding the manual --reason_budget setting.

Changes:

  • Added --auto_budget command-line flag for automatic reasoning budget determination
  • Modified _form_cot_query method in GIMEvaluator to dynamically determine reasoning budget via model inference
  • Added error handling with fallback to budget of 1 when auto-determination fails
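The behavior summarized above can be sketched roughly as follows. This is an illustration only, not the code merged in this PR: `build_parser`, `resolve_budget`, and the `predict` callable are hypothetical names, the default for `--reason_budget` is an assumption, and the single-example prompt sent to the model is left out because the conversation does not show it.

```python
import argparse
import re
from typing import Callable


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser exposing the two flags named in the review."""
    parser = argparse.ArgumentParser()
    # Manual budget; the default of 1 is an assumption for this sketch.
    parser.add_argument("--reason_budget", type=int, default=1)
    # When set, the budget is predicted per question instead.
    parser.add_argument("--auto_budget", action="store_true")
    return parser


def resolve_budget(
    question: str,
    args: argparse.Namespace,
    predict: Callable[[str], str],
    fallback: int = 1,
) -> int:
    """Return the reasoning budget to use for one question.

    `predict` stands in for the model call; it takes the question and
    returns free text that should contain the predicted step count.
    """
    if not args.auto_budget:
        return args.reason_budget
    try:
        reply = predict(question)
        # Take the first run of digits in the reply as the budget.
        match = re.search(r"\d+", reply)
        if match is None:
            raise ValueError("no integer in model reply")
        budget = int(match.group())
        if budget < 1:
            raise ValueError("budget must be at least 1")
        return budget
    except Exception:
        # Any failure (model error, unparsable reply) falls back to 1.
        return fallback
```

With `--auto_budget` set, a reply such as `"Steps: 4"` would yield a budget of 4, while a model error or a reply without a usable number yields the fallback of 1. Without the flag, the value of `--reason_budget` is returned unchanged.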

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File                             Description
src/gimbench/arguments.py        Added --auto_budget flag argument definition
src/gimbench/mcqa/evaluators.py  Implemented auto-budget logic with model-based determination and error handling

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.

Ki-Seki and others added 2 commits January 20, 2026 15:48
@Ki-Seki merged commit dd7f344 into main Jan 20, 2026
3 checks passed
@Ki-Seki deleted the feat/auto-budget branch January 20, 2026 07:52

Development

Successfully merging this pull request may close these issues.

feat: add auto-budget option