@juanmichelini

Summary

This PR renames the --max-attempts parameter to --n-critic-runs across the benchmarks codebase to better reflect its purpose: controlling the number of critic evaluation runs in iterative mode.

Changes

  • CLI argument: --max-attempts → --n-critic-runs
  • Model field: max_attempts → n_critic_runs (EvalMetadata)
  • Updated files: 17 files total
    • 7 run_infer.py files (all benchmarks)
    • Core logic files (evaluation.py, iterative.py)
    • 4 test files
    • 2 README documentation files
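
For illustration, the renamed flag and field could look roughly like the sketch below. This is not the code from this PR: the `EvalMetadata` dataclass is a hypothetical stand-in for the real model, and `build_parser`, `parse_metadata`, and the default of `1` are assumptions made for the example.

```python
import argparse
from dataclasses import dataclass
from typing import List


@dataclass
class EvalMetadata:
    """Hypothetical stand-in for the real EvalMetadata model."""

    # Formerly `max_attempts`. The default of 1 is an assumption.
    n_critic_runs: int = 1


def build_parser() -> argparse.ArgumentParser:
    # allow_abbrev=False so that only the exact flag name is accepted.
    parser = argparse.ArgumentParser(allow_abbrev=False)
    parser.add_argument(
        "--n-critic-runs",
        type=int,
        default=1,
        help="Number of critic evaluation runs in iterative mode",
    )
    return parser


def parse_metadata(argv: List[str]) -> EvalMetadata:
    # argparse maps --n-critic-runs to the attribute n_critic_runs.
    args = build_parser().parse_args(argv)
    return EvalMetadata(n_critic_runs=args.n_critic_runs)
```

In this sketch the old `--max-attempts` flag is simply unknown to the parser, so argparse exits with an error, which matches the breaking nature of the rename.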

Breaking Changes

This is a breaking change for users. Existing scripts and workflows must be updated.

Migration Required

  • CLI usage: --max-attempts 3 → --n-critic-runs 3
  • Python API: EvalMetadata(max_attempts=3) → EvalMetadata(n_critic_runs=3)
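
If many scripts need updating, the two rewrites above can be applied mechanically. The helper below is a minimal sketch and not part of this PR; `migrate_text` is a hypothetical name, and the patterns are deliberately blunt, so review the resulting diff before committing.

```python
import re

# Match the old CLI flag only when it is not part of a longer flag name.
_OLD_FLAG = re.compile(r"--max-attempts(?![\w-])")
# Match `max_attempts=` as a keyword argument or assignment, not `==`.
_OLD_KWARG = re.compile(r"\bmax_attempts(\s*)=(?!=)")


def migrate_text(text: str) -> str:
    """Rewrite old flag and keyword usages to the new names."""
    text = _OLD_FLAG.sub("--n-critic-runs", text)
    return _OLD_KWARG.sub(r"n_critic_runs\1=", text)


if __name__ == "__main__":
    print(migrate_text("python run_infer.py --max-attempts 3"))
    print(migrate_text("EvalMetadata(max_attempts=3)"))
```

The function works on plain strings, so it can be applied to shell scripts, workflow files, or Python sources after reading them from disk.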

Related PRs

This PR is part of a coordinated change. A corresponding PR will be created for the evaluation repo.

Testing

  • All test files updated
  • Changes verified across all benchmarks

This breaking change renames the parameter across the codebase to better
reflect its purpose: controlling the number of critic evaluation runs in
iterative mode, not general retry attempts.

Changes:
- CLI argument: --max-attempts → --n-critic-runs
- Model field: max_attempts → n_critic_runs (EvalMetadata)
- Updated all run_infer.py files (7 benchmarks)
- Updated core logic (evaluation.py, iterative.py)
- Updated test files (4 files)
- Updated documentation (2 README files)

Migration guide:
- Update CLI usage: --max-attempts 3 → --n-critic-runs 3
- Update EvalMetadata construction: max_attempts= → n_critic_runs=

Co-authored-by: openhands <openhands@all-hands.dev>
@openhands-ai

openhands-ai bot commented Jan 16, 2026

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • Pre-commit checks

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #325 at branch `rename-max-attempts-to-n-critic-runs`

Feel free to include any additional details that might help me get this PR into a better state.
