@simonrosenberg
Collaborator

Summary

  • add helper to drop top-level file diffs (e.g., reproduction.py) when converting patches for SWTBench
  • apply the filter in swtbench eval_infer conversion to match OpenHands behavior
  • prevents helper scripts from being treated as test directives by the harness
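
A minimal sketch of the kind of filter described above, not the code from this PR. The function name `drop_top_level_file_diffs` and the regex are hypothetical, and the sketch assumes the patch is a git-style unified diff where each file section starts with a `diff --git a/<path> b/<path>` header. A file counts as top-level when its path has no directory component. Paths containing spaces are not matched by this simple header pattern and would need extra handling.

```python
import re

# Hypothetical helper; the name and regex are assumptions for illustration.
# Matches git-style per-file headers such as:
#   diff --git a/reproduction.py b/reproduction.py
_DIFF_HEADER = re.compile(r"^diff --git a/(?P<old>\S+) b/(?P<new>\S+)$")


def drop_top_level_file_diffs(patch: str) -> str:
    """Return `patch` without the file sections whose path has no directory part."""
    kept: list[str] = []
    keep_current = True  # any preamble before the first header is preserved
    for line in patch.splitlines(keepends=True):
        match = _DIFF_HEADER.match(line.rstrip("\r\n"))
        if match:
            # A new file section starts here. Keep it only if the
            # destination path lives inside some directory.
            keep_current = "/" in match.group("new")
        if keep_current:
            kept.append(line)
    return "".join(kept)
```

With a patch that touches both `reproduction.py` and `tests/test_x.py`, only the `tests/test_x.py` section would survive, so the harness never sees the helper script as a test directive.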

Context

Recent SWTBench runs showed many failures because top-level reproduction scripts were included in patches and executed as tests. OpenHands strips these scripts before evaluation; this PR aligns our pipeline with that behavior.
