Conversation

@simonrosenberg
Collaborator

Summary

  • add per-instance attempt timelines in evaluation orchestrator
  • capture SWTBench eval/harness timings and emit structured JSON timelines
  • inject profiling sitecustomize into swt-bench at runtime (optional via PROFILE_SWTBENCH)
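
For illustration, the three pieces above could fit together roughly as in the sketch below. This is not the code in this PR: `AttemptTimeline`, `profiling_env`, the JSON field names, and the set of truthy values accepted for `PROFILE_SWTBENCH` are all assumptions made for the example. The only behavior relied on is standard Python: the `site` module imports a `sitecustomize` module at interpreter startup if one is found on `sys.path`, so prepending a hook directory to `PYTHONPATH` is enough to load it in a child process.

```python
import json
import os
import time
from contextlib import contextmanager


class AttemptTimeline:
    """Collects named phase timings for one instance attempt (hypothetical helper)."""

    def __init__(self, instance_id, attempt):
        self.instance_id = instance_id
        self.attempt = attempt
        self.phases = []

    @contextmanager
    def phase(self, name):
        # Time the wrapped block; record it even if the block raises.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.phases.append(
                {"name": name, "duration_s": time.perf_counter() - start}
            )

    def to_json(self):
        # Structured timeline; the field names are assumptions for this sketch.
        return json.dumps(
            {
                "instance_id": self.instance_id,
                "attempt": self.attempt,
                "phases": self.phases,
            }
        )


def profiling_env(base_env, hook_dir):
    """Return a copy of base_env for a child process.

    If PROFILE_SWTBENCH is set to a truthy value (assumed: 1/true/yes),
    prepend hook_dir to PYTHONPATH so that the child interpreter imports
    hook_dir/sitecustomize.py at startup. Otherwise leave the env unchanged.
    """
    env = dict(base_env)
    if env.get("PROFILE_SWTBENCH", "").lower() not in ("1", "true", "yes"):
        return env
    existing = env.get("PYTHONPATH")
    env["PYTHONPATH"] = hook_dir + (os.pathsep + existing if existing else "")
    return env
```

A caller would wrap each stage in `with timeline.phase("eval"):` and pass `env=profiling_env(os.environ, hook_dir)` to `subprocess.run` when launching the harness, so profiling stays off unless the variable is set.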

Testing

  • not run (profiling-only instrumentation)

Note: vendor/software-agent-sdk submodule advanced to current main when updating base branch; no manual edits inside submodule.

@openhands-ai

openhands-ai bot commented Jan 14, 2026

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • Pre-commit checks

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #313 at branch `feature/swtbench-profiling`

Feel free to include any additional details that might help me get this PR into a better state.


@simonrosenberg simonrosenberg force-pushed the feature/swtbench-profiling branch from 33b5968 to 89428d5 Compare January 14, 2026 16:47
@juanmichelini juanmichelini self-requested a review January 14, 2026 21:34
@simonrosenberg
Collaborator Author

simonrosenberg commented Jan 14, 2026

@juanmichelini perhaps this should NOT be merged (it's a bit messy) but kept as a hack for when we need to profile and optimize some branch. We can just ask OH to merge this into any other feature branch, and then we get the profiling logs for free. I do like the idea of adding precise profiling logs to the report.json files, though.
The downside is that the profiling code is specific to each benchmark, and the sitecustomize approach is hacky and messy, so I'm not sure.

@juanmichelini
Collaborator

> @juanmichelini perhaps this should NOT be merged (it's a bit messy) but kept as a hack for when we need to profile and optimize some branch. We can just ask OH to merge this into any other feature branch, and then we get the profiling logs for free. I do like the idea of adding precise profiling logs to the report.json files, though. The downside is that the profiling code is specific to each benchmark, and the sitecustomize approach is hacky and messy, so I'm not sure.

Tough choice! Thoughts on doing a PR to the SWT-bench repo?
