Agent-sdk integration for SWE bench-styled tasks #2

adityasoni9998 · 2026-01-07T00:51:23Z

No description provided.

ApGa · 2026-01-14T18:51:34Z

plugins/openhands/platoon/openhands/agent.py

 from copy import deepcopy
 from platoon.envs.base import Task
-from platoon.openhands.types import OpenHandsObservation, OpenHandsAction
+from .types import OpenHandsObservation, OpenHandsAction


This change doesn't seem necessary?

ApGa · 2026-01-14T19:14:12Z

plugins/openhands_rl/platoon/openhands_rl/env.py

Let's remove this file for now if it is not needed yet.

ApGa · 2026-01-14T19:16:25Z

plugins/openhands/platoon/openhands/env.py

        if self._conversation is not None:
            self._conversation.close()
        self._conversation = None
+        # TODO: check if cleaning up workspace manually is required


Let's tackle the TODOs before merging changes, if they are quick enough to do.

ApGa · 2026-01-14T19:17:28Z

plugins/openhands_rl/README.md

@@ -0,0 +1,41 @@
+# platoon-openhands-rl


Instead of openhands-rl I wonder if we should name this plugin something like swe-issue-resolution?

ApGa · 2026-01-14T19:18:47Z

plugins/openhands_rl/platoon/openhands_rl/train.py

+import sys
+import logging
+from datasets import Dataset
+from areal.api.cli_args import load_expr_config


I think we can remove all of these for now. I switched the AReaL code to use prints instead of logging.

ApGa · 2026-01-14T21:48:01Z

plugins/openhands_rl/platoon/openhands_rl/rollout.py

+from platoon.episode.context import current_trajectory_collection
+from pydantic import SecretStr
+from platoon.visualization.event_sinks import JsonlFileSink
+from .tasks import EVAL_AGENT_SERVER_IMAGE, SDK_SHORT_SHA, ENV_SETUP_COMMANDS, PROMPT_FILENAME


Let's avoid relative imports, for consistency.

ApGa · 2026-01-14T21:48:55Z

plugins/openhands_rl/platoon/openhands_rl/rollout.py

+#     official_image_name = docker_image_prefix.rstrip("/")
+#     official_image_name += f"/sweb.eval.x86_64.{repo}_1776_{name}:latest".lower()
+#     logger.debug(f"Official SWE-Bench image: {official_image_name}")
+#     return official_image_name


Let's think of a way to clean this up so that it is more modular and extensible. Perhaps we can adopt some sort of factory pattern for the different datasets.

The same applies to get_instruction and any other dataset-specific functions below.

ApGa · 2026-01-14T21:51:42Z

plugins/openhands/platoon/openhands/agent.py

 from platoon.envs.base import Task
-from platoon.openhands.types import OpenHandsObservation, OpenHandsAction
+from .types import OpenHandsObservation, OpenHandsAction
 from platoon.utils.openhands_utils import get_actions_for_last_obs


Let's move the openhands utils into the openhands plugin.

ApGa · 2026-01-14T21:55:04Z

plugins/openhands_rl/platoon/openhands_rl/rollout.py

+            server_image=agent_server_image,
+            target_type="source" if "source" in build_target else "binary",
+        )
+    else: #NOTE: Docker workspace not supported yet since Babel doesn't allow docker access


Can we add Docker support as well? Babel is specific to CMU, so there is no need to impose this restriction on general users.
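A minimal sketch of how the choice between workspace types could be made without hard-coding a cluster-specific limitation. Everything here is an assumption for illustration: the function name `choose_workspace_kind` and the labels `"docker"` and `"remote"` are hypothetical, and the actual workspace classes are deliberately left out. The only real call used is `shutil.which` from the standard library, which checks whether the `docker` CLI is on `PATH`.

```python
import shutil
from typing import Optional

_VALID_KINDS = ("docker", "remote")


def choose_workspace_kind(
    requested: Optional[str] = None,
    docker_available: Optional[bool] = None,
) -> str:
    """Pick a workspace kind, preferring an explicit request.

    `docker_available` can be passed explicitly (useful in tests); when it is
    None, the docker CLI is looked up on PATH.
    """
    if docker_available is None:
        docker_available = shutil.which("docker") is not None

    if requested is not None:
        if requested not in _VALID_KINDS:
            raise ValueError(f"Unknown workspace kind: {requested!r}")
        if requested == "docker" and not docker_available:
            raise RuntimeError(
                "Docker workspace requested, but the docker CLI was not found"
            )
        return requested

    # No explicit request: use Docker when present, otherwise fall back.
    return "docker" if docker_available else "remote"
```

With something like this, an environment without Docker access falls through to the non-Docker path automatically, while other users get Docker by default or by explicit request.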

ApGa · 2026-01-14T22:27:49Z

plugins/openhands_rl/platoon/openhands_rl/rollout.py

+    instruction = template.render(context)
+    return instruction
+
+def prepare_workspace(instance: dict, task: Task) -> BaseWorkspace:


It might be better to move helper functions like this one and others (e.g., prepare_llm) into utils: either the openhands plugin's utils, if they are relevant to all plugins that might leverage openhands, or this plugin's utils, if they are issue-solving-specific.

That way, if someone wants to add another custom rollout file, they can import from those utils.

adityasoni9998 added 2 commits January 6, 2026 19:34

Add openhands-rl code for SWE tasks

c23c6b4

cleanup and minor bug fixes

e85de91

adityasoni9998 requested a review from ApGa January 7, 2026 00:55

ApGa requested changes Jan 14, 2026

2 participants
