-
Notifications
You must be signed in to change notification settings - Fork 2
Agent-sdk integration for SWE bench-styled tasks #2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: apga/tinker_experimental
Are you sure you want to change the base?
Agent-sdk integration for SWE bench-styled tasks #2
Conversation
| from copy import deepcopy | ||
| from platoon.envs.base import Task | ||
| from platoon.openhands.types import OpenHandsObservation, OpenHandsAction | ||
| from .types import OpenHandsObservation, OpenHandsAction |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this change doesn't seem necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's remove this file for now if it is not needed yet.
| if self._conversation is not None: | ||
| self._conversation.close() | ||
| self._conversation = None | ||
| # TODO: check if cleaning up workspace manually is required |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's tackle the TODOs before merging changes, if they are quick enough to do.
| @@ -0,0 +1,41 @@ | |||
| # platoon-openhands-rl | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of openhands-rl I wonder if we should name this plugin something like swe-issue-resolution?
| import sys | ||
| import logging | ||
| from datasets import Dataset | ||
| from areal.api.cli_args import load_expr_config |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can actually remove all of these for now. I switched AreaL code to use prints instead of logging for now.
| from platoon.episode.context import current_trajectory_collection | ||
| from pydantic import SecretStr | ||
| from platoon.visualization.event_sinks import JsonlFileSink | ||
| from .tasks import EVAL_AGENT_SERVER_IMAGE, SDK_SHORT_SHA, ENV_SETUP_COMMANDS, PROMPT_FILENAME |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's avoid relative imports, for consistency
| # official_image_name = docker_image_prefix.rstrip("/") | ||
| # official_image_name += f"/sweb.eval.x86_64.{repo}_1776_{name}:latest".lower() | ||
| # logger.debug(f"Official SWE-Bench image: {official_image_name}") | ||
| # return official_image_name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's think of a way to clean this up to make it more modular and extensible. Perhaps can adopt some sort of factory pattern for different datasets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar for the get_instruction and any other dataset-specific functions below.
| from platoon.envs.base import Task | ||
| from platoon.openhands.types import OpenHandsObservation, OpenHandsAction | ||
| from .types import OpenHandsObservation, OpenHandsAction | ||
| from platoon.utils.openhands_utils import get_actions_for_last_obs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's move the openhands utils into the openhands plugin
| server_image=agent_server_image, | ||
| target_type="source" if "source" in build_target else "binary", | ||
| ) | ||
| else: #NOTE: Docker workspace not supported yet since Babel doesn't allow docker access |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add docker as well? Babel is specific to CMU, so need to restrict this for the general user.
| instruction = template.render(context) | ||
| return instruction | ||
|
|
||
| def prepare_workspace(instance: dict, task: Task) -> BaseWorkspace: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be better to move all the helper functions like this and others (e.g., prepare_llm) into utils (either openhands plugin utils if it is relevant to all plugins that might leverage openhands or utils for this plugin, if it is issue-solving-specific).
That way if someone wants to add another custom rollout file they can import from those utils.
No description provided.