@xingyaoww (Collaborator) commented Jan 12, 2026

Summary

  • Add AgentReviewCritic, a CriticBase implementation that spawns a separate OpenHands agent to review the current git diff.
  • The critic forks the current agent settings (LLM + agent config) from the running conversation rather than reading critic-specific environment variables.
  • Includes unit tests for critic output parsing.
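
As a rough illustration of the kind of output parsing the unit tests exercise, the sketch below maps a reviewer's free-text reply to a pass / not_pass verdict. The `Verdict` dataclass, the `parse_review_output` function, and the convention that the reply contains a flat JSON object with `decision` and `summary` keys are all assumptions made for illustration; they are not taken from this PR's implementation.

```python
import json
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    """Hypothetical stand-in for the critic's result type."""

    passed: bool
    message: str


def parse_review_output(text: str) -> Verdict:
    """Parse the last valid JSON object in `text`; fail closed if none is found.

    Assumed reply format: {"decision": "pass" | "not_pass", "summary": "..."}
    Only flat (non-nested) objects are recognised by this simple pattern.
    """
    # Walk candidates from last to first so leading prose is tolerated.
    for candidate in reversed(re.findall(r"\{.*?\}", text, flags=re.DOTALL)):
        try:
            data = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        decision = data.get("decision")
        if decision in ("pass", "not_pass"):
            return Verdict(decision == "pass", str(data.get("summary", "")))
    # Anything unparseable counts as a failed review.
    return Verdict(False, "could not parse reviewer output")
```

Failing closed on unparseable output is a design choice in this sketch: a malformed reviewer reply should never be mistaken for an approval.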

Fixes #1704

Checklist

  • If the PR is changing/adding functionality, are there tests to reflect this?
  • If there is an example, have you run the example to make sure that it works?
  • If there are instructions on how to run the code, have you followed the instructions and made sure that it works?
  • If the feature is significant enough to require documentation, is there a PR open on the OpenHands/docs repository with the same branch name?
  • Is the github CI passing?



Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant   Architectures   Base Image                                   Docs / Tags
java      amd64, arm64    eclipse-temurin:17-jdk                       Link
python    amd64, arm64    nikolaik/python-nodejs:python3.12-nodejs22   Link
golang    amd64, arm64    golang:1.21-bookworm                         Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:de8bb36-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-de8bb36-python \
  ghcr.io/openhands/agent-server:de8bb36-python

All tags pushed for this build

ghcr.io/openhands/agent-server:de8bb36-golang-amd64
ghcr.io/openhands/agent-server:de8bb36-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:de8bb36-golang-arm64
ghcr.io/openhands/agent-server:de8bb36-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:de8bb36-java-amd64
ghcr.io/openhands/agent-server:de8bb36-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:de8bb36-java-arm64
ghcr.io/openhands/agent-server:de8bb36-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:de8bb36-python-amd64
ghcr.io/openhands/agent-server:de8bb36-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:de8bb36-python-arm64
ghcr.io/openhands/agent-server:de8bb36-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:de8bb36-golang
ghcr.io/openhands/agent-server:de8bb36-java
ghcr.io/openhands/agent-server:de8bb36-python

About Multi-Architecture Support

  • Each variant tag (e.g., de8bb36-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., de8bb36-python-amd64) are also available if needed

@github-actions bot (Contributor) commented Jan 12, 2026

Coverage

Coverage Report

File                                                      Stmts   Miss   Cover   Missing
openhands-sdk/openhands/sdk/critic/impl/agent_review.py   91      58     36%     35–36, 38–40, 42–44, 46, 53–56, 58–59, 62–66, 68–74, 76, 79–82, 84–89, 91, 94, 97–99, 104, 116–126, 145–147
TOTAL                                                     15210   4491   70%

@xingyaoww (Collaborator, Author) commented

@OpenHands do a /codereview-roasted and help me address all review comments in this PR.

When you are done, help me write a script in the examples/0 folder that demonstrates this critic in action. You can refer to the existing delegation example, as well as the example introduced in #1269.

@openhands-ai bot commented Jan 13, 2026

I'm on it! xingyaoww can track my progress at all-hands.dev

Co-authored-by: openhands <openhands@all-hands.dev>

@openhands-ai bot commented Jan 13, 2026

Final summary (double-checked)

✅ Checklist against your request

  • Address all review comments on PR #1706 (DRAFT: Critic agent that forks generator settings for PR-style review)
    • Switched parsing/output to use CriticResult directly (no custom output model).
    • Added llm: LLM | None field to AgentReviewCritic so callers can pass the LLM explicitly (removes the need for the “extract_llm hack” as the primary path).
  • Add an example script demonstrating the critic working in action
    • Added examples/00_critic/00_agent_review_critic_stop_hook.py showing a Stop-hook gating flow with AgentReviewCritic.

🔍 Conciseness / extraneous changes check

  • The diff vs origin/main contains only:
    • New critic implementation + exports
    • New unit tests for parsing
    • New example script
  • No unrelated refactors or drive-by formatting changes outside those files.

📌 What changed (since the last state on main)

  • New critic: openhands-sdk/openhands/sdk/critic/impl/agent_review.py
    • AgentReviewCritic.llm: LLM | None = None
    • _parse_output() now returns CriticResult
    • evaluate() returns parsed CriticResult directly
  • Exports wired up
    • openhands-sdk/openhands/sdk/critic/__init__.py
    • openhands-sdk/openhands/sdk/critic/impl/__init__.py
  • New tests: tests/sdk/critic/test_agent_review_critic.py
    • Updated assertions to match CriticResult
  • New example: examples/00_critic/00_agent_review_critic_stop_hook.py
    • Creates a temp git repo, generates a diff, runs the critic, and uses a Stop hook to gate stopping until approval.
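
The "temp git repo plus diff" step described above can be sketched roughly as follows. This is a minimal illustration, not the code in the example script: the helper names `git` and `make_demo_diff` and the file `app.py` are hypothetical, and the sketch assumes a `git` executable is on the PATH. It stages a baseline file and then edits it, so `git diff` (index versus working tree) reports the change without needing a commit or a configured user identity.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(repo: Path, *args: str) -> str:
    """Run a git command inside `repo` and return its stdout."""
    result = subprocess.run(
        ["git", *args], cwd=repo, check=True, capture_output=True, text=True
    )
    return result.stdout


def make_demo_diff() -> str:
    """Create a throwaway repo and return the diff of one unstaged edit."""
    if shutil.which("git") is None:
        raise RuntimeError("git is required for this sketch")
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        git(repo, "init", "--quiet")
        target = repo / "app.py"  # hypothetical file name
        target.write_text("def add(a, b):\n    return a - b\n")
        git(repo, "add", "app.py")  # stage the baseline; no commit needed
        target.write_text("def add(a, b):\n    return a + b\n")
        # Compare the staged baseline with the edited working tree.
        return git(repo, "diff", "--no-color")
```

The returned text is what a reviewer agent would be asked to judge; how the critic in this PR actually collects the diff may differ.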

🚀 Push status


@openhands-ai bot commented Jan 13, 2026

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • [Optional] Docs example

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #1706 at branch `openhands/critic-agent-reviewer`

Feel free to include any additional details that might help me get this PR into a better state.


@xingyaoww (Collaborator, Author) left review comments:


if __name__ == "__main__":
    raise SystemExit(main())
"""

@xingyaoww (Collaborator, Author) commented

This doesn't seem ideal. Should a hook also be able to accept a callback function?
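
One possible shape for a callback-accepting hook is sketched below. Everything here is hypothetical: `HookDecision`, `StopCallback`, `run_stop_hook`, and `critic_gate` are illustrative names and do not correspond to an existing hook API in the SDK. The idea is only that the Stop hook could call an in-process Python function with the current diff instead of running a script embedded as a string.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class HookDecision:
    """Hypothetical result of a Stop hook: allow stopping, or explain why not."""

    allow: bool
    reason: str = ""


# Hypothetical callback signature: receives the current diff text.
StopCallback = Callable[[str], HookDecision]


def run_stop_hook(callback: StopCallback, diff: str) -> HookDecision:
    """Invoke an in-process callback; treat any exception as a denial."""
    try:
        return callback(diff)
    except Exception as exc:  # fail closed so a broken hook never approves
        return HookDecision(False, f"hook error: {exc}")


def critic_gate(diff: str) -> HookDecision:
    """Toy critic: refuse to stop while the diff still contains a TODO."""
    if "TODO" in diff:
        return HookDecision(False, "unresolved TODO in diff")
    return HookDecision(True)
```

With this shape, a real critic could be passed as the callback, and the denial reason could be fed back to the generating agent as its next instruction.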

Comment on lines +61 to +76
    def _extract_llm(self, events: Sequence[LLMConvertibleEvent]) -> LLM | None:
        for event in events:
            agent = getattr(event, "agent", None)
            llm = getattr(agent, "llm", None)
            if isinstance(llm, LLM):
                return llm

        for event in events:
            if not isinstance(event, SystemPromptEvent):
                continue
            agent = getattr(event, "agent", None)
            llm = getattr(agent, "llm", None)
            if isinstance(llm, LLM):
                return llm

        return None

@xingyaoww (Collaborator, Author) commented

We don't need this, since self.llm is already there

    `not_pass`.
    """

    llm: LLM | None = None

@xingyaoww (Collaborator, Author) commented

This should be llm: LLM

It is a required arg

Comment on lines +78 to +91
    def _extract_agent(self, events: Sequence[LLMConvertibleEvent]) -> Agent | None:
        for event in events:
            agent = getattr(event, "agent", None)
            if isinstance(agent, Agent):
                return agent

        for event in events:
            if not isinstance(event, SystemPromptEvent):
                continue
            agent = getattr(event, "agent", None)
            if isinstance(agent, Agent):
                return agent

        return None

@xingyaoww (Collaborator, Author) commented

As with the get planning agent function in openhands.tools.preset, we should construct the Agent instance ourselves.
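
A minimal sketch of what the last two comments suggest, taken together: the critic holds a required `llm` and builds its own reviewer agent from it, rather than searching events for one. `LLMStub` and `AgentStub` are stand-in dataclasses, not the SDK's real `LLM` and `Agent` classes, and `get_review_agent`, the `"terminal"` tool name, and the prompt text are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMStub:
    """Stand-in for the SDK's LLM config; only the model name is modeled."""

    model: str


@dataclass(frozen=True)
class AgentStub:
    """Stand-in for the SDK's Agent; the fields here are illustrative."""

    llm: LLMStub
    tools: tuple[str, ...] = ()
    system_prompt: str = ""


def get_review_agent(llm: LLMStub) -> AgentStub:
    """Build a fresh reviewer agent instead of extracting one from events."""
    return AgentStub(
        llm=llm,
        tools=("terminal",),  # hypothetical tool name
        system_prompt="Review the current git diff and answer pass or not_pass.",
    )


@dataclass
class AgentReviewCriticSketch:
    """Sketch of a critic whose `llm` field is required."""

    llm: LLMStub  # no default: constructing without it raises TypeError

    def build_reviewer(self) -> AgentStub:
        return get_review_agent(self.llm)
```

Making `llm` required moves the failure to construction time, which is what removes the need for the `_extract_llm` and `_extract_agent` fallbacks shown above.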


Development

Successfully merging this pull request may close these issues.

Interleave codereview-roasted with agent task: add critic-agent model that reviews git diff and returns pass/fail

3 participants