Skip to content

Conversation

@simonrosenberg
Copy link
Collaborator

@simonrosenberg simonrosenberg commented Jan 7, 2026

Summary

  • keep full tag segments (including tag_latest) when truncating long cache/base slugs, even when registries include ports
  • use rsplit on the tag delimiter to avoid mis-parsing registry ports
  • add coverage for tag_latest and registry-port cases to prevent regressions

Testing

  • PYTHONPATH=openhands-agent-server:openhands-sdk python3 -m pytest tests/agent_server/test_docker_build.py -k base_slug

Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant Architectures Base Image Docs / Tags
java amd64, arm64 eclipse-temurin:17-jdk Link
python amd64, arm64 nikolaik/python-nodejs:python3.12-nodejs22 Link
golang amd64, arm64 golang:1.21-bookworm Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:b2f1229-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-b2f1229-python \
  ghcr.io/openhands/agent-server:b2f1229-python

All tags pushed for this build

ghcr.io/openhands/agent-server:b2f1229-golang-amd64
ghcr.io/openhands/agent-server:b2f1229-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:b2f1229-golang-arm64
ghcr.io/openhands/agent-server:b2f1229-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:b2f1229-java-amd64
ghcr.io/openhands/agent-server:b2f1229-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:b2f1229-java-arm64
ghcr.io/openhands/agent-server:b2f1229-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:b2f1229-python-amd64
ghcr.io/openhands/agent-server:b2f1229-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:b2f1229-python-arm64
ghcr.io/openhands/agent-server:b2f1229-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:b2f1229-golang
ghcr.io/openhands/agent-server:b2f1229-java
ghcr.io/openhands/agent-server:b2f1229-python

About Multi-Architecture Support

  • Each variant tag (e.g., b2f1229-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., b2f1229-python-amd64) are also available if needed

@github-actions
Copy link
Contributor

github-actions bot commented Jan 7, 2026

Coverage

Coverage Report •
FileStmtsMissCoverMissing
TOTAL14981715852% 
report-only-changed-files is enabled. No files were changed during this commit :)

@simonrosenberg simonrosenberg self-assigned this Jan 7, 2026
@xingyaoww xingyaoww added the review-this This label triggers a PR review by OpenHands label Jan 9, 2026
Copy link
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: Fix cache tag truncation with ports and latest suffix

I've completed a thorough review of the PR changes. The implementation achieves its stated goals but has several issues that should be addressed.

✅ What Works Well

  1. Core fix is correct: Using rsplit instead of split properly handles registries with ports (e.g., ghcr.io:443/repo:tag)
  2. Test coverage added: New tests cover the main scenarios
  3. Debug logging: Added helpful logging for troubleshooting
  4. Length constraints maintained: All test cases properly respect max_len limits

🔴 Critical Issues

None - the code functions as intended for the target use cases.

🟡 Issues That Should Be Addressed

1. Misleading Comment About Preservation Guarantees (Lines 255-256)

Severity: Medium

Location: openhands-agent-server/openhands/agent_server/docker/build.py:255-256

Issue: The comment states:

# Tag portion alone is longer than the budget; keep the leading part
# so sentinel substrings like "tag_latest" remain intact.

This is misleading because the code only keeps the leading portion of the tag (tag_piece[:visible_budget]), which only preserves "tag_latest" if it appears at the beginning of the tag.

Evidence: I tested this with tags where "tag_latest" appears in different positions:

  • repo:tag_latest-suffix → Preserved (tag_latest at start)
  • repo:prefix-tag_latest-suffix → Preserved (enough budget)
  • repo:prefix-extra-info-tag_latestLost (tag_latest at end, truncated)
  • repo:very-long-prefix-tag_latest-suffixLost (tag_latest too far in)

Recommendation: Update the comment to be accurate:

# Tag portion alone is longer than the budget; keep the leading part
# which may preserve sentinel substrings like "tag_latest" if they appear
# near the beginning of the tag.

2. Test Assertion Inconsistency (Lines 233-246)

Severity: Low

Location: tests/agent_server/test_docker_build.py:233-246

Issue: test_base_slug_preserves_tag_with_registry_port is missing the digest suffix validation that exists in test_base_slug_preserves_latest_tag_suffix.

Current test (line 242-245):

assert len(result) <= 64
assert "tag_latest" in result

Should match the pattern from line 226-230:

assert len(result) <= 64
assert "tag_latest" in result
# Keep cache keys stable with digest suffix
assert result[-13:-12] == "-"
assert all(c in "0123456789abcdef" for c in result[-12:])

Recommendation: Add the digest suffix checks to test_base_slug_preserves_tag_with_registry_port for consistency.

3. Subtle Edge Case with Underscored Tags

Severity: Low (informational)

Location: openhands-agent-server/openhands/agent_server/docker/build.py:232

Issue: When a tag itself contains underscores (e.g., tag_latest), the rsplit approach splits within the tag name rather than at the tag delimiter.

Example:

  • Image: localhost:5001/repo:tag_latest-0a797356ebce
  • After slugification: localhost_tag_5001_s_repo_tag_tag_latest-0a797356ebce
  • After rsplit on _tag_: splits on the last _tag_ which is part of "tag_latest"
    • left: localhost_tag_5001_s_repo_tag
    • tag: latest-0a797356ebce (incorrect - should be tag_latest-0a797356ebce)

Why it still works: When reconstructed, the result still contains the substring "tag_latest" because the repo part includes _tag at the end, creating _tag_tag_latest in the final output.

Recommendation: Document this behavior or consider a more robust parsing approach in future work. For now, it works for the intended use case but is worth noting.

📝 Summary

Overall Assessment: The PR successfully achieves its goals of handling registry ports and preserving the "tag_latest" substring in common cases. The issues found are primarily about accuracy of comments and test consistency rather than functional problems.

Recommendations Priority:

  1. Fix the misleading comment (medium priority)
  2. Add consistent test assertions (low priority)
  3. Document the edge case behavior (optional)

Test Results: ✅ All 9 existing base_slug tests pass

The PR is functionally correct and can be merged, but addressing the comment accuracy would improve code maintainability.

@all-hands-bot
Copy link
Collaborator

[Automatic Post]: This PR seems to be currently waiting for review. @xingyaoww, could you please take a look when you have a chance?

Comment on lines 249 to 259
if len(ident) > visible_budget:
if tag_piece and len(tag_piece) < visible_budget:
# Keep the full tag portion and trim the repo prefix if needed.
head_budget = visible_budget - len(tag_piece)
ident = repo[:head_budget] + tag_piece
elif tag_piece:
# Tag portion alone is longer than the budget; keep the leading part
# so sentinel substrings like "tag_latest" remain intact.
ident = tag_piece[:visible_budget]
else:
ident = ident[:visible_budget]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@OpenHands any way we can simplify this nested if-else? Do not make any changes just yet, only provide your thinking

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm on it! xingyaoww can track my progress at all-hands.dev

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only answered the question (no code changes were made).

Re-stated answer: The nested if/elif/else can be simplified by separating (1) choosing what to preserve (repo+tag vs tag-only vs repo-only) from (2) doing the truncation. Concretely: add an early “fits already” guard, compute a single “candidate” string based on whether tag_piece exists and whether it can fit, then apply one uniform candidate[:visible_budget] slice at the end. This flattens the branching and keeps the same prioritization (preserve full tag when possible; otherwise preserve leading tag portion like tag_latest; otherwise truncate normally).

View full conversation

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@OpenHands see the comment above and simplify the nested if-else

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm on it! simonrosenberg can track my progress at all-hands.dev

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

I simplified the nested if-else block in the _base_slug function as requested in the PR comment.

Changes Made

Before (3 branches):

if len(ident) > visible_budget:
    if tag_piece and len(tag_piece) < visible_budget:
        head_budget = visible_budget - len(tag_piece)
        ident = repo[:head_budget] + tag_piece
    elif tag_piece:
        ident = tag_piece[:visible_budget]
    else:
        ident = ident[:visible_budget]

After (2 branches):

if len(ident) > visible_budget:
    if tag_piece and len(tag_piece) < visible_budget:
        # Full tag fits: trim repo to make room
        ident = repo[: visible_budget - len(tag_piece)] + tag_piece
    else:
        # Either no tag or tag too long: truncate from the start
        ident = (tag_piece or ident)[:visible_budget]

Verification Checklist

  • Simplified the nested if-else as requested
  • Flattened elif/else into a single else using (tag_piece or ident)[:visible_budget]
  • All 9 base_slug tests pass
  • Pre-commit hooks pass (ruff format, ruff lint, pycodestyle, pyright)
  • Changes are minimal and focused on the requested simplification
  • Pushed to the fix-cache-tag-latest branch

The simplification works because tag_piece or ident evaluates to:

  • tag_piece when it exists (but is too long) → truncate tag_piece
  • ident when tag_piece is empty → truncate ident (which equals repo)

View full conversation

Flatten the elif/else branches into a single else clause using
(tag_piece or ident)[:visible_budget] which handles both cases:
- tag_piece exists but too long: truncate tag_piece
- no tag_piece: truncate ident (which equals repo)

Co-authored-by: openhands <openhands@all-hands.dev>
@openhands-ai
Copy link

openhands-ai bot commented Jan 13, 2026

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • Agent Server

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #1626 at branch `fix-cache-tag-latest`

Feel free to include any additional details that might help me get this PR into a better state.

You can manage your notification settings

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

review-this This label triggers a PR review by OpenHands

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants