-
Notifications
You must be signed in to change notification settings - Fork 109
Fix cache tag truncation with ports and latest suffix #1626
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This reverts commit edf1fcd.
all-hands-bot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review: Fix cache tag truncation with ports and latest suffix
I've completed a thorough review of the PR changes. The implementation achieves its stated goals but has several issues that should be addressed.
✅ What Works Well
- Core fix is correct: Using
rsplitinstead ofsplitproperly handles registries with ports (e.g.,ghcr.io:443/repo:tag) - Test coverage added: New tests cover the main scenarios
- Debug logging: Added helpful logging for troubleshooting
- Length constraints maintained: All test cases properly respect max_len limits
🔴 Critical Issues
None - the code functions as intended for the target use cases.
🟡 Issues That Should Be Addressed
1. Misleading Comment About Preservation Guarantees (Lines 255-256)
Severity: Medium
Location: openhands-agent-server/openhands/agent_server/docker/build.py:255-256
Issue: The comment states:
# Tag portion alone is longer than the budget; keep the leading part
# so sentinel substrings like "tag_latest" remain intact.This is misleading because the code only keeps the leading portion of the tag (tag_piece[:visible_budget]), which only preserves "tag_latest" if it appears at the beginning of the tag.
Evidence: I tested this with tags where "tag_latest" appears in different positions:
- ✅
repo:tag_latest-suffix→ Preserved (tag_latest at start) - ✅
repo:prefix-tag_latest-suffix→ Preserved (enough budget) - ❌
repo:prefix-extra-info-tag_latest→ Lost (tag_latest at end, truncated) - ❌
repo:very-long-prefix-tag_latest-suffix→ Lost (tag_latest too far in)
Recommendation: Update the comment to be accurate:
# Tag portion alone is longer than the budget; keep the leading part
# which may preserve sentinel substrings like "tag_latest" if they appear
# near the beginning of the tag.2. Test Assertion Inconsistency (Lines 233-246)
Severity: Low
Location: tests/agent_server/test_docker_build.py:233-246
Issue: test_base_slug_preserves_tag_with_registry_port is missing the digest suffix validation that exists in test_base_slug_preserves_latest_tag_suffix.
Current test (line 242-245):
assert len(result) <= 64
assert "tag_latest" in resultShould match the pattern from line 226-230:
assert len(result) <= 64
assert "tag_latest" in result
# Keep cache keys stable with digest suffix
assert result[-13:-12] == "-"
assert all(c in "0123456789abcdef" for c in result[-12:])Recommendation: Add the digest suffix checks to test_base_slug_preserves_tag_with_registry_port for consistency.
3. Subtle Edge Case with Underscored Tags
Severity: Low (informational)
Location: openhands-agent-server/openhands/agent_server/docker/build.py:232
Issue: When a tag itself contains underscores (e.g., tag_latest), the rsplit approach splits within the tag name rather than at the tag delimiter.
Example:
- Image:
localhost:5001/repo:tag_latest-0a797356ebce - After slugification:
localhost_tag_5001_s_repo_tag_tag_latest-0a797356ebce - After rsplit on
_tag_: splits on the last_tag_which is part of "tag_latest"- left:
localhost_tag_5001_s_repo_tag - tag:
latest-0a797356ebce(incorrect - should betag_latest-0a797356ebce)
- left:
Why it still works: When reconstructed, the result still contains the substring "tag_latest" because the repo part includes _tag at the end, creating _tag_tag_latest in the final output.
Recommendation: Document this behavior or consider a more robust parsing approach in future work. For now, it works for the intended use case but is worth noting.
📝 Summary
Overall Assessment: The PR successfully achieves its goals of handling registry ports and preserving the "tag_latest" substring in common cases. The issues found are primarily about accuracy of comments and test consistency rather than functional problems.
Recommendations Priority:
- Fix the misleading comment (medium priority)
- Add consistent test assertions (low priority)
- Document the edge case behavior (optional)
Test Results: ✅ All 9 existing base_slug tests pass
The PR is functionally correct and can be merged, but addressing the comment accuracy would improve code maintainability.
|
[Automatic Post]: This PR seems to be currently waiting for review. @xingyaoww, could you please take a look when you have a chance? |
| if len(ident) > visible_budget: | ||
| if tag_piece and len(tag_piece) < visible_budget: | ||
| # Keep the full tag portion and trim the repo prefix if needed. | ||
| head_budget = visible_budget - len(tag_piece) | ||
| ident = repo[:head_budget] + tag_piece | ||
| elif tag_piece: | ||
| # Tag portion alone is longer than the budget; keep the leading part | ||
| # so sentinel substrings like "tag_latest" remain intact. | ||
| ident = tag_piece[:visible_budget] | ||
| else: | ||
| ident = ident[:visible_budget] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@OpenHands any way we can simplify this nested if-else? Do not make any changes just yet, only provide your thinking
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm on it! xingyaoww can track my progress at all-hands.dev
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only answered the question (no code changes were made).
Re-stated answer: The nested if/elif/else can be simplified by separating (1) choosing what to preserve (repo+tag vs tag-only vs repo-only) from (2) doing the truncation. Concretely: add an early “fits already” guard, compute a single “candidate” string based on whether tag_piece exists and whether it can fit, then apply one uniform candidate[:visible_budget] slice at the end. This flattens the branching and keeps the same prioritization (preserve full tag when possible; otherwise preserve leading tag portion like tag_latest; otherwise truncate normally).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@OpenHands see the comment above and simplify the nested if-else
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm on it! simonrosenberg can track my progress at all-hands.dev
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summary
I simplified the nested if-else block in the _base_slug function as requested in the PR comment.
Changes Made
Before (3 branches):
if len(ident) > visible_budget:
if tag_piece and len(tag_piece) < visible_budget:
head_budget = visible_budget - len(tag_piece)
ident = repo[:head_budget] + tag_piece
elif tag_piece:
ident = tag_piece[:visible_budget]
else:
ident = ident[:visible_budget]After (2 branches):
if len(ident) > visible_budget:
if tag_piece and len(tag_piece) < visible_budget:
# Full tag fits: trim repo to make room
ident = repo[: visible_budget - len(tag_piece)] + tag_piece
else:
# Either no tag or tag too long: truncate from the start
ident = (tag_piece or ident)[:visible_budget]Verification Checklist
- Simplified the nested if-else as requested
- Flattened
elif/elseinto a singleelseusing(tag_piece or ident)[:visible_budget] - All 9
base_slugtests pass - Pre-commit hooks pass (ruff format, ruff lint, pycodestyle, pyright)
- Changes are minimal and focused on the requested simplification
- Pushed to the
fix-cache-tag-latestbranch
The simplification works because tag_piece or ident evaluates to:
tag_piecewhen it exists (but is too long) → truncate tag_pieceidentwhentag_pieceis empty → truncate ident (which equals repo)
Flatten the elif/else branches into a single else clause using (tag_piece or ident)[:visible_budget] which handles both cases: - tag_piece exists but too long: truncate tag_piece - no tag_piece: truncate ident (which equals repo) Co-authored-by: openhands <openhands@all-hands.dev>
|
Looks like there are a few issues preventing this PR from being merged!
If you'd like me to help, just leave a comment, like Feel free to include any additional details that might help me get this PR into a better state. You can manage your notification settings |
Summary
Testing
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.12-nodejs22golang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:b2f1229-pythonRun
All tags pushed for this build
About Multi-Architecture Support
b2f1229-python) is a multi-arch manifest supporting both amd64 and arm64b2f1229-python-amd64) are also available if needed