Skip to content

Conversation

@jwm4
Copy link
Collaborator

@jwm4 jwm4 commented Jan 11, 2026

Summary

This PR fixes issues where the model would incorrectly flag files as 'missing' when they were either in the PR (but on page 2 of results) or pre-existing in the repository.

Changes

1. Fix GitHub API Pagination (root cause)

The PR files endpoint defaults to 30 files per page, but we weren't paginating. PRs with 31+ files were missing files from the analysis.

File: review_roadmap/github/client.py

  • Updated _fetch_file_diffs() to request 100 files per page and paginate through all results

2. Include Diff Content in LLM Context

The LLM was only seeing file names, not the actual code changes. This made it hard to understand imports and cross-file relationships.

File: review_roadmap/agent/nodes.py

  • Added _build_diffs_context() to include truncated diffs in the draft_roadmap prompt
  • Smart truncation: 1500 chars per file, 80K total budget

3. Better Context Expansion Prompt

Updated the prompt to explicitly tell the LLM it can fetch any file from the repo to verify imports exist.

Files: review_roadmap/agent/prompts.py, review_roadmap/agent/tools.py

  • Expanded CONTEXT_EXPANSION_SYSTEM_PROMPT with examples and clearer guidance
  • Updated read_file tool docstring to clarify scope

Testing

Tested against jwm4/temperature_alert#1 which has 39 files:

  • Before: Only fetched 30 files, model incorrectly warned about missing tools/temperature.py and tools/memory.py
  • After: Fetches all 39 files, model correctly includes all tool files in the roadmap

All 70 tests pass with 84% coverage.

- Fix GitHub API pagination for PR files endpoint (was missing files 31+)
- Include actual diff content in draft_roadmap prompt for better code understanding
- Enhance context_expansion prompt to verify imports from pre-existing files
- Update read_file tool docstring to clarify it can fetch any repo file

This fixes issues where the model would incorrectly flag files as 'missing'
when they were either in the PR (but on page 2 of results) or pre-existing
in the repository.
@jwm4
Copy link
Collaborator Author

jwm4 commented Jan 11, 2026

🗺️ Auto-Generated Review Roadmap

This roadmap was automatically generated by review-roadmap.
Model: anthropic-vertex/claude-opus-4-5


Review Roadmap: PR #10 - Improve Context Management for Large PRs

High-Level Summary

This PR addresses a critical bug where PRs with 31+ files were incomplete in analysis due to GitHub API pagination limits. It also enriches the LLM context by including actual diff content (not just file names) and improves the read_file tool documentation so the model understands it can fetch any repository file, not just PR files.

Three distinct changes:

  1. Pagination fix in the GitHub client (root cause)
  2. Diff content injection into the LLM prompt (context improvement)
  3. Prompt/tool clarification for the context expansion phase

Recommended Review Order

Phase 1: Understand the Root Cause Fix

Start with the GitHub client since it's the foundational bug fix that the other changes build upon.

Order File Focus Area
1 review_roadmap/github/client.py Pagination logic in _fetch_file_diffs()

What to verify:

  • Does the while True loop correctly break when receiving fewer results than per_page?
  • Is per_page=100 documented as GitHub's max?
  • Error handling: what happens if pagination fails mid-way?

Phase 2: Context Building Logic

Review how diffs are now formatted and injected into prompts.

Order File Focus Area
2 review_roadmap/agent/nodes.py New _build_diffs_context() function starting around the new additions

What to verify:

  • Truncation constants: MAX_DIFF_CHARS = 1500 and MAX_TOTAL_DIFF_CHARS = 80000 — are these reasonable for typical LLM context windows?
  • Edge case: what happens when a single file exceeds the total budget?
  • The diff is truncated mid-content — does this risk cutting in the middle of a line/syntax?

Phase 3: Prompt & Tool Documentation

These are documentation-focused changes that tell the LLM how to behave.

Order File Focus Area
3 review_roadmap/agent/prompts.py CONTEXT_EXPANSION_SYSTEM_PROMPT changes
4 review_roadmap/agent/tools.py read_file docstring update

What to verify:

  • Prompt and tool docstring should be consistent (both now say "ANY file from the repository")
  • Examples in the prompt reference temperature_agent.tools.memory — is this from the test PR? Could be confusing if left in production

Watch Outs

🔴 Critical: Pagination Termination Logic

In client.py, verify the loop termination condition. The standard pattern is:

if len(files_data) < per_page:
    break

If this check is incorrect, you could either:

  • Exit too early (missing files again)
  • Loop infinitely on exact multiples of 100

🟡 Medium: Token Budget Accounting

In nodes.py, the 80K char budget (~20K tokens) is generous. Verify:

  • Is this budget only for diffs, or does it compete with other prompt sections?
  • What's the model's actual context limit (MAX_TOKENS = 4096 for output, but input could be much higher)?

🟡 Medium: Truncation Mid-Diff

When truncating at MAX_DIFF_CHARS = 1500, the code does:

diff = diff[:MAX_DIFF_CHARS] + f"\n... (truncated, {len(f.diff_content)} chars total)"

This could slice mid-line. Consider truncating at a newline boundary for cleaner context.

🟢 Low: Hardcoded Example in Prompt

The example temperature_agent.tools.memory in prompts.py is from the test PR. This is fine for illustrative purposes, but ensure it's clear this is just an example pattern.


Existing Discussions

No comments have been left on this PR yet. You'll be the first reviewer!


Quick Checklist

  • Pagination loop terminates correctly for PRs with exactly 100, 200, etc. files
  • Diffs context doesn't blow past model context limits when combined with other prompts
  • Error handling exists if GitHub API returns an error mid-pagination
  • read_file tool docstring matches CONTEXT_EXPANSION_SYSTEM_PROMPT guidance
  • Test coverage includes pagination edge cases (the PR claims 70 tests pass — check if pagination is covered)

@jwm4 jwm4 merged commit 194ca5f into main Jan 11, 2026
1 check passed
@jwm4 jwm4 deleted the feat/improve-context-management branch January 11, 2026 16:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants