
Conversation

greynewell (Contributor) commented Jan 15, 2026

Summary by CodeRabbit

  • Documentation

    • Tools entry renamed to "explore_codebase" in README and docs.
  • New / Updates

    • Tool renamed to "explore_codebase" with tightened input requirements (directory-first) and clarified query options; responses may include auto-generated idempotency-key metadata.
  • Bug Fixes / Reliability

    • More robust archive handling, clearer error messages, and ensured temporary archives are cleaned up.
  • Configuration

    • Increased graph cache capacity and raised default maximum ZIP size.
  • Chores

    • Server instructions replaced with a concise guidance block.


greynewell and others added 3 commits January 15, 2026 11:45
Removes all instructional content, workflow guidance, and agent-directed
prompts from the MCP server to provide a minimal, declarative interface.

Changes:
- Remove MCP server instructions field entirely (previously 2000+ chars)
- Simplify tool description to single sentence
- Strip workflow guidance and "USE THIS TOOL WHEN" sections
- Remove parameter description instructions and examples
- Convert all descriptions to purely factual statements

Rationale:
- Reduces prompt bloat and token usage
- Provides cleaner, more declarative API
- Lets agents decide when/how to use tools
- Removes prescriptive workflow guidance
- Maintains full functional compatibility

Impact:
- No functional changes to tool behavior
- All parameters and validation unchanged
- API surface remains identical
- Only metadata/documentation affected

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Changes tool name to better reflect its core functionality of generating
code graphs rather than generic 'analysis'.

Changes:
- Rename tool from 'analyze_codebase' to 'graph_codebase'
- Update README.md with new tool name

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Strips all description text from tool and parameters to provide a blank
slate for custom agent-facing documentation.

Changes:
- Set tool description to empty string
- Set all parameter descriptions to empty strings
- Keep all functionality, validation, and enum values intact

This allows complete control over what information the agent sees about
when and how to use this tool.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
coderabbitai bot commented Jan 15, 2026

Walkthrough

Renames the public tool from analyze_codebase to explore_codebase; the README is updated and the server instructions are shortened. The tool now auto-generates idempotency keys (git- or UUID-based), prefers cached graphs, and zips directories with a larger default maximum size; the graph-cache defaults were also increased.
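
For reference, here is a minimal sketch of that key-generation flow, assembled from this PR's commit messages rather than from the file itself, so details may differ:

```typescript
import { execSync } from 'child_process';
import { randomUUID, createHash } from 'crypto';
import { basename } from 'path';

// Key format: `{repo}:supermodel:{hash}`. The real implementation also folds
// `git status` output into the hash to distinguish dirty working trees; that
// detail is omitted from this sketch.
function generateIdempotencyKey(directory: string): string {
  const repoName = basename(directory) || 'repo';
  let hash: string;
  try {
    // Git repos get a stable key: the short commit hash of HEAD.
    hash = execSync('git rev-parse --short HEAD', {
      cwd: directory,
      encoding: 'utf-8',
    }).trim();
  } catch {
    // Non-git directories get a unique key: 7-char SHA-1 of a random UUID.
    hash = createHash('sha1').update(randomUUID()).digest('hex').substring(0, 7);
  }
  return `${repoName}:supermodel:${hash}`;
}

// e.g. "my-repo:supermodel:30709e4" in a git checkout
console.error('[DEBUG] Idempotency-Key:', generateIdempotencyKey(process.cwd()));
```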

Changes

  • Documentation (README.md): Tool entry renamed from analyze_codebase to explore_codebase (header and description updated).
  • MCP Server Config (src/server.ts): Replaced the verbose instructions payload with a concise Server Instructions block; capabilities otherwise unchanged.
  • Tool schema & behavior (src/tools/create-supermodel-graph.ts): Exported tool renamed; input schema reworked to require directory (file removed); Idempotency-Key may now be auto-generated via the new generateIdempotencyKey(directory); added cache-aware routing (graphCache) and handleQueryModeWithCache; zip creation/cleanup preserved with enhanced error handling and logging; responses include idempotencyKey and idempotencyKeyGenerated metadata when the key is auto-generated.
  • Graph cache config (src/cache/graph-cache.ts): LRU defaults increased: maxGraphs 3 → 20, maxNodes 100,000 → 1,000,000 (constructor defaults changed).
  • Zip defaults (src/utils/zip-repository.ts): Default max ZIP size increased from 50 MB to 500 MB (maxSizeBytes default updated).

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant MCP_Server
  participant Tool
  participant Git_FS
  participant QueryEngine

  Client->>MCP_Server: Invoke tool (directory, optional Idempotency-Key, query)
  MCP_Server->>Tool: Forward request
  Tool->>Git_FS: Validate directory, read git HEAD/status
  Tool->>Tool: generateIdempotencyKey (if missing)
  Tool->>Git_FS: Create temporary ZIP of directory
  alt cached graph exists for idempotencyKey
    Tool->>Tool: load cached graph
    alt query provided
      Tool->>QueryEngine: run query against cached graph
      QueryEngine-->>Tool: query results
    else
      Tool-->>Tool: return cached status/summary
    end
  else
    alt query provided
      Tool->>QueryEngine: handleQueryMode(zip, query, idempotencyKey)
      QueryEngine-->>Tool: results
    else legacy mode
      Tool->>Tool: handleLegacyMode(zip, idempotencyKey)
      Tool-->>Tool: legacy results
    end
  end
  Tool->>Git_FS: Delete temp ZIP (cleanup)
  Tool-->>MCP_Server: Return results (include idempotencyKey metadata if generated)
  MCP_Server-->>Client: Relay response

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • robotson
  • jonathanpopham

Poem

A folder zipped, a key takes flight,
New name, new cache — the graph comes alight.
Queries find homes where old echoes stay,
Temp files vanish, metadata leads the way. ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
  • Title check: ⚠️ Warning. The PR title claims to 'Remove agent instructions' but the changes actually add new server instructions and rename the tool from analyze_codebase to explore_codebase, which contradicts the stated intent. Resolution: update the title to accurately reflect the main changes, such as 'refactor: Rename tool and update server instructions' or 'refactor: Migrate from analyze_codebase to explore_codebase with enhanced instructions'.
✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which meets the required threshold of 80.00%.



🧹 Recent nitpick comments
src/tools/create-supermodel-graph.ts (3)

1-1: @ts-nocheck disables all type safety for this file.

So here's the deal: this directive tells TypeScript "don't check anything in this file." That means you could have type mismatches, missing properties, wrong argument types — and TypeScript won't tell you about any of them.

Think of it like turning off the smoke detector in your kitchen. Sure, it stops the beeping, but you might not notice when something's actually on fire.

If there are specific lines causing type errors, consider using @ts-expect-error on just those lines instead, with a comment explaining why. That way you keep type safety for the 95% of the file that doesn't have issues.
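
For illustration, a hypothetical before/after of the targeted approach (the offending assignment and the comment text are invented for the example):

```typescript
// Instead of a file-level // @ts-nocheck, suppress only the known-bad line.
interface GraphNode { id: string }

const rawNodes: unknown = [{ id: 'fn:main' }];

// @ts-expect-error -- rawNodes comes from an untyped parser; narrow it properly in a follow-up
const nodes: GraphNode[] = rawNodes;

console.log(nodes.length); // 1 — the rest of the file keeps full type checking
```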


185-220: execSync blocks the event loop — fine for now, but worth noting.

So execSync is synchronous, meaning Node.js just stops and waits while git does its thing. For most repos this is super fast (milliseconds), but on a massive monorepo or slow filesystem, it could block the server.

Right now this is probably fine since you're generating keys on-demand. But if you ever see this becoming a bottleneck, you could swap to execFile (async version) wrapped in a promise:

// Example of async alternative
import { execFile } from 'child_process';
import { promisify } from 'util';
const execFileAsync = promisify(execFile);

// then, inside an async function:
const { stdout } = await execFileAsync('git', ['rev-parse', '--short', 'HEAD'], { cwd: directory });
const hash = stdout.trim();

Not a blocker, just something to keep in mind.


367-384: This metadata injection code is duplicated from lines 286-302.

You've got the same try-parse-JSON-add-metadata-or-prepend-text logic in two places. That's a maintenance headache waiting to happen — if you change one, you might forget to change the other.

Here's a quick helper you could extract:

♻️ Suggested refactor
+/**
+ * Add idempotency key metadata to response
+ */
+function addIdempotencyKeyMetadata(
+  result: ReturnType<typeof asTextContentResult>,
+  idempotencyKey: string
+): void {
+  if (!result.content?.[0] || result.content[0].type !== 'text') return;
+  
+  const originalText = result.content[0].text;
+  try {
+    const responseData = JSON.parse(originalText);
+    responseData._metadata = {
+      ...responseData._metadata,
+      idempotencyKey,
+      idempotencyKeyGenerated: true
+    };
+    result.content[0].text = JSON.stringify(responseData, null, 2);
+  } catch {
+    result.content[0].text = `[Auto-generated Idempotency-Key: ${idempotencyKey}]\n\n${originalText}`;
+  }
+}

Then both places become just:

if (keyGenerated) {
  addIdempotencyKeyMetadata(result, idempotencyKey);
}

📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 23a094e and df2a2b0.

📒 Files selected for processing (1)
  • src/tools/create-supermodel-graph.ts
🧰 Additional context used
🧬 Code graph analysis (1)
src/tools/create-supermodel-graph.ts (4)
src/types.ts (3)
  • ClientContext (4-6)
  • asErrorResult (51-61)
  • asTextContentResult (39-49)
src/queries/index.ts (2)
  • graphCache (271-271)
  • executeQuery (36-69)
src/cache/graph-cache.ts (2)
  • graphCache (360-360)
  • IndexedGraph (42-71)
src/queries/types.ts (1)
  • QueryType (8-28)
🔇 Additional comments (6)
src/tools/create-supermodel-graph.ts (6)

32-112: Nice detailed description — this will help agents understand what the tool does.

This is exactly what the previous review was asking for. The example output and query type list give agents concrete info about what they're working with. Much better than an empty string!


261-306: TOCTOU fix looks solid — the atomic get() solves the race condition.

Nice work addressing the previous review feedback. By grabbing the cached graph with get() and then passing it directly to handleQueryModeWithCache, you've closed that window where the cache could be evicted between the check and use.

One small thing: the metadata injection logic (lines 286-302) is copy-pasted again at lines 367-384. Consider extracting this into a small helper function to keep things DRY.


405-461: Clean implementation of the cache-aware query path.

This function is the heart of the TOCTOU fix. By passing cachedGraph.raw directly to executeQuery (line 446), you ensure that even if the cache entry gets evicted right after your get(), the query can still succeed because it has the actual data, not just a promise that the cache will have it.

The comment on lines 443-445 explains this nicely for future readers.
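
In spirit, the fix reduces to this pattern; a self-contained toy illustration follows (the real graphCache, IndexedGraph, and executeQuery APIs are richer than these stand-ins):

```typescript
// Stand-in types and cache for illustration only.
interface RawGraph { nodes: unknown[] }
const cache = new Map<string, { raw: RawGraph }>();
cache.set('repo:supermodel:abc1234', { raw: { nodes: [1, 2, 3] } });

function runQueryAgainst(graph: RawGraph, query: string): string {
  return `${query}: ${graph.nodes.length} nodes`;
}

// TOCTOU-safe: a single atomic get(), then operate on the value we already hold.
// A has()-then-lookup sequence could observe an eviction between the two steps.
const cached = cache.get('repo:supermodel:abc1234');
const result = cached
  ? runQueryAgainst(cached.raw, 'summary')                     // use the data in hand
  : 'cache miss: re-zip the directory and fetch from the API'; // fall back to the slow path

console.log(result); // "summary: 3 nodes"
```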


463-529: Standard query mode with cache fallback looks good.

The flow is clear: try cache first, if miss then fetch from API. This is the path taken when the handler doesn't have a cached graph available (i.e., it had to zip the directory).


633-653: API fetch function is straightforward.

Reads the ZIP file, wraps it in a Blob, sends it to the API. Clean and simple.


655-696: Legacy mode with good error logging.

The detailed error logging (lines 672-688) will be super helpful for debugging API issues. Nice to see you're capturing response status, headers, and body when available.




coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/tools/create-supermodel-graph.ts`:
- Around line 26-27: The tool definition for 'graph_codebase' has an empty
description which harms discoverability; update the tool registration in
create-supermodel-graph (symbol: name: 'graph_codebase') to provide a concise
one-line description summarizing its purpose, and add brief descriptions for
each parameter in the tool's schema/params so callers and LLMs can understand
when to use it; keep each description short (one sentence) and accurate to the
parameter's role.
🧹 Nitpick comments (2)
src/tools/create-supermodel-graph.ts (2)

96-97: Consider aligning schema validation with runtime validation.

Right now your schema says only Idempotency-Key is required, but the runtime code (lines 127-134) actually requires either directory or file to be present. This isn't wrong per se - the runtime catches it - but it means an agent might craft a request that passes schema validation but fails at runtime.

You could make this explicit using JSON Schema's oneOf:

✨ Optional schema improvement
     },
-    required: ['Idempotency-Key'],
+    required: ['Idempotency-Key'],
+    oneOf: [
+      { required: ['directory'] },
+      { required: ['file'] },
+    ],
   },

This tells consumers upfront: "you need the idempotency key AND one of directory/file." But totally fine to leave as-is if you want to keep the schema minimal.


1-1: Side note: @ts-nocheck hides potential type issues.

This isn't related to your PR changes, but just flagging that @ts-nocheck at the top means TypeScript won't catch any type errors in this file. It's like turning off the smoke detector - might be fine, might miss something important.

If this was added to work around a specific issue, consider fixing that issue and removing the directive. If it's blocking you from making progress, that's fine for now, but worth revisiting later.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 243a71e and 333f741.

📒 Files selected for processing (3)
  • README.md
  • src/server.ts
  • src/tools/create-supermodel-graph.ts
💤 Files with no reviewable changes (1)
  • src/server.ts
🔇 Additional comments (3)
README.md (1)

129-129: LGTM! Tool rename looks good.

The documentation correctly reflects the tool rename from analyze_codebase to graph_codebase. Nice that you kept all the helpful docs here even though the in-code descriptions are being stripped - this gives you flexibility to control the narrative through external prompt engineering while still having a reference for humans.

src/tools/create-supermodel-graph.ts (2)

43-51: Nice addition of the query enum!

This is a solid improvement. Before, an agent could pass any random string as query and only find out at runtime it was invalid. Now the schema itself tells you "hey, these are your 13 options."

Think of it like a dropdown menu vs a free-text field - the dropdown prevents typos and invalid inputs right at the door.


100-211: Handler logic looks solid - all functionality preserved.

The core logic is untouched which is exactly what the PR promises. The handler still:

  • Validates inputs properly
  • Auto-zips directories with cleanup
  • Routes to query mode or legacy mode
  • Handles errors gracefully

The breadcrumb hints and error hints are still in place too (lines 294-391), so even though the schema descriptions are gone, agents will still get helpful guidance in the actual responses. Smart move keeping those.


greynewell and others added 3 commits January 15, 2026 12:45
Changes tool name to better reflect the exploratory nature of codebase
navigation and discovery.

Changes:
- Rename tool from 'graph_codebase' to 'explore_codebase'
- Update README.md with new tool name

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Removes the 'file' parameter to enforce automatic zipping. Agents must now
provide a directory path instead of creating ZIP files manually.

Changes:
- Remove 'file' parameter from tool schema
- Make 'directory' a required parameter
- Simplify handler logic to only support directory input
- Always auto-zip and cleanup temp files

This ensures all codebases are processed through our auto-zipping pipeline
with proper gitignore support and security exclusions.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Restores helpful descriptions for query-related parameters to guide agents
on proper parameter usage while keeping tool-level and primary parameter
descriptions empty for custom writing.

Restored descriptions for:
- query (with workflow guidance)
- targetId, searchText, namePattern, filePathPrefix, labels
- depth, relationshipTypes, limit, includeRaw, jq_filter

Still empty (for custom writing):
- Tool description
- directory description
- Idempotency-Key description

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/tools/create-supermodel-graph.ts`:
- Around line 25-27: The tool object's name property is inconsistent with the PR
description; update the Tool defined as tool (the exported constant) by changing
the name field from 'explore_codebase' to 'graph_codebase' (and optionally
update the description string) so the code matches the PR intent and
documentation.
♻️ Duplicate comments (1)
src/tools/create-supermodel-graph.ts (1)

27-27: Empty tool description — already flagged.

This was called out in a previous review. The short version: MCP best practices recommend at least a one-liner so LLMs know what this tool does and when to use it. Without it, agents are basically guessing.

If you're intentionally leaving it empty to inject custom prompts at runtime, just make sure that injection actually happens, or the tool becomes invisible to agents.

🧹 Nitpick comments (2)
src/tools/create-supermodel-graph.ts (2)

1-1: Consider removing @ts-nocheck before merging.

This directive tells TypeScript to look the other way for the entire file. Think of it like turning off the smoke detector while cooking — fine temporarily, but you probably want it back on once you're done.

Since this is a draft PR focused on cleaning up descriptions, no rush. But before this hits main, it'd be good to re-enable type checking so TypeScript can catch any sneaky bugs.


203-220: Return type annotation could be cleaner.

The return type Promise<ReturnType<typeof asTextContentResult>> is technically correct since both asTextContentResult and asErrorResult return ToolCallResult. But it's a bit misleading because the function can return error results too (line 248).

Consider just using Promise<ToolCallResult> for clarity:

async function handleQueryMode(
  client: ClientContext,
  params: { /* ... */ }
): Promise<ToolCallResult> {

Not a big deal, just makes the function signature more honest about what it can return.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 80b440a and 4f9c701.

📒 Files selected for processing (1)
  • src/tools/create-supermodel-graph.ts
🧰 Additional context used
🧬 Code graph analysis (1)
src/tools/create-supermodel-graph.ts (2)
src/types.ts (1)
  • asErrorResult (51-61)
src/utils/zip-repository.ts (1)
  • zipRepository (110-237)
🔇 Additional comments (7)
src/tools/create-supermodel-graph.ts (7)

39-47: Nice work constraining query to an enum!

This is solid. By locking down the query parameter to these 13 specific values, you're preventing agents from sending garbage queries and making validation easier. The schema will reject invalid values before they even hit your handler.

Just a thought: even with empty descriptions, the enum values themselves (graph_status, summary, etc.) are somewhat self-documenting. Still, a one-liner on query explaining "pick one of these to explore the codebase" would help agents understand what to choose.
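
For context, the enum constraint in the input schema looks roughly like this (illustrative and abbreviated: only three of the 13 values are shown; the full list lives in src/queries/types.ts):

```typescript
// Fragment of the JSON Schema object for the query parameter.
const queryProperty = {
  type: 'string',
  enum: [
    'graph_status',
    'summary',
    'function_calls_in',
    // ...the remaining query types are omitted here
  ],
} as const;
```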


92-92: Good call making directory required.

Since you removed the file parameter and everything now goes through the auto-zip pipeline, requiring directory upfront makes total sense. This prevents confusing error messages later when the handler expects a directory but doesn't get one.


122-157: Solid auto-zipping flow with good error handling.

This is well structured. You're:

  1. Validating the directory upfront
  2. Capturing the cleanup function for later
  3. Handling specific error cases with helpful messages

The error messages are user-friendly — instead of cryptic stack traces, users get "Directory does not exist" or "Permission denied."

One tiny thing: matching errors via error.message.includes(...) (lines 143, 146, 149, 152) is a bit fragile. If zipRepository ever changes its error messages, these checks break silently. But honestly, for a tool like this, it's probably fine. Just something to keep in mind if you ever refactor the zip utility.


182-187: Clean temp file cleanup — exactly right.

Using try/finally here ensures the temp ZIP gets deleted no matter what happens in the query execution. The if (cleanup) check is smart too — if zipRepository threw before returning, cleanup would still be null, and you avoid a "cannot call null" error.

This prevents disk space leaks from orphaned temp files. Nice work!
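
The shape of that guard, reduced to a self-contained sketch (the names and the zip-creation signature are stand-ins, not the real zipRepository API):

```typescript
import { unlinkSync } from 'fs';

// Illustrative shape of the handler's cleanup guard.
async function analyzeWithTempZip(createZip: () => string, run: (zip: string) => Promise<string>) {
  let cleanup: (() => void) | null = null;
  try {
    const zipPath = createZip();            // may throw before any temp file exists
    cleanup = () => unlinkSync(zipPath);    // only set once there is something to delete
    return await run(zipPath);
  } finally {
    if (cleanup) cleanup();                 // the guard avoids calling null when zipping failed
  }
}
```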


193-198: LGTM!

Simple, readable byte formatter. Nothing fancy, gets the job done for debug logs.


271-347: Love the breadcrumb hints!

This is a great UX pattern for agents. After running a query, you're basically saying "hey, here's what you might want to do next." For example, after a summary query, you suggest searching for specific functions or browsing the domain map.

This guides agents through the codebase exploration flow without them having to guess. Solid agent ergonomics.


409-424: Verbose error logging — useful for debugging but consider log levels.

This logs everything when an API call fails: error name, message, stack, response status, headers, body. That's super helpful when debugging production issues.

Just one thought: in a production environment with lots of traffic, this could get noisy. If you ever add log levels (like debug vs. error), you might want to gate some of this (like the full stack trace) behind a debug flag. But for now, it's fine — better to have too much info than too little when something breaks.
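
A small sketch of what that gating could look like (the DEBUG environment variable and the field names here are assumptions, not the project's existing configuration):

```typescript
const debugLogging = process.env.DEBUG === 'true';

function logApiFailure(error: Error & { response?: { status?: number } }) {
  console.error('[ERROR] API call failed:', error.name, error.message);
  if (error.response?.status !== undefined) {
    console.error('[ERROR] Response status:', error.response.status);
  }
  if (debugLogging && error.stack) {
    console.error('[DEBUG] Stack trace:', error.stack); // only when explicitly enabled
  }
}
```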


greynewell and others added 4 commits January 15, 2026 13:04
Adds back the description for the directory parameter to help agents
understand what path to provide.

Still empty (for custom writing):
- Tool description
- Idempotency-Key description

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Makes Idempotency-Key parameter optional and auto-generates it when not
provided by the agent.

Key generation logic:
- Format: {repo}:supermodel:{hash}
- Uses git rev-parse --short HEAD if git is available in the directory
- Falls back to SHA-1 hash of random UUID (7 chars) if git unavailable
- Ensures git repos get stable keys, non-git dirs get unique keys

Changes:
- Make Idempotency-Key optional (remove from required array)
- Add generateIdempotencyKey() function with git fallback logic
- Update handler to auto-generate key if not provided
- Include auto-generated key in response metadata
- Log key generation to stderr for agent visibility

Benefits:
- Simplifies agent workflow - no manual key generation needed
- Maintains caching benefits with stable git-based keys
- Graceful fallback for non-git directories
- Agent can see generated key in response and logs

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Adds concrete example of tool output from analyzing the MCP repo itself,
showing agents what to expect when using the tool.

Example includes:
- Full JSON response structure
- Summary statistics (52 functions, 6 domains, 251 relationships)
- Available query hints
- Auto-generated idempotency key demonstration

Tested locally on this repo:
- 24 TypeScript files processed
- 60KB ZIP created
- Complete graph analysis generated
- All 13 query types listed

This helps agents understand the tool's capabilities and output format
before making their first call.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Updated tool description to show real graph output with:
- Sample nodes (Functions, Files, External Modules) with properties
- Sample relationships (CHILD_DIRECTORY, CONTAINS_FILE)
- Full schema structure agents will receive

This replaces the previous summary-only example with the literal
graph structure as requested.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/tools/create-supermodel-graph.ts (1)

552-567: Watch out for logging potentially sensitive response data.

The detailed error logging here is super helpful for debugging, but line 564 logs the entire response body. If the API ever returns anything sensitive (auth tokens, user data, etc.), it'll end up in your logs.

Consider either:

  1. Truncating the response body in logs
  2. Only logging this in development mode
  3. Redacting known sensitive fields

Something like:

const responseText = await error.response.text();
console.error('[ERROR] Response body (truncated):', responseText.substring(0, 500));

Not a huge deal for a dev tool, but worth thinking about if this ever handles sensitive data.

♻️ Duplicate comments (1)
src/tools/create-supermodel-graph.ts (1)

30-30: Tool name doesn't match what the PR says.

Quick heads up: the PR description says the tool was renamed to graph_codebase, but here it's explore_codebase. One of them needs to be updated so docs and code stay in sync.

🧹 Nitpick comments (2)
src/tools/create-supermodel-graph.ts (2)

5-6: Consider combining the crypto imports.

You've got two imports from the same crypto module on consecutive lines. They can be merged into one line for tidiness:

-import { randomUUID } from 'crypto';
-import { createHash } from 'crypto';
+import { randomUUID, createHash } from 'crypto';

Not a big deal, just a minor cleanup.


184-205: execSync blocks the event loop and has no timeout.

Here's the thing: execSync is synchronous, meaning while it's running, nothing else can happen. If git rev-parse hangs (say, on a slow network drive or weird git state), your whole server is stuck waiting.

A safer approach would be to use the async version with a timeout:

♻️ Suggested async version with timeout
+import { exec } from 'child_process';
+import { promisify } from 'util';
+
+const execAsync = promisify(exec);

-function generateIdempotencyKey(directory: string): string {
+async function generateIdempotencyKey(directory: string): Promise<string> {
   const repoName = basename(directory) || 'repo';
   let hash: string;
 
   try {
-    hash = execSync('git rev-parse --short HEAD', {
+    const { stdout } = await execAsync('git rev-parse --short HEAD', {
       cwd: directory,
-      encoding: 'utf-8',
-      stdio: ['pipe', 'pipe', 'pipe']
+      timeout: 5000, // 5 second timeout
     });
-    hash = hash.trim();
+    hash = stdout.trim();
     console.error('[DEBUG] Generated idempotency key using git hash:', hash);
   } catch (error) {
     const uuid = randomUUID();
     hash = createHash('sha1').update(uuid).digest('hex').substring(0, 7);
     console.error('[DEBUG] Generated idempotency key using random UUID hash:', hash);
   }
 
   return `${repoName}:supermodel:${hash}`;
 }

Also, tiny edge case: basename only returns an empty string for degenerate inputs such as the filesystem root ('/') or an empty path (a trailing slash like /home/user/repo/ is stripped and still yields 'repo'), but the || 'repo' fallback covers those cases cheaply.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a23c560 and 662b25c.

📒 Files selected for processing (1)
  • src/tools/create-supermodel-graph.ts
🧰 Additional context used
🧬 Code graph analysis (1)
src/tools/create-supermodel-graph.ts (2)
src/types.ts (3)
  • HandlerFunction (19-22)
  • ClientContext (4-6)
  • asErrorResult (51-61)
src/utils/zip-repository.ts (1)
  • zipRepository (110-237)
🔇 Additional comments (7)
src/tools/create-supermodel-graph.ts (7)

31-111: The description is pretty hefty for a "reduce token bloat" PR.

So the PR objective mentions removing agent instructions to cut down on tokens, but this description is about 80 lines of JSON example output. That's roughly 2KB+ of text that gets sent to the model on every tool listing.

Don't get me wrong—having a concrete example is super helpful for agents to understand the output format. But if the goal is minimizing tokens, you might want to consider:

  1. A much shorter example (just show the structure, not full sample data)
  2. Or keep a one-liner description and let agents discover the format from actual responses

Just something to think about. If you're intentionally trading off "more tokens" for "better agent understanding through examples," that's totally valid—just worth calling out since it goes against the stated PR goal.


228-244: LGTM on the directory validation and key tracking.

Nice and clean. The keyGenerated flag is a good way to track whether you need to inject metadata into the response later.


246-276: Solid error handling for the zip operation.

The specific error messages for "does not exist", "permission denied", "exceeds limit", and "ENOSPC" are really helpful. When something goes wrong, users will actually know why instead of just getting a generic "zip failed" error. Nice work!


305-322: The metadata injection logic looks good.

I like how you're handling both JSON and non-JSON responses. The try/catch around JSON.parse is a clean way to detect JSON vs plain text.

One small thing: you're mutating result.content[0].text directly. It works fine here since you're in control of the object, but if you ever run into weird bugs with shared references, consider creating a new content array instead. Not critical—just something to keep in mind.


336-341: Nice little utility.

Clean and does what it says on the tin. 👍


346-409: Clean query flow with smart caching.

The cache-first approach with automatic API fallback is solid. And the breadcrumb hints at the end help guide agents to logical next steps—that's a nice touch for usability.


414-511: The breadcrumb hints system is a great UX pattern.

This is clever—after an agent runs a query, they get suggestions for what to try next. It's like giving them a roadmap instead of just raw data. The NEXT: prefix makes it clear these are suggestions, not requirements.


greynewell and others added 8 commits January 15, 2026 13:28
Updated the description of the 'explore_codebase' tool to clarify its functionality and provide an example output.
Updated idempotency key description to clarify usage.
Added detailed instructions for using the server API, including strategies for graph generation, debugging, planning, and analysis.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
When a query is provided and the graph is already cached in memory,
skip the expensive ZIP creation step entirely. This optimization
avoids redundant file I/O and compression when the cached graph
can serve the query directly.

Fixes #36

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@greynewell greynewell marked this pull request as ready for review January 15, 2026 19:10
coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/tools/create-supermodel-graph.ts`:
- Around line 260-302: The current cache check uses
graphCache.has(idempotencyKey) then calls handleQueryMode with file: '' which
creates a TOCTOU window and causes a confusing failure if the cache entry is
evicted; instead read the cached value atomically (const cached =
graphCache.get(idempotencyKey)) and if cached is truthy call handleQueryMode
using that cached data (or pass the original file/directory variable instead of
an empty string) so the code uses a concrete payload rather than assuming the
cache still exists; update handleQueryMode (and any downstream callers like
fetchFromApi and readFile) to accept and handle a directory/file path or an
already-assembled cached payload so the cache-miss path can safely re-zip
on-demand.
♻️ Duplicate comments (1)
src/tools/create-supermodel-graph.ts (1)

29-31: Tool name still doesn't match PR description.

The PR description says the tool was renamed to graph_codebase, but the code has explore_codebase. This was flagged in a previous review and still needs reconciliation - either update the PR description or change the code.

Good news: the description is now populated with useful example output, which addresses the previous concern about empty descriptions making tools harder to discover.

🧹 Nitpick comments (3)
src/tools/create-supermodel-graph.ts (3)

1-1: Consider removing @ts-nocheck and fixing the underlying type issues.

Having @ts-nocheck at the top is like closing your eyes while driving - TypeScript can't warn you about real bugs. I see there are as any casts throughout (line 240, for example), which suggests there are some typing friction points.

When you have time, it'd be cleaner to:

  1. Remove @ts-nocheck
  2. Fix or properly type the areas that complain
  3. Use targeted // @ts-expect-error comments only where truly unavoidable

Not blocking, but worth cleaning up in a follow-up.


282-299: Duplicate metadata injection logic - consider extracting to a helper.

These two blocks do exactly the same thing:

  • Parse response as JSON
  • Add _metadata.idempotencyKey and _metadata.idempotencyKeyGenerated
  • Re-stringify (or prepend text if not JSON)

If you ever need to change this logic (add another field, change format), you'd have to remember to update both places.

♻️ Suggested refactor
function injectIdempotencyMetadata(
  result: ReturnType<typeof asTextContentResult>,
  idempotencyKey: string
): void {
  if (result.content?.[0]?.type !== 'text') return;
  
  const originalText = result.content[0].text;
  try {
    const responseData = JSON.parse(originalText);
    responseData._metadata = {
      ...responseData._metadata,
      idempotencyKey,
      idempotencyKeyGenerated: true
    };
    result.content[0].text = JSON.stringify(responseData, null, 2);
  } catch {
    result.content[0].text = `[Auto-generated Idempotency-Key: ${idempotencyKey}]\n\n${originalText}`;
  }
}

Then replace both blocks with:

if (keyGenerated) {
  injectIdempotencyMetadata(result, idempotencyKey);
}

Also applies to: 362-380


394-399: Minor: formatBytes is duplicated from zip-repository.ts.

This same function exists in src/utils/zip-repository.ts at lines 347-352. You could export it from there and import it here to avoid the duplication. Not a big deal, just a small cleanup opportunity.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f9c5a45 and 23a094e.

📒 Files selected for processing (4)
  • src/cache/graph-cache.ts
  • src/server.ts
  • src/tools/create-supermodel-graph.ts
  • src/utils/zip-repository.ts
🧰 Additional context used
🧬 Code graph analysis (1)
src/tools/create-supermodel-graph.ts (4)
src/types.ts (2)
  • HandlerFunction (19-22)
  • ClientContext (4-6)
src/cache/graph-cache.ts (1)
  • graphCache (360-360)
src/queries/index.ts (1)
  • graphCache (271-271)
src/utils/zip-repository.ts (1)
  • zipRepository (110-237)
🔇 Additional comments (6)
src/cache/graph-cache.ts (1)

294-295: Solid capacity bump for larger repos.

Increasing from 3→20 graphs and 100K→1M nodes makes sense for handling SWE-BENCH repos. The LRU eviction logic still works correctly with these new limits.

One thing to keep in mind: with 1M nodes, memory usage could get chunky depending on how fat your nodes are. If you ever need to tune this per-environment (e.g., smaller for local dev, bigger for CI), you could expose these via environment variables. But for now, this is fine.
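
If that per-environment tuning ever becomes necessary, it could look something like this (the environment variable names are made up; only the maxGraphs and maxNodes option names and their new defaults come from this PR):

```typescript
function intFromEnv(name: string, fallback: number): number {
  const raw = process.env[name];
  const parsed = raw ? Number.parseInt(raw, 10) : NaN;
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

const cacheOptions = {
  maxGraphs: intFromEnv('SUPERMODEL_CACHE_MAX_GRAPHS', 20),      // default raised in this PR
  maxNodes: intFromEnv('SUPERMODEL_CACHE_MAX_NODES', 1_000_000), // default raised in this PR
};
// e.g. export const graphCache = new GraphCache(cacheOptions);
```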

src/utils/zip-repository.ts (1)

100-101: Nice bump to support bigger repos.

Going from 50MB to 500MB is a 10x increase that'll accommodate those chonky SWE-BENCH repositories. The error messaging at line 176-178 still correctly guides users to analyze subdirectories if they hit the limit, which is good UX.

Also applies to: 114-114

src/server.ts (1)

21-54: Much cleaner instructions with actionable guidance.

The new instructions are way more practical than the old wall of text. I especially like the Performance Optimization section with the concrete example:

Full repo: 180MB, 50k nodes
Subsystem: 15MB, 3k nodes

This gives agents (and humans) a clear mental model of why subdirectory analysis matters. The strategy sections (Debugging, Planning, Analysis) are structured well and easy to follow.

src/tools/create-supermodel-graph.ts (3)

184-219: Solid idempotency key generation with good fallbacks.

The logic here is sensible:

  1. Try git commit hash first (most repos will have this)
  2. Include git status hash to detect dirty working trees - nice touch!
  3. Fall back to UUID if not a git repo

One thing to be aware of: execSync blocks the event loop. For git commands this is typically fine (they complete in <100ms), but if someone runs this on a repo with a massive .git directory or over a slow network mount, it could cause a noticeable pause. Probably not worth changing unless you see issues in practice.


310-334: Clean error handling and proper cleanup.

The ZIP creation logic handles the common failure modes nicely:

  • Directory doesn't exist → clear message
  • Permission denied → clear message
  • Size limit exceeded → passes through the helpful message from zipRepository
  • Disk full → helpful ENOSPC message

And the finally block on lines 383-388 ensures the temp ZIP gets cleaned up even if the query throws. The if (cleanup) guard handles the edge case where zipRepository itself threw before returning the cleanup function. Well done.

Also applies to: 383-388


471-548: Nice breadcrumb hints for agent navigation.

The addBreadcrumbHints function is a thoughtful addition. Giving agents concrete "NEXT:" suggestions like:

NEXT: Use function_calls_in with targetId to see who calls a function

...helps them navigate the graph systematically instead of fumbling around. This kind of guidance is exactly what makes tools usable by AI agents.
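
Conceptually, the hint generation is just a mapping from the query that ran to a few "NEXT:" strings, along the lines of this simplified sketch (hint wording paraphrased; the real addBreadcrumbHints covers all 13 query types and attaches the hints to the response):

```typescript
// Simplified stand-in for addBreadcrumbHints: map the completed query to follow-up suggestions.
function breadcrumbHintsFor(query: string, hasResults: boolean): string[] {
  if (!hasResults) {
    return ['NEXT: Broaden the search text, or run the summary query first to get oriented'];
  }
  if (query === 'summary') {
    return [
      'NEXT: Search for a specific function by name to drill in',
      'NEXT: Use function_calls_in with targetId to see who calls a function',
    ];
  }
  return [];
}

console.log(breadcrumbHintsFor('summary', true).join('\n'));
```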


Changed cache check from has() to get() to atomically retrieve the
cached graph. Created handleQueryModeWithCache() that passes the
cached graph's raw data to executeQuery, ensuring that even if the
cache is evicted between check and execution, the query can still
succeed by rebuilding indexes from the raw data.

This fixes the race condition where:
1. graphCache.has() returns true
2. Cache entry gets evicted
3. handleQueryMode with file: '' fails because it can't re-fetch

Now we atomically get the cached graph and pass its raw data,
eliminating the TOCTOU window.

Fixes potential "Cache miss with no file to re-zip" errors.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@greynewell greynewell changed the title refactor: Remove agent instructions for custom tool descriptions refactor: Rewrite agent instructions for custom tool descriptions Jan 15, 2026
@greynewell greynewell merged commit 30709e4 into main Jan 15, 2026
2 checks passed