-
Notifications
You must be signed in to change notification settings - Fork 0
Exp/query engine #10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Exp/query engine #10
Conversation
- Add LRU cache (10 graphs, 30 min TTL) with precomputed indexes - Implement v1 MVP queries: graph_status, summary, get_node, search, list_nodes, function_calls_in/out, definitions_in_file, jq - Implement v1.1 queries: file_imports, domain_map, domain_membership, neighborhood - Fix cache key to use idempotencyKey only (not file path) - Fix list_nodes duplicate nodes with Set dedup - Fix definitions_in_file partial-match fallback - Make includeRaw optional in get_node (default false) - Add breadcrumb hints to guide agents to next queries Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Lazy adjacency creation: IMPORTS edges now recorded for any node that participates, not just File/Module nodes - Query-time aggregation: file_imports aggregates imports from a file's contained functions/classes/types via pathIndex - Fixes empty results when graph models IMPORTS as Function->Module Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Not yet implemented for aggregated imports - will add back when reverse traversal is properly supported. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
WalkthroughAdds typed Supermodel IR models, an in-memory LRU indexed graph cache, a suite of discovery/traversal/summary query handlers, a cache-first query dispatcher with API fallback, and integrates query-driven mode into the create-supermodel-graph tool. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant QueryEngine as Query Engine
participant Cache as Graph Cache
participant API as Supermodel API
participant Handler as Query Handler
Client->>QueryEngine: executeQuery(params, cacheKey)
QueryEngine->>Cache: get(cacheKey)
alt cache hit
Cache-->>QueryEngine: IndexedGraph
QueryEngine->>Handler: handler(params, graph, "cache")
Handler-->>QueryEngine: QueryResponse
else cache miss
Cache-->>QueryEngine: null
QueryEngine->>API: fetch raw SupermodelIR
API-->>QueryEngine: SupermodelIR
QueryEngine->>QueryEngine: buildIndexes(raw, cacheKey)
QueryEngine->>Cache: set(cacheKey, IndexedGraph)
Cache-->>QueryEngine: ack
QueryEngine->>Handler: handler(params, graph, "api")
Handler-->>QueryEngine: QueryResponse
end
QueryEngine-->>Client: QueryResponse | QueryError
Estimated code review effort🎯 4 (Complex) | ⏱️ ~55 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. 📜 Recent review detailsConfiguration used: Organization UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In @src/queries/discovery.ts:
- Around line 137-145: The code currently constructs a RegExp directly from
user-supplied namePattern (new RegExp(namePattern, 'i')), which allows ReDoS via
complex patterns; before compiling, reject overly long patterns by checking
namePattern.length and return createError('INVALID_PARAMS', `namePattern too
long`) if it exceeds a safe limit (e.g., 100 chars), and only then attempt new
RegExp inside the existing try/catch; optionally note in comments to replace
this length check later with a proper safety check (e.g., safe-regex) for more
robust protection.
In @src/queries/index.ts:
- Around line 187-193: The catalog entry for the query with query: 'get_node' is
missing the optional parameter includeRaw; update the entry (the object where
query === 'get_node') to add 'includeRaw' to its optionalParams array so
consumers know this flag is supported (this aligns with how includeRaw is used
in src/queries/discovery.ts).
In @src/tools/create-supermodel-graph.ts:
- Around line 276-296: The code in the 'get_node' case reads
result.result.node.label but NodeDescriptor exposes labels (an array), so those
branches never run; update the checks to inspect result.result.node.labels
(e.g., guard for existence and use .includes('Function') / .includes('File') /
.includes('Domain') or check labels[0] as appropriate) and leave the hints.push
calls as-is so the Function/File/Domain hints trigger correctly when the node's
labels contain those values.
🧹 Nitpick comments (12)
src/queries/summary.ts (1)
5-14: Unused import:createErrorHey, looks like
createErroris imported here but never actually used in this file. Both functions return success responses viacreateResponse. You can safely remove it to keep things tidy.🧹 Suggested cleanup
import { IndexedGraph, graphCache } from '../cache/graph-cache'; import { QueryParams, QueryResponse, QueryError, GraphStatusResult, SummaryResult, createResponse, - createError, } from './types';src/queries/discovery.ts (1)
77-106: Minor:scannedvariable is unusedThe
scannedvariable gets incremented (line 102) but is never actually used anywhere. If you're not using it for metrics or debugging, you can remove it.Also, a heads-up on the label filtering (lines 90-93): you're only checking
node.labels?.[0](the first label), so if a node has multiple labels like["Class", "Serializable"]and someone filters bylabels=["Serializable"], it won't match. This might be intentional, but worth documenting if so.🧹 Remove unused variable
const results: NodeDescriptor[] = []; - let scanned = 0; // Scan through nameIndex for matches for (const [name, ids] of graph.nameIndex) { if (!name.includes(searchLower)) continue; for (const id of ids) { if (results.length >= limit) break; const node = graph.nodeById.get(id); if (!node) continue; // Filter by labels if specified if (labels && labels.length > 0) { const nodeLabel = node.labels?.[0]; if (!nodeLabel || !labels.includes(nodeLabel)) continue; } // Filter by file path prefix if specified if (filePrefix) { const filePath = node.properties?.filePath as string | undefined; if (!filePath || !normalizePath(filePath).startsWith(filePrefix)) continue; } results.push(toNodeDescriptor(node)); - scanned++; } if (results.length >= limit) break; }src/queries/index.ts (1)
6-7: Acknowledged:@ts-ignorefor jq-webI see this is already noted as a known issue in the PR. For future reference, you could add type declarations with a
.d.tsfile:// jq-web.d.ts declare module 'jq-web' { const jq: { json: (data: unknown, filter: string) => unknown; }; export default jq; }Not blocking for now since it's tracked.
src/tools/create-supermodel-graph.ts (3)
1-1: Acknowledged:@ts-nocheckAlready noted in PR as a known issue to remove. When you do tackle it, the main culprits are likely the
args as anycast (line 149) and the Blob usage (line 358). You could add proper type guards or narrow the input schema type.
219-232: Debug logging should be conditional or removedLines 224, 356, 360-361, 369 have
console.error('[DEBUG]...')statements. These are useful during development but will be noisy in production. Consider:
- Removing them before merge
- Using a debug flag/environment variable
- Using a proper logger with log levels
💡 Option: environment-based toggle
const DEBUG = process.env.DEBUG === 'true'; function debug(...args: unknown[]) { if (DEBUG) console.error(...args); } // Then use: debug('[DEBUG] Cache miss, fetching from API...');
355-372: Consider: API timeout handlingThe API call at line 368 doesn't have a timeout. If the Supermodel API is slow or unresponsive, this could hang. This might be acceptable for MVP, but worth considering an AbortController with timeout for production:
const controller = new AbortController(); const timeout = setTimeout(() => controller.abort(), 60000); // 60s try { const response = await client.graphs.generateSupermodelGraph(requestParams, { signal: controller.signal }); } finally { clearTimeout(timeout); }This depends on whether the API client supports abort signals.
src/cache/graph-cache.ts (4)
83-165: Consider early return when graph data is missing.The function handles missing
graphgracefully with optional chaining, but ifraw.graphis undefined, you'll still process all the loops with empty arrays—which is fine but a bit wasteful. More importantly, thelabels[0]check for determining node type assumes labels array is non-empty. While unlikely to be empty in practice, a defensive check would be safer.For example, on line 133:
const primaryLabel = labels[0];If
labelsis an empty array,primaryLabelwill beundefined, which won't match any of the conditions—so it's safe but worth noting.💡 Optional: Add a guard for empty labels
const entry = pathIndex.get(normalized)!; - const primaryLabel = labels[0]; + const primaryLabel = labels[0] ?? ''; if (primaryLabel === 'File') {
145-159: Directory path extraction could fail silently.On line 147, if both
props.pathandprops.nameare undefined,dirPathbecomes an empty string. This would create an entry indirIndexwith key'', which could cause confusing behavior later.Same pattern on lines 154-158 where you initialize
callAdjandimportAdjonly for the primary label—that's correct, but the empty dirPath edge case is worth addressing.🛠️ Suggested fix
// dirIndex - build directory tree if (labels[0] === 'Directory') { const dirPath = normalizePath(props.path as string || props.name as string || ''); - if (!dirIndex.has(dirPath)) { + if (dirPath && !dirIndex.has(dirPath)) { dirIndex.set(dirPath, []); } }
328-330:has()doesn't update LRU timestamp.Calling
has(key)checks if a key exists but doesn't updatelastAccessed. This means if you repeatedly callhas()to check for a graph's existence without actually using it, that graph could get evicted even though it's being "used".Probably not a big deal since the typical pattern is
has()→get(), andget()does update the timestamp. But worth noting.
359-360: Global singleton is convenient but limits testability.Exporting a singleton
graphCacheis handy for the common case, but makes unit testing harder since tests share state. If you add tests later, you might want a factory function or a way to reset the cache.No action needed now, but something to keep in mind when you add tests (as mentioned in the PR's next steps).
src/queries/traversal.ts (2)
163-256: Partial path matching could match unintended files.Lines 196-200 try to find files by partial match:
if (path.endsWith(normalizedPath) || normalizedPath.endsWith(path)) {This means if you search for
foo.ts, it could matchsrc/foo.ts,lib/foo.ts, or evensome_other_foo.ts. The first match wins (breakon line 199), which could be non-deterministic depending on Map iteration order.Consider either:
- Requiring exact matches only
- Returning an error when multiple files match a prefix
For an experimental query engine, this is probably fine as-is, but worth documenting the behavior.
💡 Example: collect all matches and warn on ambiguity
if (!pathEntry) { - // Try to find by partial match + // Try to find by partial match - collect all matches + const matches: string[] = []; for (const [path] of graph.pathIndex) { if (path.endsWith(normalizedPath) || normalizedPath.endsWith(path)) { - resolvedPath = path; - break; + matches.push(path); } } - if (resolvedPath === normalizedPath) { + if (matches.length === 0) { return createError('NOT_FOUND', `No definitions found for file '${filePath}'`, { detail: 'File may not exist in the analyzed codebase or has no class/function/type definitions', }); } + if (matches.length > 1) { + return createError('INVALID_PARAMS', `Ambiguous file path '${filePath}' matches multiple files`, { + detail: `Matches: ${matches.slice(0, 5).join(', ')}${matches.length > 5 ? '...' : ''}`, + }); + } + resolvedPath = matches[0]; }
455-554: BFS neighborhood traversal looks correct but has redundant.includes()calls.Lines 502-506 and 525-528 check
callAdj.out.includes(adjId)to determine edge direction. Since you're iterating over[...callAdj.out, ...callAdj.in], you already know which half you're in based on the index. But using.includes()is O(n) per call.For small neighborhoods this is fine, but if
callAdj.outhas hundreds of entries and you're iterating over them all, this becomes O(n²).🛠️ Suggested fix: iterate separately to avoid `.includes()`
if (relationshipTypes.includes('calls')) { const callAdj = graph.callAdj.get(nodeId); if (callAdj) { - for (const adjId of [...callAdj.out, ...callAdj.in]) { + // Process outgoing edges + for (const adjId of callAdj.out) { if (!visited.has(adjId) && nodes.length < limit) { visited.add(adjId); const adjNode = graph.nodeById.get(adjId); if (adjNode) { nodes.push(toNodeDescriptor(adjNode)); nextFrontier.push(adjId); - - // Add edge - if (callAdj.out.includes(adjId)) { - edges.push({ type: 'calls', from: nodeId, to: adjId }); - } else { - edges.push({ type: 'calls', from: adjId, to: nodeId }); - } + edges.push({ type: 'calls', from: nodeId, to: adjId }); + } + } + } + // Process incoming edges + for (const adjId of callAdj.in) { + if (!visited.has(adjId) && nodes.length < limit) { + visited.add(adjId); + const adjNode = graph.nodeById.get(adjId); + if (adjNode) { + nodes.push(toNodeDescriptor(adjNode)); + nextFrontier.push(adjId); + edges.push({ type: 'calls', from: adjId, to: nodeId }); } } } } }Same pattern applies to the IMPORTS block below.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
.DS_Storeis excluded by!**/.DS_Storepackage-lock.jsonis excluded by!**/package-lock.json
📒 Files selected for processing (9)
src/cache/graph-cache.tssrc/cache/graph-types.tssrc/cache/index.tssrc/queries/discovery.tssrc/queries/index.tssrc/queries/summary.tssrc/queries/traversal.tssrc/queries/types.tssrc/tools/create-supermodel-graph.ts
🧰 Additional context used
🧬 Code graph analysis (5)
src/queries/discovery.ts (2)
src/queries/types.ts (6)
QueryParams(31-57)QueryResponse(66-74)QueryError(77-84)createError(150-163)createResponse(166-183)NodesResult(111-113)src/cache/graph-cache.ts (4)
IndexedGraph(42-71)NodeDescriptor(9-17)toNodeDescriptor(271-282)normalizePath(264-266)
src/tools/create-supermodel-graph.ts (3)
src/queries/types.ts (1)
QueryType(8-28)src/types.ts (3)
ClientContext(4-6)asTextContentResult(39-49)asErrorResult(51-61)src/queries/index.ts (1)
executeQuery(36-69)
src/queries/types.ts (1)
src/cache/graph-cache.ts (2)
NodeDescriptor(9-17)EdgeDescriptor(20-25)
src/queries/traversal.ts (2)
src/queries/types.ts (7)
QueryParams(31-57)QueryResponse(66-74)NodesAndEdgesResult(115-118)QueryError(77-84)createError(150-163)createResponse(166-183)NodesResult(111-113)src/cache/graph-cache.ts (5)
IndexedGraph(42-71)NodeDescriptor(9-17)EdgeDescriptor(20-25)toNodeDescriptor(271-282)normalizePath(264-266)
src/cache/graph-cache.ts (2)
src/cache/graph-types.ts (3)
SupermodelIR(20-44)CodeGraphNode(6-10)CodeGraphRelationship(12-18)src/queries/summary.ts (1)
summary(58-70)
🪛 ast-grep (0.40.4)
src/queries/discovery.ts
[warning] 140-140: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(namePattern, 'i')
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html
(regexp-from-variable)
🔇 Additional comments (25)
src/cache/index.ts (1)
1-6: LGTM!Nice barrel export pattern here. This gives consumers a single clean entry point (
import { ... } from '../cache') instead of having to know the internal file structure. Standard TypeScript module organization - good stuff.src/queries/summary.ts (2)
19-53: LGTM!The dual-path logic here is clean. When there's no cached graph, you return
cached: falsewith current stats. When there is one, you return the full summary. Nice pattern for "check before you fetch" workflows.
58-70: LGTM!Simple and clean - just wraps the precomputed summary in the standard response envelope. Since
buildIndexesalready computed these stats during cache population, this is O(1) which is exactly what you want.src/queries/discovery.ts (2)
25-57: LGTM!Solid implementation. The
includeRawtoggle is a nice touch - keeps responses lean by default but lets callers opt-in to full node data when they need it. The error message with "Use search or list_nodes to discover valid node IDs" is super helpful for guiding the next action.
146-207: LGTM with a note onhasMoreGood use of
Setfor deduplication when multiple labels match. The filter chain is efficient with early termination.One small note:
hasMore(line 194) is set totruewhenresults.length >= limit, but technically you've stopped iterating at exactlylimitresults - there might not actually be more. This is a minor UX thing though; it's conservative (better to say "there might be more" than miss results).src/queries/index.ts (3)
36-69: LGTM!The cache-first flow is well structured:
- Try cache → 2. Build from API if provided → 3. Special case for
graph_status→ 4. Error for other queries without graphThe
retryable: truehint on cache miss is great for agent UX - tells them to retry with the same params to trigger a fetch.
74-122: LGTM!Clean switch-based dispatch. I like that you've got explicit "not yet implemented" errors for
uses_in_fileandlist_files_in_dir- makes it clear to callers that these are planned but not ready, rather than silently failing.
128-158: LGTM!Nice escape hatch implementation. The warning nudging users toward canned queries is a good UX touch. Including the filter in the error detail makes debugging easier too.
src/cache/graph-types.ts (1)
1-44: LGTM!Clean type definitions. Making most fields in
SupermodelIRoptional is a good defensive choice - means the code won't break if the API adds/removes fields. The nestedgraphobject with requirednodesandrelationshipscaptures the core structure nicely.src/tools/create-supermodel-graph.ts (4)
130-180: LGTM!Clean routing logic. The validation of required params (
file,Idempotency-Key) happens first, then it branches to query mode or legacy mode. If an invalid query type sneaks through,dispatchQuerywill catch it withINVALID_QUERYerror.
334-350: LGTM!Helpful error hints that guide users toward recovery actions. The actionable suggestions like "Use search with searchText to find nodes by name" are much better than just "not found".
377-415: LGTM!Good backward compatibility for the legacy jq_filter mode. The detailed error logging (status, headers, body) will be super helpful for debugging API issues - especially during this experimental phase.
103-106: Good news — depth max enforcement is already in placeThe validation is there! At line 469 of
src/queries/traversal.ts, the depth parameter gets clamped usingMath.min(params.depth || 1, 3). So if someone passesdepth: 100, it'll automatically cap to 3. The code even has a helpful comment explaining why: "Cap at 3 to prevent explosion". This means your schema description of "max 3" accurately reflects what actually happens under the hood — no risk of hanging on large graphs.src/queries/types.ts (3)
1-84: LGTM!Well-structured types. The
QueryTypeunion is complete,QueryParamscovers all the parameters used across queries, and the error codes are comprehensive. The comment on line 50 about cursor being removed for v1 is helpful context.
86-138: LGTM!Result types are well-defined and reuse the existing
NodeDescriptorandEdgeDescriptorfrom the cache module.DomainMapResultincluding the relationships array is nice for building dependency visualizations.
139-183: LGTM!Clean helper functions.
isQueryErroris a proper type guard with thorough checks. The factory functions (createError,createResponse) keep response creation consistent across all query handlers. Nice use of generics increateResponse<T>for type-safe results.src/cache/graph-cache.ts (4)
1-71: LGTM on the interface definitions!Clean type definitions. The
IndexedGraphinterface nicely documents what indexes are available and their purpose via inline comments. This makes it easy for consumers to understand what lookups are O(1).
167-231: Nice lazy adjacency initialization for IMPORTS edges.Lines 182-198 handle the case where IMPORTS edges come from non-File nodes (like Function → Module). Creating adjacency lists lazily is a good fix for that edge case. The commit messages mention this was a bug that was fixed—looks solid now.
One minor observation: Lines 223-230 collect domain-to-domain relationships, but you're only storing relationships where the start node is a Domain. If you have bidirectional domain relationships in the graph, you might miss some. Probably fine for the current use case though.
287-326: LRU eviction looks correct but has O(n) eviction cost.The
evictOldestmethod iterates through all cache entries to find the oldest one. WithmaxGraphsdefaulting to 3, this is totally fine. But if someone cranks upmaxGraphssignificantly, eachsetcould become expensive when eviction is needed.For a production system with many graphs, you'd want a proper LRU data structure (like a doubly-linked list + map). But for 3-10 graphs? This is perfectly reasonable.
Also, nice touch updating
lastAccessedonget()(line 302)—that's the "LRU" part working correctly.
261-266:normalizePathhandles only backslashes; consider whether source paths need more normalization.The function converts
\to/, which is solid for cross-platform consistency. However, it doesn't strip trailing slashes (foo/bar/vsfoo/bar), collapse double slashes, or remove leading./prefixes—all things that can creep into file paths.This works fine if your graph data always provides clean, consistent paths. But if paths from the source sometimes have these variations, lookups will fail silently. The
.startsWith()prefix matching in discovery.ts would be especially vulnerable to trailing slashes.I see there's already a
.endsWith()fallback in traversal.ts, which suggests you've run into path matching issues. Either document that input paths must be pre-normalized, or expandnormalizePathto handle these cases too.src/queries/traversal.ts (5)
27-90:functionCallsInlooks solid.Good validation flow:
- Check
targetIdrequired ✓- Check node exists ✓
- Check node is a Function ✓
- Handle empty adjacency list gracefully ✓
- Paginate and warn if truncated ✓
The only thing I'd note: if a
callerIdinadj.indoesn't exist innodeById, you silently skip it (line 67). That's probably the right behavior—dangling references shouldn't crash—but you could add it to the node count for the "more callers" warning, since you're slicing before filtering out missing nodes.Tiny edge case, probably not worth fixing.
95-158:functionCallsOutmirrorsfunctionCallsIncorrectly.Same structure, just using
adj.outinstead ofadj.inand flipping the edge direction. Looks good!
262-340:fileImportsaggregation logic is clever.Lines 296-310 handle the case where IMPORTS edges are attached to Function/Class/Type nodes rather than the File node itself. This aligns with the PR description mentioning a file_imports bug that was fixed.
Using a
Setto deduplicate (line 282) is the right call.
346-395:domainMapcollects relationships but doesn't deduplicate.If domain A has relationships to domain B stored in both A's and B's
domainIndex, you might get duplicate entries in therelationshipsarray. Looking at howdomainIndexis built ingraph-cache.ts(lines 222-230), relationships are only stored on the start domain, so this shouldn't happen.Just flagging it as something to watch if the indexing logic changes.
469-472: Nice defensive capping on depth.const depth = Math.min(params.depth || 1, 3); // Cap at 3 to prevent explosionThis prevents users from accidentally requesting depth=10 and blowing up the response. Good call! The comment makes the intent clear too.
| // Compile regex if namePattern provided | ||
| let regex: RegExp | null = null; | ||
| if (namePattern) { | ||
| try { | ||
| regex = new RegExp(namePattern, 'i'); | ||
| } catch (e) { | ||
| return createError('INVALID_PARAMS', `Invalid namePattern regex: ${namePattern}`); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
cd /repo && cat -n src/queries/discovery.ts | sed -n '125,160p'Repository: supermodeltools/mcp
Length of output: 118
🏁 Script executed:
cd /repo && rg "namePattern" src/queries/discovery.ts -B 5 -A 5Repository: supermodeltools/mcp
Length of output: 118
🏁 Script executed:
cd /repo && git log --oneline --all -- src/queries/discovery.ts | head -5Repository: supermodeltools/mcp
Length of output: 118
🏁 Script executed:
cat -n src/queries/discovery.ts | sed -n '125,160p'Repository: supermodeltools/mcp
Length of output: 1416
🏁 Script executed:
rg "namePattern" src/queries/discovery.ts -B 5 -A 5Repository: supermodeltools/mcp
Length of output: 885
🏁 Script executed:
rg "function listNodes|const listNodes" src/queries/discovery.ts -A 80 | head -100Repository: supermodeltools/mcp
Length of output: 2322
🏁 Script executed:
rg "type QueryParams|interface QueryParams" src/ -A 10Repository: supermodeltools/mcp
Length of output: 525
🏁 Script executed:
rg "listNodes\(" src/ -B 3 -A 1 | head -40Repository: supermodeltools/mcp
Length of output: 565
🏁 Script executed:
fd "*.ts" src/ --exec grep -l "list_nodes\|listNodes" {} \;Repository: supermodeltools/mcp
Length of output: 346
🏁 Script executed:
rg "list_nodes" src/ --glob "*.ts" -B 2 -A 2Repository: supermodeltools/mcp
Length of output: 6041
🏁 Script executed:
rg "namePattern" src/ --glob "*.ts" -B 2 -A 2 | head -60Repository: supermodeltools/mcp
Length of output: 3361
🏁 Script executed:
cat -n src/queries/index.ts | sed -n '1,50p'Repository: supermodeltools/mcp
Length of output: 1786
🏁 Script executed:
rg "regex|RegExp" src/queries/discovery.ts -B 2 -A 2Repository: supermodeltools/mcp
Length of output: 495
🏁 Script executed:
cat -n src/queries/discovery.ts | sed -n '165,180p'Repository: supermodeltools/mcp
Length of output: 615
🌐 Web query:
JavaScript ReDoS attack prevention best practices 2024
💡 Result:
Here are concise, practical ReDoS prevention best practices for JavaScript (2024–2026), with short rationale/action and sources.
- Never build regexes from untrusted input. Prefer allowlists or escape/sanitize inputs before using RegExp() with user data. [3][1]
- Prefer safe alternatives to regex when possible (string APIs, indexOf, parse libraries). Use regex only when necessary. [1][6]
- Limit input size and complexity (strict length limits, trim, rate-limit endpoints) before regex evaluation. [3][6]
- Avoid super-linear patterns: nested quantifiers, overlapping alternations, and greedy .*/.+. Use negated classes, atomic groups or possessive quantifiers where supported. Test regexes for catastrophic backtracking. [1][6]
- Use static analysis / automated scanners to detect risky patterns (safe-regex, vuln-regex-detector, SAST rules) in CI. Verify tool maintenance before relying on it. [4][2]
- Use safe regex engines when feasible (RE2 or engines without backtracking) or language bindings that guarantee linear-time matching for untrusted input. Test for semantic differences. [4][2]
- Mitigate at runtime: run expensive matching in isolated workers/processes with CPU/time quotas, or apply execution timeouts and circuit breakers. Combine with rate limits. [2][6]
- Keep dependencies and runtimes patched (e.g., past Node/Undici ReDoS fixes). Monitor CVEs and upgrade promptly. [5]
- Add targeted fuzzing and worst-case performance tests for regexes (benchmark with crafted adversarial inputs) as part of QA. [2][1]
- Instrument and monitor CPU/latency; alert on anomalous spikes and add defensive rules (block/slowdown clients causing repeated heavy regex work). [2][6]
Quick action checklist:
- Audit code for RegExp(new RegExp(userInput)) or dynamic regex construction. [3]
- Enforce input length + rate limits. [3][6]
- Replace risky regex with RE2 or non-regex logic where reachable by untrusted input. [4][2]
- Add CI SAST scan and performance tests; patch affected libs immediately. [4][5]
Sources: regular-expressions.info (prevention techniques) [1]; SoK literature/engineering review on ReDoS (2024) [2]; OWASP guidance on overly-permissive regexes [3]; npm safe-regex and tooling commentary (detection) [4]; NVD/CVE/Undici Node ReDoS advisories (example of real-world patching) [5]; LogRocket practical mitigation guide (patterns & testing) [6].
ReDoS risk with user-supplied regex patterns
The vulnerability is real here. When someone passes a namePattern via the API, it goes straight into new RegExp(namePattern, 'i') without any ReDoS protection. A malicious pattern like (a+)+$ could cause the regex engine to backtrack exponentially and hang your process—imagine checking that pattern against a node name with many characters.
The try/catch you have catches syntax errors ("invalid regex") but it can't catch patterns that are technically valid but dangerously slow.
A simple fix for now: add a length limit on the pattern. Something like "reject patterns longer than 100 characters" blocks most real-world use cases while cutting off the room for complex attacks. If you want to be more thorough later, consider using a library like safe-regex that detects risky patterns automatically.
💡 Suggested quick mitigation
// Compile regex if namePattern provided
let regex: RegExp | null = null;
if (namePattern) {
+ // Basic protection: reject overly long patterns that could be malicious
+ if (namePattern.length > 100) {
+ return createError('INVALID_PARAMS', 'namePattern too long (max 100 chars)');
+ }
try {
regex = new RegExp(namePattern, 'i');
} catch (e) {
return createError('INVALID_PARAMS', `Invalid namePattern regex: ${namePattern}`);
}
}🧰 Tools
🪛 ast-grep (0.40.4)
[warning] 140-140: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(namePattern, 'i')
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html
(regexp-from-variable)
🤖 Prompt for AI Agents
In @src/queries/discovery.ts around lines 137 - 145, The code currently
constructs a RegExp directly from user-supplied namePattern (new
RegExp(namePattern, 'i')), which allows ReDoS via complex patterns; before
compiling, reject overly long patterns by checking namePattern.length and return
createError('INVALID_PARAMS', `namePattern too long`) if it exceeds a safe limit
(e.g., 100 chars), and only then attempt new RegExp inside the existing
try/catch; optionally note in comments to replace this length check later with a
proper safety check (e.g., safe-regex) for more robust protection.
| { | ||
| query: 'get_node', | ||
| description: 'Get full details for a specific node by ID', | ||
| requiredParams: ['file', 'idempotencyKey', 'targetId'], | ||
| optionalParams: [], | ||
| phase: 'v1', | ||
| }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing includeRaw in get_node optional params
The get_node query supports includeRaw as an optional parameter (you can see it's used in src/queries/discovery.ts line 46), but it's not listed here. Agents using this catalog won't know about it.
🔧 Quick fix
{
query: 'get_node',
description: 'Get full details for a specific node by ID',
requiredParams: ['file', 'idempotencyKey', 'targetId'],
- optionalParams: [],
+ optionalParams: ['includeRaw'],
phase: 'v1',
},📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| { | |
| query: 'get_node', | |
| description: 'Get full details for a specific node by ID', | |
| requiredParams: ['file', 'idempotencyKey', 'targetId'], | |
| optionalParams: [], | |
| phase: 'v1', | |
| }, | |
| { | |
| query: 'get_node', | |
| description: 'Get full details for a specific node by ID', | |
| requiredParams: ['file', 'idempotencyKey', 'targetId'], | |
| optionalParams: ['includeRaw'], | |
| phase: 'v1', | |
| }, |
🤖 Prompt for AI Agents
In @src/queries/index.ts around lines 187 - 193, The catalog entry for the query
with query: 'get_node' is missing the optional parameter includeRaw; update the
entry (the object where query === 'get_node') to add 'includeRaw' to its
optionalParams array so consumers know this flag is supported (this aligns with
how includeRaw is used in src/queries/discovery.ts).
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @.gitignore:
- Line 4: The .gitignore entry "test_file.txt.DS_Store" only matches that exact
filename; update the file to separate patterns so you ignore the intended files:
add "test_file.txt" on its own line to ignore that file and add ".DS_Store" (or
"**/.DS_Store" to catch DS_Store in subdirectories) on its own line so macOS
.DS_Store files are ignored everywhere.
NodeDescriptor has 'labels' (array), not 'label'. The hints for Function/File/Domain nodes were never triggering. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In @src/tools/create-supermodel-graph.ts:
- Around line 234-242: The current isQueryError branch returns the error via
asTextContentResult which sets isError: false, causing clients to treat failures
as successes; update the error handling in the isQueryError branch of
create-supermodel-graph (the block that builds errorWithHints) to return an
error result with isError: true by using asErrorResult (or implement a new
helper that wraps the error payload and sets isError: true) and include the
hints from getErrorHints(result.error.code, params.query) in that error payload
so clients can reliably detect and surface query failures.
🧹 Nitpick comments (3)
src/tools/create-supermodel-graph.ts (3)
13-13: Unused import:getAvailableQueriesHey, looks like
getAvailableQueriesis imported but never used anywhere in the file. This is basically dead code that'll just confuse folks reading it later.-import { executeQuery, getAvailableQueries, isQueryError, QueryType } from '../queries'; +import { executeQuery, isQueryError, QueryType } from '../queries';
159-176: No runtime validation forqueryparameter before castingThe
queryparameter gets cast toQueryTypeon line 162 without any runtime validation. While the schema has anenumthat should prevent invalid values from typical clients, if someone bypasses schema validation or sends an unexpected value, it'll flow through as-is.Think of it like this: you're trusting the front door to keep out bad actors, but if someone climbs through a window, there's no guard inside.
Consider adding a quick sanity check:
const validQueries = new Set([ 'graph_status', 'summary', 'get_node', 'search', 'list_nodes', // ... etc ]); if (!validQueries.has(query)) { return asErrorResult(`Invalid query type: ${query}`); }That said, if your
executeQueryfunction already handles unknown query types gracefully, this is less critical.
352-372: Consider adding a timeout for the API callThe
generateSupermodelGraphAPI call on line 368 doesn't have a timeout. If the API server hangs or is slow, this request will just... wait forever. In a tool context, that's not great.This might not be a blocker for an experimental feature, but something to keep in mind. A simple approach would be using
AbortControllerwith a timeout:const controller = new AbortController(); const timeout = setTimeout(() => controller.abort(), 60000); // 60s try { const response = await client.graphs.generateSupermodelGraph({ ...requestParams, signal: controller.signal, }); // ... } finally { clearTimeout(timeout); }(Depends on whether your API client supports abort signals)
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
.DS_Storeis excluded by!**/.DS_Store
📒 Files selected for processing (1)
src/tools/create-supermodel-graph.ts
🧰 Additional context used
🧬 Code graph analysis (1)
src/tools/create-supermodel-graph.ts (3)
src/queries/types.ts (1)
QueryType(8-28)src/types.ts (3)
ClientContext(4-6)asTextContentResult(39-49)asErrorResult(51-61)src/queries/index.ts (1)
executeQuery(36-69)
🔇 Additional comments (3)
src/tools/create-supermodel-graph.ts (3)
276-296: LGTM on thelabels[0]fix!This is the fix mentioned in the commit message - using
labels?.[0]instead of a non-existentlabelproperty. SinceNodeDescriptorexposes an array of labels, grabbing the first one makes sense here.The optional chaining (
labels?.[0]) is a nice touch too - handles the case wherelabelsmight be undefined without blowing up.
331-350: Nice error hints helper!Simple and clean. Returning actionable hints for
NOT_FOUNDandINVALID_PARAMSerrors is a nice touch for guiding agents (or humans) toward recovery steps. The default empty array is sensible for unknown error codes.
374-415: Solid legacy mode with thorough error logging!The error handling here is comprehensive - you're logging error name, message, stack, response status, headers, and even attempting to read the response body. The nested try/catch for
error.response.text()(lines 401-406) is good defensive coding since that could also throw.This kind of detailed logging will be super helpful when debugging API issues.
| relationshipTypes: { | ||
| type: 'array', | ||
| items: { type: 'string' }, | ||
| description: 'Relationship types to traverse (neighborhood). Options: calls, IMPORTS', | ||
| }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find relationshipTypes usage and how they're processed
rg -n "relationshipTypes" --type=ts -B 2 -A 2Repository: supermodeltools/mcp
Length of output: 3257
Inconsistent casing in relationshipTypes will silently break user input
You nailed it. The code checks for exact string matches: 'calls' (lowercase) on line 490 and 'IMPORTS' (uppercase) on line 514 in traversal.ts.
What this means in practice: if a user passes ['CALLS', 'imports'] (which seems totally reasonable), the system silently ignores them because .includes() won't find a match. The traversal runs, but no relationships are found. Frustrating for users.
The fix is straightforward—either normalize the input to lowercase (or uppercase) before checking, or update the defaults and checks to use consistent casing throughout. The description should also explicitly state the exact strings accepted, like: 'calls' or 'IMPORTS' (not 'CALLS' or 'imports').
| // Handle query errors | ||
| if (isQueryError(result)) { | ||
| // Include hints for common errors | ||
| const errorWithHints = { | ||
| ...result, | ||
| hints: getErrorHints(result.error.code, params.query), | ||
| }; | ||
| return asTextContentResult(errorWithHints); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Query errors returned with isError: false
Here's a subtle gotcha: when a query fails (line 235-241), you're using asTextContentResult() which sets isError: false. So from the MCP client's perspective, this looks like a successful response that happens to contain error data.
Looking at the helper from src/types.ts:
asTextContentResult→isError: falseasErrorResult→isError: true
If the client is checking isError to determine success/failure, query errors will slip through as "successes." You might want to either:
- Use
asErrorResultfor query errors, or - Create a new helper that returns structured error data with
isError: true
Example fix
// Handle query errors
if (isQueryError(result)) {
- // Include hints for common errors
- const errorWithHints = {
- ...result,
- hints: getErrorHints(result.error.code, params.query),
- };
- return asTextContentResult(errorWithHints);
+ const errorMessage = JSON.stringify({
+ ...result,
+ hints: getErrorHints(result.error.code, params.query),
+ }, null, 2);
+ return asErrorResult(errorMessage);
}🤖 Prompt for AI Agents
In @src/tools/create-supermodel-graph.ts around lines 234 - 242, The current
isQueryError branch returns the error via asTextContentResult which sets
isError: false, causing clients to treat failures as successes; update the error
handling in the isQueryError branch of create-supermodel-graph (the block that
builds errorWithHints) to return an error result with isError: true by using
asErrorResult (or implement a new helper that wraps the error payload and sets
isError: true) and include the hints from getErrorHints(result.error.code,
params.query) in that error payload so clients can reliably detect and surface
query failures.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
This looks really good. Let me test it and I will merge |
MCP Server Query Engine - Experimental Branch Overview
Branch:
exp/query-engineonsupermodeltools/mcpDate: 2026-01-12
Status: Prototype ready for testing, not yet merged to main
What It Does
The query engine adds a layer between AI agents (Claude Code, etc.) and the Supermodel graph API. Instead of agents making raw API calls and parsing giant JSON blobs, they can now:
function_calls_in,file_imports, etc. instead of writing jq filtersThe Problem It Solves
Before: An agent calls the Supermodel API, gets back a 30MB JSON graph, and has to parse it with jq filters to find anything useful. Every query re-fetches the whole graph. Agents often write bad jq and waste tokens.
After: An agent calls
summaryonce, gets high-level stats. Then callssearchto find a function. Then callsfunction_calls_into see who calls it. All from cache. No jq needed for common operations.How It Works
Architecture
The Cache
myrepo:supermodel:abc123)The Indexes
When a graph is cached, we precompute these indexes for O(1) lookups:
nodeByIdlabelIndexpathIndexdirIndexnameIndexcallAdjimportAdjdomainIndexQuery Types
v1 MVP (all implemented):
graph_status- Check if graph is cached, get summary if sosummary- High-level stats (file count, function count, etc.)get_node- Get details for a specific node by IDsearch- Find nodes by name substringlist_nodes- List nodes with filters (labels, file path prefix)function_calls_in- Who calls this function?function_calls_out- What does this function call?definitions_in_file- What's defined in this file?jq- Escape hatch for arbitrary queriesv1.1 (all implemented):
file_imports- What does this file import?domain_map- List all domains and their relationshipsdomain_membership- What's in this domain?neighborhood- Ego graph around a node (BFS traversal)Breadcrumb Hints
Every response includes hints for what to query next. For example, after
summary:{ "hints": [ "Use search with searchText to find specific nodes by name", "Use list_nodes with labels=['Function'] to browse functions", "Use domain_map to see architectural domains" ] }This guides agents through a logical exploration pattern without them having to figure out the API.
Why This Design
Canned Queries > Raw jq
We could have just improved jq error messages. But:
The jq escape hatch exists for edge cases, but with a warning that canned queries are preferred.
Query-Time Aggregation
Some graph modeling choices (like IMPORTS edges on Function nodes instead of File nodes) required us to aggregate at query time. For
file_imports, we:pathIndexThis keeps the index simple while handling different graph schemas.
What We Chose Not To Do
importedBy- reverse import lookup needs more workuses_in_file,list_files_in_dir- not implemented yetpath_between,module_dependencies- v2 featuresFiles Changed
All new code is in:
Testing Notes
The query engine was tested by another Claude agent exploring the
supermodel-vscoderepo. Results:file_importsbug found and fixed (Function→Module IMPORTS edges)Known Issues
@ts-nocheckin two files - tech debt, not a blockerNext Steps
@ts-nocheckand add proper typesimportedByfor reverse import lookupSummary by CodeRabbit
New Features
Chores
✏️ Tip: You can customize this high-level summary in your review settings.