-
Notifications
You must be signed in to change notification settings - Fork 0
feat: Add automatic repository zipping with gitignore support #11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Implements transparent auto-zipping of repositories for codebase analysis, eliminating the need for manual ZIP creation by agents and users. Key Features: - Automatic ZIP creation from directory paths using 'directory' parameter - Cross-platform support (Windows, macOS, Linux) - Respects .gitignore patterns automatically - Excludes sensitive files (.env, *.pem, credentials, keys) - Excludes dependencies (node_modules, venv, vendor) - Excludes build outputs (dist, build, out) - Automatic cleanup of temporary ZIP files after use - Cleanup of stale ZIPs (>24h) on server startup - 50MB size limit with clear error messages - Comprehensive error handling (disk space, permissions, validation) Changes: - Add src/utils/zip-repository.ts with zipRepository() and cleanupOldZips() - Update src/tools/create-supermodel-graph.ts to support 'directory' param - Update src/server.ts to cleanup old ZIPs on startup - Update tool schema to accept 'directory' (recommended) or 'file' (deprecated) - Update README.md with new usage examples and troubleshooting - Add dependencies: archiver, ignore, @types/archiver Backward Compatibility: - Maintains support for pre-zipped 'file' parameter - No breaking changes to existing integrations Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
WalkthroughThis PR introduces directory-based input for repository analysis, replacing the prior requirement for pre-zipped files. It adds automatic ZIP creation with gitignore support, implements temporary file cleanup mechanisms, and updates documentation to reflect these changes. Changes
Sequence DiagramsequenceDiagram
participant Client as Client/API
participant Tool as create-supermodel-graph<br/>Tool
participant ZipUtil as zipRepository<br/>Utility
participant Archiver as Archiver +<br/>Ignore Libs
participant FileSystem as File System
Client->>Tool: Request with directory path
Tool->>Tool: Validate input (directory XOR file)
alt Directory provided
Tool->>ZipUtil: zipRepository(directoryPath)
ZipUtil->>ZipUtil: Build ignore filter<br/>(standards + .gitignore)
ZipUtil->>Archiver: Create archive
ZipUtil->>FileSystem: Traverse & add eligible files
ZipUtil->>FileSystem: Enforce size cap
Archiver-->>ZipUtil: ZIP created
ZipUtil-->>Tool: ZipResult {path, cleanup, ...}
else File provided (deprecated)
Tool->>Tool: Use file path directly
end
Tool->>Tool: Process ZIP content
Tool->>Client: Return results
Tool->>FileSystem: cleanup() — remove temp ZIP
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/utils/zip-repository.ts`:
- Around line 311-321: The code currently calls fs.stat(fullPath) (variable
fullPath) which follows symlinks and can include files outside rootDir; change
this to use fs.lstat(fullPath) to detect and skip symlinks outright (consistent
with other special-file handling), or if you must follow links, resolve them
with fs.realpath(fullPath) and verify the resolved path is inside rootDir before
including; update the error handling branch accordingly to skip symlinks and log
a warning when a symlink or an out-of-repo resolved path is encountered.
🧹 Nitpick comments (5)
src/utils/zip-repository.ts (2)
167-181: Potential race condition in size limit enforcement.So here's the thing: you're checking
totalSizein theentryevent, butentry.stats?.sizegives you the uncompressed size of each file, not the compressed size that ends up in the ZIP. This means:
- The actual ZIP file could be smaller than
totalSize(compression wins)- Or larger if there's metadata overhead
Also, there's a subtle timing issue -
archive.abort()is async-ish, so more files might get added between when you call abort and when it actually stops. Not a huge deal in practice, but worth knowing.Consider checking the actual output stream size periodically or using the
progressevent if available.💡 Alternative approach using actual file size
+ // Track actual compressed size + let compressedSize = 0; + output.on('data', (chunk) => { + compressedSize += chunk.length; + if (compressedSize > maxSizeBytes) { + archive.abort(); + archiveError = new Error( + `ZIP size exceeds limit (${formatBytes(maxSizeBytes)}). ` + + `Consider excluding more directories.` + ); + } + });
193-209: Minor: Redundant error check after promise.Lines 204-209 check
archiveErroragain after the promise resolves, but ifarchiveErrorwas set, the promise on line 193-202 would have already rejected. This code path is unreachable.That said, it's harmless defensive code - just a tiny bit of dead code. Up to you if you want to clean it up.
🧹 Simplified error handling
// Wait for output stream to finish await new Promise<void>((resolve, reject) => { output.on('close', () => { if (archiveError) { reject(archiveError); } else { resolve(); } }); output.on('error', reject); }); - - // Check for errors during archiving - if (archiveError) { - // Clean up partial ZIP - await fs.unlink(zipPath).catch(() => {}); - throw archiveError; - }Note: You'd want to add cleanup logic inside the reject path if you remove lines 204-209.
src/tools/create-supermodel-graph.ts (3)
191-209: Error handling is good, but relies on string matching.The error mapping works, but checking
error.message.includes('does not exist')is a bit fragile - if the message text changes inzip-repository.ts, these checks break silently.A more robust approach would be to use custom error classes or error codes in
zipRepository, but honestly for a tool like this, string matching is probably fine. Just something to keep in mind if you refactor later.
249-257: DuplicateformatBytesfunction.This is identical to the one in
src/utils/zip-repository.ts(line 347). You could export it from there and import it here to keep things DRY.Not a big deal for a 5-line helper, but if you ever need to change the formatting logic, you'd have to remember to change it in both places.
🔧 Extract to shared utility
In
src/utils/zip-repository.ts:-function formatBytes(bytes: number): string { +export function formatBytes(bytes: number): string {In
src/tools/create-supermodel-graph.ts:-import { zipRepository } from '../utils/zip-repository'; +import { zipRepository, formatBytes } from '../utils/zip-repository'; -/** - * Format bytes as human-readable string - */ -function formatBytes(bytes: number): string { - if (bytes < 1024) return `${bytes} B`; - if (bytes < 1024 * 1024) return `${(bytes / 1024).toFixed(2)} KB`; - if (bytes < 1024 * 1024 * 1024) return `${(bytes / (1024 * 1024)).toFixed(2)} MB`; - return `${(bytes / (1024 * 1024 * 1024)).toFixed(2)} GB`; -}
1-1: Consider removing@ts-nocheck.This disables all TypeScript checking for the file. If there's a specific type issue you're working around, a more targeted
@ts-expect-erroror@ts-ignoreon just that line would be better. That way TypeScript can still catch bugs in the rest of the file.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
package-lock.jsonis excluded by!**/package-lock.json
📒 Files selected for processing (5)
README.mdpackage.jsonsrc/server.tssrc/tools/create-supermodel-graph.tssrc/utils/zip-repository.ts
🧰 Additional context used
🧬 Code graph analysis (1)
src/server.ts (1)
src/utils/zip-repository.ts (1)
cleanupOldZips(358-396)
🔇 Additional comments (12)
README.md (3)
151-157: Documentation looks solid!The parameter table clearly explains the new
directoryoption and properly marksfileas deprecated. The footnote about mutual exclusivity is helpful. Nice and clear for users! 🎯
175-181: Great addition of the "Automatic features" section.This gives users a nice at-a-glance view of what the auto-zipping handles for them. Super helpful for setting expectations about what gets excluded.
195-198: Helpful troubleshooting additions.Good coverage of the new error scenarios (permissions, disk space, directory existence). These will save users a lot of head-scratching when things go wrong.
src/utils/zip-repository.ts (3)
17-83: Nice comprehensive exclusion list!The security-sensitive files section (lines 54-70) is particularly important - good job covering
.env, credentials, keys, and cloud provider secrets. This prevents accidental credential leaks when users zip their repos.One small thing to consider: the
envpattern on line 28 might be a bit aggressive. Some projects have legitimateenv/directories that aren't Python virtualenvs (like config directories). But this is a minor tradeoff - better safe than sorry for a default exclusion list.
256-273: Note: Only root.gitignoreis parsed.Just so you're aware - this implementation only reads the
.gitignorefile in the root directory. Real git supports nested.gitignorefiles in subdirectories, each applying to their own subtree.For most repos this is probably fine since the root
.gitignoreusually covers everything, but some monorepos or projects with complex structures might have patterns only in nested.gitignorefiles that won't be respected here.If you want to keep it simple (totally reasonable!), maybe add a note in the docs. If you want full fidelity, you'd need to check for
.gitignorein each directory duringaddFilesRecursively.
358-396: Cleanup function looks solid!Good job on:
- Only touching files with the
supermodel-prefix (won't accidentally delete other stuff)- Graceful error handling (won't crash if files are deleted by another process)
- Logging the cleanup count for debugging
The naming convention
supermodel-{random}.zipmakes it easy to identify your temp files.src/server.ts (2)
60-63: Nice server startup cleanup integration!Running the cleanup before the transport connects is a good call - it ensures old temp files from crashed/killed processes don't pile up. The 24-hour threshold is reasonable.
One tiny thing: since
cleanupOldZipshandles its own errors internally and never throws, theawaithere is fine but you might consider logging that cleanup is starting, just for visibility during debugging.
21-22: The instructions update is thorough!The updated instructions cover the new directory-based workflow well. The explanation of automatic zipping, exclusions, and the 50MB limit gives users everything they need to know.
src/tools/create-supermodel-graph.ts (3)
67-74: Clear schema update for the new directory parameter.The descriptions are helpful - users will immediately understand that
directoryis the new recommended way andfileis deprecated. Good UX!
163-170: Solid input validation!The mutual exclusivity check (
directoryXORfile) is clean and the error messages are clear. Users won't be confused about what went wrong.
218-246: Cleanup logic is correct!Good use of
try/finallyto ensure the temp ZIP gets cleaned up even if the query processing fails. TheshouldCleanupflag correctly tracks whether we created a temp file that needs cleanup.The flow is:
- If
zipRepositorythrows → early return with error, no cleanup needed (zip wasn't created)- If
zipRepositorysucceeds →shouldCleanup = true,cleanupis set- If anything in the try block fails → finally still runs and cleans up
Nice and tidy! 🧹
package.json (1)
39-40: Dependencies are spot-on. All versions are current—you've gotarchiver7.0.1 andignore7.0.5 which are the latest releases, so nothing outdated here.To break it down simply:
archiveris the standard go-to library for creating ZIP files in Node.js (basically your toolbox for zipping things), andignorehandles gitignore patterns (the file that tells git what to skip). Both are well-maintained and widely trusted in the Node ecosystem. Using the caret (^) in your version specs means you'll automatically get bug fixes and minor improvements when they drop—which is exactly what you want.Great choices for the ZIP functionality you're adding! 👍
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
Summary
Implements transparent auto-zipping of repositories for codebase analysis, eliminating the need for manual ZIP creation by agents and users.
Key Features
✅ Automatic ZIP Creation
directoryparameter accepts repository paths directlygit archiveor create ZIPs manually✅ Smart Exclusions
.gitignorepatterns automatically.env,*.pem, credentials, keys)node_modules,venv,vendor)dist,build,out).idea,.vscode)✅ Resource Management
✅ Error Handling
✅ Backward Compatibility
fileparameterChanges
New Files
src/utils/zip-repository.ts: Core zipping functionality with gitignore supportModified Files
src/tools/create-supermodel-graph.ts: Updated to supportdirectoryparametersrc/server.ts: Added cleanup of old ZIPs on startuppackage.json: Addedarchiver,ignore,@types/archiverdependenciesREADME.md: Updated documentation with new usage examplesTesting
Comprehensive testing performed:
Usage
Before (manual):
git archive -o /tmp/repo.zip HEAD # Then call tool with file=/tmp/repo.zipAfter (automatic):
Breaking Changes
None. The
fileparameter is still supported for backward compatibility.Related Issues
Co-authored-by
Co-Authored-By: Claude Sonnet 4.5 noreply@anthropic.com
Summary by CodeRabbit
Release Notes
New Features
Improvements
Documentation
✏️ Tip: You can customize this high-level summary in your review settings.