Skip to content

Conversation

@tnm
Copy link
Contributor

@tnm tnm commented Nov 23, 2025

Summary

Fixes #144 by correcting three issues with subdirectory .gitignore handling:

  1. Fixed pattern precedence - Subdirectory patterns can now properly override root patterns
  2. Fixed negation patterns - ! prefix is preserved correctly
  3. Fixed pattern scope - Subdirectory patterns now apply to all depths, matching Git's behavior

The Problem

PR #144 added subdirectory .gitignore support but had a failing test for negation patterns. The issues were:

  1. Wrong sort order: .gitignore files were sorted deepest-first, but Git processes them root-to-leaf
  2. Broken negation: !*.tmp in special/.gitignore became special/!*.tmp (invalid)
  3. Wrong scope: *.cache in level1/.gitignore only matched level1/*.cache instead of level1/**/*.cache

What Changed

Pattern Precedence

# Before: deepest first (wrong)
gitignore_files.sort(key=lambda p: len(p.parts), reverse=True)

# After: shallowest first (correct)
gitignore_files.sort(key=lambda p: len(p.parts))

Negation Patterns

# Handle negation separately to preserve ! at the beginning
is_negation = pattern.startswith("!")
if is_negation:
    pattern = pattern[1:]  # Remove temporarily
    
# ... adjust pattern ...

if is_negation:
    adjusted = f"!{adjusted}"  # Re-add at beginning

Pattern Scope

# Before: level1/pattern
adjusted = f"{rel_base}/{pattern}"

# After: level1/**/pattern (matches at any depth)
adjusted = f"{rel_base}/**/{pattern}" if "*" in pattern else f"{rel_base}/{pattern}"

Tests

All 9 tests now pass (5 from original PR + 4 new):

  • ✅ Root .gitignore only
  • ✅ Subdirectory .gitignore
  • ✅ Nested precedence with negation
  • ✅ Multiple subdirectory .gitignore files
  • ✅ No .gitignore files
  • NEW: CodeSearcher respects subdirectory .gitignore
  • NEW: Absolute patterns in subdirectories
  • NEW: Complex negation scenarios
  • NEW: Deeply nested .gitignore files

Test Results

python -m pytest tests/test_gitignore.py -xvs
# 9 passed in 0.64s

Ferymad and others added 2 commits October 5, 2025 17:11
Load all .gitignore files in repository tree recursively and merge
patterns with proper precedence (deeper overrides shallower).
Adjust relative patterns to be repo-root-relative.

Changes:
- Update RepoMapper._load_gitignore() with recursive loading
- Update CodeSearcher._load_gitignore() with same implementation
- Add comprehensive unit tests for multi-level .gitignore
- Add integration test with humanlayer repo validation

Fixes token overflow on large monorepos with multiple .gitignore files.
Before: 98,895 files (4.4M tokens)
After: Expected ~670 files (~50k tokens)

Related to SOL-1 implementation plan Phase 2.
Fixes #144 by correcting the order and handling of subdirectory .gitignore files:

1. **Fixed pattern precedence**: Changed sort order from deepest-first to
   shallowest-first, allowing subdirectory patterns to properly override
   parent patterns (Git processes .gitignore from root to leaf)

2. **Fixed negation patterns**: Preserve ! prefix at the beginning when
   adjusting patterns for subdirectories (was becoming `dir/!pattern`
   instead of `!dir/**/pattern`)

3. **Fixed subdirectory pattern scope**: Patterns in subdirectory .gitignore
   files now use `/**/` to match at any depth under that directory
   (e.g., `level1/**/*.cache` instead of `level1/*.cache`), matching
   Git's actual behavior

4. **Added comprehensive tests**:
   - Test CodeSearcher respects subdirectory .gitignore
   - Test absolute patterns in subdirectories
   - Test complex negation scenarios
   - Test deeply nested .gitignore files with multiple levels

All original tests from PR #144 now pass, plus 4 additional edge case tests.
- Fix: Remove conditional for /**/  - ALL relative patterns in subdirectory
  .gitignore files must match at any depth, not just patterns with wildcards
  (e.g., 'node_modules' in frontend/.gitignore should match at ANY depth)

- Add test case for patterns without wildcards matching at multiple depths

- Refactor: Extract _adjust_gitignore_pattern() helper method to reduce
  complexity and improve readability of _load_gitignore()

Addresses Vercel bot review comment.
Comment on lines +54 to +60
if pattern.startswith("/"):
# Absolute pattern (relative to gitignore dir) - make relative to repo root
return f"{rel_base}/{pattern[1:]}"
else:
# Relative pattern - applies to directory and all subdirectories
# Use /** to match files at any depth under the directory
return f"{rel_base}/**/{pattern}"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same pattern adjustment logic error as in repo_mapper.py - patterns with internal forward slashes are incorrectly treated as matching at any depth.

View Details
📝 Patch Details
diff --git a/src/kit/code_searcher.py b/src/kit/code_searcher.py
index 0c7e5f2..eb80bef 100644
--- a/src/kit/code_searcher.py
+++ b/src/kit/code_searcher.py
@@ -54,9 +54,11 @@ class CodeSearcher:
         if pattern.startswith("/"):
             # Absolute pattern (relative to gitignore dir) - make relative to repo root
             return f"{rel_base}/{pattern[1:]}"
+        elif "/" in pattern:
+            # Pattern with internal slash - keep relative to gitignore dir
+            return f"{rel_base}/{pattern}"
         else:
-            # Relative pattern - applies to directory and all subdirectories
-            # Use /** to match files at any depth under the directory
+            # Pattern without slash - applies to any depth
             return f"{rel_base}/**/{pattern}"
 
     def _load_gitignore(self):
diff --git a/src/kit/repo_mapper.py b/src/kit/repo_mapper.py
index 62ae31d..c3c981a 100644
--- a/src/kit/repo_mapper.py
+++ b/src/kit/repo_mapper.py
@@ -41,9 +41,11 @@ class RepoMapper:
         if pattern.startswith("/"):
             # Absolute pattern (relative to gitignore dir) - make relative to repo root
             return f"{rel_base}/{pattern[1:]}"
+        elif "/" in pattern:
+            # Pattern with internal slash - keep relative to gitignore dir
+            return f"{rel_base}/{pattern}"
         else:
-            # Relative pattern - applies to directory and all subdirectories
-            # Use /** to match files at any depth under the directory
+            # Pattern without slash - applies to any depth
             return f"{rel_base}/**/{pattern}"
 
     def _load_gitignore(self):

Analysis

Gitignore pattern adjustment logic error - patterns with internal slashes incorrectly treated as recursive

What fails: The _adjust_gitignore_pattern() method in both CodeSearcher (line 54-60) and RepoMapper (line 41-48) incorrectly adds recursive **/ matching to all patterns without a slash, including patterns that contain internal slashes. This causes patterns like src/config.js from a subdirectory .gitignore to match files at ANY depth under that directory, rather than only at the exact relative path.

How to reproduce:

  1. Create a repository structure:

    subdir/
    ├── .gitignore (contains: src/config.js)
    ├── src/
    │   └── config.js
    └── anything/
        └── src/
            └── config.js
    
  2. Run CodeSearcher or RepoMapper on this repository

  3. Check which files are ignored via _should_ignore() method

Result: Both subdir/src/config.js and subdir/anything/src/config.js are marked as ignored

Expected: Only subdir/src/config.js should be ignored. The pattern src/config.js in subdir/.gitignore means "match the file at the path src/config.js relative to the subdirectory", not "match src/config.js at any depth". This matches Git's actual behavior per gitignore documentation - a pattern containing a slash is treated as a glob pattern relative to the directory containing the .gitignore file.

Fix: The _adjust_gitignore_pattern() method now checks for internal slashes before adding **/:

  • Patterns starting with / are anchored to the gitignore directory
  • Patterns containing internal / are matched exactly relative to the gitignore directory
  • Only patterns without any slashes use **/ to match at any depth

Comment on lines +44 to +47
else:
# Relative pattern - applies to directory and all subdirectories
# Use /** to match files at any depth under the directory
return f"{rel_base}/**/{pattern}"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pattern adjustment logic incorrectly applies recursive matching (**/) to all patterns in subdirectories. Patterns containing internal forward slashes (like src/config.js) should match only relative to the .gitignore directory, not at any depth.

View Details
📝 Patch Details
diff --git a/src/kit/code_searcher.py b/src/kit/code_searcher.py
index 0c7e5f2..806c62d 100644
--- a/src/kit/code_searcher.py
+++ b/src/kit/code_searcher.py
@@ -54,8 +54,14 @@ class CodeSearcher:
         if pattern.startswith("/"):
             # Absolute pattern (relative to gitignore dir) - make relative to repo root
             return f"{rel_base}/{pattern[1:]}"
+        elif "/" in pattern:
+            # Pattern with internal slash - keep relative to gitignore dir
+            # According to gitignore spec: "If there is a separator at the beginning
+            # or middle of the pattern, then the pattern is relative to the directory
+            # level of the .gitignore file itself"
+            return f"{rel_base}/{pattern}"
         else:
-            # Relative pattern - applies to directory and all subdirectories
+            # Pattern without slash - applies to directory and all subdirectories
             # Use /** to match files at any depth under the directory
             return f"{rel_base}/**/{pattern}"
 
diff --git a/src/kit/repo_mapper.py b/src/kit/repo_mapper.py
index 62ae31d..a011eb3 100644
--- a/src/kit/repo_mapper.py
+++ b/src/kit/repo_mapper.py
@@ -41,8 +41,14 @@ class RepoMapper:
         if pattern.startswith("/"):
             # Absolute pattern (relative to gitignore dir) - make relative to repo root
             return f"{rel_base}/{pattern[1:]}"
+        elif "/" in pattern:
+            # Pattern with internal slash - keep relative to gitignore dir
+            # According to gitignore spec: "If there is a separator at the beginning
+            # or middle of the pattern, then the pattern is relative to the directory
+            # level of the .gitignore file itself"
+            return f"{rel_base}/{pattern}"
         else:
-            # Relative pattern - applies to directory and all subdirectories
+            # Pattern without slash - applies to directory and all subdirectories
             # Use /** to match files at any depth under the directory
             return f"{rel_base}/**/{pattern}"
 

Analysis

Incorrect recursive matching for patterns with internal slashes in subdirectory .gitignore files

What fails: The _adjust_gitignore_pattern() method in both RepoMapper and CodeSearcher incorrectly applies recursive matching (**/) to patterns containing forward slashes when they appear in subdirectory .gitignore files. This causes patterns like src/config.js to match at arbitrary depths below the .gitignore directory instead of only relative to that directory.

How to reproduce:

  1. Create a subdirectory with .gitignore containing a pattern with an internal slash: frontend/.gitignore with pattern src/config.js
  2. Create files at frontend/src/config.js and frontend/something/src/config.js
  3. Call RepoMapper.get_file_tree() or CodeSearcher to check which files are ignored
  4. The pattern matches both files when it should only match frontend/src/config.js

Expected behavior: According to git documentation on gitignore patterns: "If there is a separator at the beginning or middle of the pattern, then the pattern is relative to the directory level of the .gitignore file itself." Patterns with internal slashes should NOT be treated as wildcard patterns that match at any depth.

Root cause: The code had three cases but was missing the distinction for patterns with internal slashes:

  • Patterns starting with / (absolute): handled correctly, removes leading / and prepends relative path
  • Patterns containing / (internal slash): MISSING - was being treated same as patterns without slashes
  • Patterns without /: correctly append **/ to match at any depth

The fix adds explicit handling for patterns containing internal slashes by prepending the relative base path without **/, keeping them relative to the .gitignore directory location.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants