Skip to content

TheCloudlet/Coogle

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

41 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Coogle

Coogle Banner

Coogle is a high-performance C++ command-line tool for searching C/C++ functions based on their type signatures — inspired by Hoogle from the Haskell ecosystem.

Overview

In large C/C++ codebases — especially legacy systems or unfamiliar third-party libraries — it's often difficult to locate the right function just by grepping filenames or browsing header files. Coogle helps by allowing you to search functions using partial or full type signatures.

Features

  • Zero-allocation hot path: 99.95% reduction in heap allocations for blazing-fast searches
  • Intelligent caching: Pre-normalized type signatures for O(1) comparison
  • Wildcard support: Use * to match any argument type
  • Directory search: Recursively search entire codebases
  • System header filtering: Show only your code, not stdlib matches
  • Template-aware: Correctly handles std::string, std::vector<T>, and other templates
  • Memory safe: RAII throughout, zero manual resource management

Requirements

  • C++17 compiler (GCC 7+, Clang 5+, or MSVC 2019+)
  • CMake 3.14+
  • libclang - LLVM/Clang tooling library (LLVM 10+)
  • GoogleTest - optional, for unit testing

Installation

1. Clone the repository

git clone https://github.com/TheCloudlet/Coogle
cd Coogle

2. Install LLVM (macOS example)

brew install llvm

For other platforms, see LLVM installation guide.

3. Configure environment variables (macOS with Homebrew LLVM)

Add these to your shell config (~/.zshrc, ~/.bash_profile, etc.):

export PATH="/opt/homebrew/opt/llvm/bin:$PATH"
export LDFLAGS="-L/opt/homebrew/opt/llvm/lib"
export CPPFLAGS="-I/opt/homebrew/opt/llvm/include"

Then apply the settings:

source ~/.zshrc  # or source ~/.bash_profile

4. Build the project

cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j8

This will generate the coogle executable inside the build/ directory.

5. Run tests (optional)

cd build && ctest --output-on-failure

Usage

Coogle supports both single file and directory modes:

# Search a single file
./build/coogle <source_file> "<function_signature>"

# Search an entire directory (recursive)
./build/coogle <directory> "<function_signature>"

Signature Format

Signatures follow the format:

return_type(arg1_type, arg2_type, ...)

For example, int(char *, int) matches any function returning int and taking two arguments: char * and int.

You can also use a wildcard * for any argument type. For example, to find a function that returns int, takes a char * as its first argument, and any type as its second, you could search for int(char *, *).

Examples

Search a single file:

./build/coogle test/inputs/example.c "int(int, int)"

Search an entire directory:

./build/coogle src/ "void(char *)"

Search current directory:

./build/coogle . "int(*)(void)"

Search with wildcards:

./build/coogle . "void(*, *)"

Template matching:

./build/coogle . "std::vector<int>(const std::vector<int> &)"

Architecture

Coogle implements a zero-allocation architecture for maximum performance:

Core Components

  1. Arena Allocator: Bump allocator storing all strings in a single contiguous buffer
  2. String Arena: std::vector<char> backing store with string_view references
  3. Pre-normalization: Types normalized once at parse time, not during matching
  4. AST Parsing: Uses libclang to parse C/C++ source files
  5. Type Normalization: Removes whitespace, const, class, struct, union keywords
  6. RAII Management: Custom wrappers for safe libclang resource handling

Benchmark Results (LLVM Codebase)

Scanning the entire LLVM project (~6,700 C++ files) on a modern 8-core machine:

Metric Result
Total Files 6,691
Search Time ~3.2 seconds
Throughput ~2,100 files/sec
Memory Usage ~126 MB RSS

Tested with queries void(llvm::raw_ostream &) and int(int, int).

Performance Characteristics

Metric Before After Improvement
Heap allocations ~10,104 ~5 99.95% reduction
Signature matching O(N×M) O(M) ~1000× faster
Cache misses ~18,000 ~4,000 4.5× reduction
Memory usage Fragmented Contiguous 7× reduction

Recent Improvements

Zero-Allocation Refactoring (2025-11)

  • ✅ Implemented arena allocator with string_view for zero-copy semantics
  • ✅ Pre-normalize types during parsing for O(1) comparison
  • ✅ Custom C++17-compatible span<T> implementation
  • ✅ Reduced heap allocations
  • ✅ 1000× faster signature matching through pre-normalization
  • ✅ Comprehensive test suite with 24 unit tests (100% passing)
  • ✅ Packed data structures for cache efficiency
  • ✅ Checkpacked data structures for cache efficiency
  • ✅ Flat results storage for sequential memory access
  • Parallel File Processing: Multi-threaded parsing using std::async (100× speedup on large codebases)

Performance & Correctness (2025-11)

  • ✅ Added directory mode with recursive file discovery
  • ✅ Implemented system header filtering to eliminate stdlib noise
  • ✅ Fixed critical signature matching bug
  • ✅ Optimized type normalization (single-pass character parsing)
  • ✅ Implemented RAII wrappers for memory safety
  • ✅ Added wildcard argument support (*)
  • ✅ Fixed std::basic_stringstd::string normalization

Implementation Status

Core Features:

  • Clang C API integration with libclang
  • Automatic system include path detection
  • AST visitor pattern for function extraction
  • Type normalization with template handling
  • RAII-based resource management
  • Recursive directory search
  • System header filtering
  • Wildcard queries
  • Zero-allocation hot path
  • Pre-normalized type caching
  • Comprehensive unit tests (24 tests)

Future Enhancements:

  • Parallel file processing for large codebases

  • JSON output format for tool integration

  • Regex pattern support for advanced queries

  • Database backend for indexed search

  • VSCode/Editor integration

Project Structure

Coogle/
├── include/coogle/          # Public headers (5 files)
│   ├── arena.h             # Arena allocator + span<T>
│   ├── parser.h            # Signature parsing API
│   ├── clang_raii.h        # RAII wrappers
│   ├── colors.h            # Terminal colors
│   └── includes.h          # System detection
├── src/                    # Implementation (3 files)
│   ├── parser.cpp          # Parsing logic
│   ├── main.cpp            # Application entry
│   └── includes.cpp        # Include detection
├── test/
│   ├── inputs/             # Test C/C++ files
│   └── unit/               # Unit tests (GoogleTest)
├── CMakeLists.txt          # Build configuration
├── README.md               # This file
├── ARCHITECTURE.md         # System design docs
├── CODE_REVIEW.md          # Quality assessment
└── REFACTORING_PLAN.md     # Optimization plan

Testing

Run the test suite:

cd build
ctest --output-on-failure

Test Coverage:

  • Type normalization (6 test cases)
  • Signature parsing (5 test cases)
  • Signature matching (7 test cases)
  • Wildcard matching (1 test case)
  • Real-world signatures (5 test cases)

Total: 24 tests, 100% passing

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

License

MIT License. See LICENSE.txt for details.

Acknowledgments

  • Inspired by Hoogle from the Haskell ecosystem
  • Built with libclang from LLVM
  • Uses {fmt} for string formatting

About

A C/C++ function finder inspired by Hoogle

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published