60 changes: 60 additions & 0 deletions DESIGN.md
@@ -0,0 +1,60 @@
# Cipherscope Architecture

## Overview
Cipherscope is a static analysis scanner designed to build a cryptographic inventory. It parses source files using Tree-sitter, matches library anchors and algorithm symbols, and emits JSONL findings that can be aggregated into an inventory.

## Pipeline
```mermaid
flowchart TD
A[Discovery] --> B[Parsing]
B --> C[Library Anchoring]
C --> D[Algorithm Detection]
D --> E[JSONL Output]

A --> A1[File walk + filters]
B --> B1[Tree-sitter AST]
C --> C1[Import/include anchors]
D --> D1[Symbol match + params]
D1 --> D2[Local constant resolution]
E --> E1[Library + algorithm assets]
```
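The flow above can be sketched in a few dozen lines of Python. This is a simplified illustration under stated assumptions, not Cipherscope's implementation. It swaps regular expressions in for the Tree-sitter parsing stage. The rule tables (`LIBRARY_ANCHORS`, `ALGORITHM_SYMBOLS`, `SOURCE_EXTENSIONS`) are hypothetical stand-ins for `patterns.toml`. Following the README, algorithm detection runs only in files where a library anchor matched.

```python
import json
import os
import re

# Hypothetical stand-ins for the rules in patterns.toml (not the real rule set).
LIBRARY_ANCHORS = {
    "OpenSSL": re.compile(r'^\s*#\s*include\s*[<"]openssl/'),
}
ALGORITHM_SYMBOLS = {
    # An optional first capture group is read as the key size.
    "AES-GCM": (re.compile(r"\bEVP_aes_(128|192|256)_gcm\b"), "symmetric"),
    "SHA-256": (re.compile(r"\bEVP_sha256\b"), "hash"),
}
SOURCE_EXTENSIONS = {".c", ".cc", ".cpp", ".h", ".hpp"}


def discover(root):
    """Discovery: walk the tree and keep files with a known source extension."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1] in SOURCE_EXTENSIONS:
                yield os.path.join(dirpath, name)


def scan_text(path, text):
    """Library anchoring first; algorithm detection only if an anchor matched."""
    lines = text.splitlines()
    findings = []
    for lineno, line in enumerate(lines, start=1):
        for lib, pattern in LIBRARY_ANCHORS.items():
            m = pattern.search(line)
            if m:
                findings.append({"assetType": "library", "identifier": lib, "path": path,
                                 "evidence": {"line": lineno, "column": m.start() + 1}})
    if not findings:
        return findings  # no anchor: skip the deeper algorithm search
    for lineno, line in enumerate(lines, start=1):
        for algo, (pattern, primitive) in ALGORITHM_SYMBOLS.items():
            for m in pattern.finditer(line):
                metadata = {"primitive": primitive}
                if pattern.groups and m.group(1):
                    metadata["keySize"] = int(m.group(1))
                findings.append({"assetType": "algorithm", "identifier": algo, "path": path,
                                 "evidence": {"line": lineno, "column": m.start() + 1},
                                 "metadata": metadata})
    return findings


def main(root):
    """Emit one JSON object per line (JSONL) for every finding under `root`."""
    for path in discover(root):
        with open(path, encoding="utf-8", errors="replace") as fh:
            for finding in scan_text(path, fh.read()):
                print(json.dumps(finding, sort_keys=True))
```

Calling `main("path/to/repo")` streams findings to standard output. Keeping anchoring as a separate first pass means files that never import a crypto library cost only one cheap scan.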

## Data Model
- Library hit: name, file path, evidence location.
- Algorithm hit: name, file path, evidence location, metadata (e.g., key size, primitive).
- Each hit is emitted as one JSON object per line (JSONL), so findings can be streamed into tooling pipelines and aggregated into an inventory.
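Because each finding is a standalone JSON object, building an inventory is a streaming fold over the lines. The Python sketch below groups findings by asset type and identifier and collects the distinct locations where each appears. The function name `build_inventory` and the choice of grouping key are illustrative assumptions, not part of Cipherscope.

```python
import json
from collections import defaultdict


def build_inventory(jsonl_lines):
    """Group findings by (assetType, identifier) and collect unique locations."""
    inventory = defaultdict(set)
    for raw in jsonl_lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines in the stream
        finding = json.loads(raw)
        key = (finding["assetType"], finding["identifier"])
        ev = finding["evidence"]
        # A set counts repeated findings at the same location only once.
        inventory[key].add((finding["path"], ev["line"], ev["column"]))
    return {key: sorted(locations) for key, locations in inventory.items()}
```

For example, `build_inventory(sys.stdin)` can consume the scanner's output directly from a pipe.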

### JSONL Schema (Informal)
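```json
{
  "assetType": "library|algorithm",
  "identifier": "string",
  "path": "string",
  "evidence": {
    "line": 1,
    "column": 1
  },
  "metadata": {
    "primitive": "string",
    "keySize": 256
  }
}
```

`metadata` appears only on algorithm findings, and its keys depend on the algorithm (the fixtures also use `iterations` for PBKDF2 and `N` for Scrypt).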
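The informal schema can be checked mechanically, for instance when updating `expected.jsonl` fixtures. The Python sketch below relies only on the fields listed above. It treats `metadata` as optional, accepts extra metadata keys, and assumes 1-based line and column numbers, as the fixtures suggest. `validate_finding` is a hypothetical helper, not a Cipherscope API.

```python
import json


def validate_finding(line):
    """Return a list of problems with one JSONL line (empty if it looks valid)."""
    try:
        obj = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(obj, dict):
        return ["top-level value is not an object"]
    problems = []
    if obj.get("assetType") not in ("library", "algorithm"):
        problems.append("assetType must be 'library' or 'algorithm'")
    for field in ("identifier", "path"):
        if not isinstance(obj.get(field), str):
            problems.append(f"{field} must be a string")
    evidence = obj.get("evidence")
    if not isinstance(evidence, dict):
        problems.append("evidence must be an object")
    else:
        for field in ("line", "column"):
            value = evidence.get(field)
            # bool is a subclass of int in Python, so reject it explicitly.
            if not isinstance(value, int) or isinstance(value, bool) or value < 1:
                problems.append(f"evidence.{field} must be a positive integer")
    metadata = obj.get("metadata", {})
    if not isinstance(metadata, dict):
        problems.append("metadata must be an object when present")
    elif "keySize" in metadata and not isinstance(metadata["keySize"], int):
        problems.append("metadata.keySize must be an integer")
    return problems
```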

## Dedupe Policy
To reduce overcounting on a single callsite, Cipherscope applies a simple same-line dedupe rule after matching:
- If two algorithms share the same `primitive` and line, drop the generic identifier when a more specific variant is present.
- A more specific identifier is one that either:
  - starts with the generic identifier plus a `-` (e.g., `AES-GCM` over `AES`), or
  - shares the same non-numeric tokens but adds numeric detail (e.g., `ECDSA-P256` over `ECDSA`).
- Different primitives on the same line are kept.
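A minimal Python sketch of the same-line rule follows. It rests on two interpretations of the prose above. First, findings are grouped per file as well as per line and primitive. Second, "non-numeric tokens" means the separator-delimited tokens that contain no digits. The function names are hypothetical.

```python
import re
from collections import defaultdict


def _tokens(identifier):
    # Split on separators: "ECDSA-P256" -> ["ECDSA", "P256"].
    return [t for t in re.split(r"[^0-9A-Za-z]+", identifier) if t]


def _has_digit(token):
    return any(ch.isdigit() for ch in token)


def is_more_specific(candidate, generic):
    """True if `candidate` refines `generic` under either rule of the policy."""
    if candidate == generic:
        return False
    # Rule 1: the generic identifier followed by "-" (AES-GCM over AES).
    if candidate.startswith(generic + "-"):
        return True
    # Rule 2: same digit-free tokens, but more tokens carrying numeric detail.
    c_tok, g_tok = _tokens(candidate), _tokens(generic)
    c_words = [t for t in c_tok if not _has_digit(t)]
    g_words = [t for t in g_tok if not _has_digit(t)]
    c_nums = len(c_tok) - len(c_words)
    g_nums = len(g_tok) - len(g_words)
    return c_words == g_words and c_nums > g_nums


def dedupe_same_line(findings):
    """Drop a generic algorithm finding when a more specific one shares its line and primitive."""
    groups = defaultdict(list)
    for idx, f in enumerate(findings):
        if f.get("assetType") == "algorithm":
            key = (f.get("path"), f["evidence"]["line"], f.get("metadata", {}).get("primitive"))
            groups[key].append(idx)
    dropped = set()
    for members in groups.values():
        for i in members:
            if any(i != j and is_more_specific(findings[j]["identifier"], findings[i]["identifier"])
                   for j in members):
                dropped.add(i)
    return [f for idx, f in enumerate(findings) if idx not in dropped]
```

Identifiers such as `SHA-1` and `SHA-256` on the same line are both kept, because neither rule makes one a refinement of the other.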

## Patterns and Extensibility
Patterns live in `patterns.toml`:
- Libraries define anchors and API regexes.
- Algorithms define symbol patterns and parameter extraction rules.

Adding a new library or algorithm usually only requires editing `patterns.toml`.

## Scope and Limits
- Inventory-first: it focuses on discovering crypto usage and relevant metadata.
- Local constant resolution only; cross-file or full data-flow analysis is out of scope for now.
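Local constant resolution can be illustrated with a small Python sketch. When a parameter such as a key size is passed as an identifier rather than a literal, the sketch looks for a `#define` or a `const` integer declaration of that name in the same file. This regex version is a simplification of working on the Tree-sitter AST, and `resolve_local_constant` and `extract_key_size` are hypothetical helpers. Consistent with the limits above, the sketch does not follow values across files or through reassignments.

```python
import re


def resolve_local_constant(source, name):
    """Return the integer bound to `name` in this file, or None if not resolvable locally."""
    n = re.escape(name)
    patterns = [
        # Preprocessor constant, e.g. "#define RSA_BITS 2048".
        rf"^\s*#\s*define\s+{n}\s+(\d+)\b",
        # Const declaration on one line, e.g. "static const int dh_bits = 3072;".
        rf"\bconst\b[^;=\n]*\b{n}\s*=\s*(\d+)\s*;",
    ]
    for pattern in patterns:
        m = re.search(pattern, source, re.MULTILINE)
        if m:
            return int(m.group(1))
    return None


def extract_key_size(source, argument):
    """Use a literal argument directly; otherwise try a same-file constant."""
    argument = argument.strip()
    if argument.isdigit():
        return int(argument)
    return resolve_local_constant(source, argument)
```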
4 changes: 3 additions & 1 deletion README.md
@@ -6,12 +6,13 @@

[![CI](https://github.com/script3r/cipherscope/actions/workflows/ci.yml/badge.svg)](https://github.com/script3r/cipherscope/actions/workflows/ci.yml)

`cipherscope` is a high-performance, command-line tool for scanning source code to detect the usage of cryptographic libraries and algorithms. It uses language-aware static analysis powered by [Tree-sitter](https://tree-sitter.github.io/tree-sitter/) for high precision.
`cipherscope` is a high-performance, command-line tool for scanning source code to detect the usage of cryptographic libraries and algorithms. The goal is to enable building an efficient, comprehensive cryptographic inventory. It uses language-aware static analysis powered by [Tree-sitter](https://tree-sitter.github.io/tree-sitter/) for high precision.

## Key Features

- **High Performance**: Parallelized scanning of large codebases.
- **Language-Aware**: Uses Tree-sitter parsers to reduce false positives by understanding code structure.
- **Inventory-First**: Focused on assembling a reliable crypto usage inventory across large repos.
- **Extensible Patterns**: Easily add new libraries and algorithms via a simple TOML configuration.
- **Broad Language Support**: Currently supports C, C++, Java, Python, Go, Swift, PHP, Objective-C, and Rust.
- **Developer Friendly**: JSONL output for easy integration with CI/CD pipelines and security tools.
@@ -29,6 +30,7 @@
c. **Algorithm Detection**: If an anchor is found, the scanner performs a deeper search within that file for specific algorithm usage patterns, such as function calls and constants.

All results are streamed as JSONL to the output, allowing for real-time monitoring and processing.
For a deeper architecture overview, see `DESIGN.md`.

## Installation

98 changes: 49 additions & 49 deletions fixtures/cpp/libsodium_comprehensive/expected.jsonl

Large diffs are not rendered by default.

201 changes: 97 additions & 104 deletions fixtures/cpp/mbedtls_comprehensive/expected.jsonl

Large diffs are not rendered by default.

62 changes: 30 additions & 32 deletions fixtures/cpp/openssl_comprehensive/expected.jsonl
@@ -1,42 +1,40 @@
{"assetType": "library", "evidence": {"column": 1, "line": 19}, "identifier": "OpenSSL", "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 12, "line": 137}, "identifier": "DSA", "metadata": {"primitive": "signature"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 12, "line": 72}, "identifier": "RSA", "metadata": {"keySize": 2048, "primitive": "signature"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 12, "line": 80}, "identifier": "RSA", "metadata": {"keySize": 2048, "primitive": "signature"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 12, "line": 88}, "identifier": "RSA", "metadata": {"keySize": 2048, "primitive": "signature"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 26, "line": 238}, "identifier": "HKDF", "metadata": {"primitive": "kdf"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 30, "line": 227}, "identifier": "SHA-256", "metadata": {"primitive": "hash"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 106}, "identifier": "ECDSA-P384", "metadata": {"primitive": "signature"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 114}, "identifier": "ECDSA-P521", "metadata": {"primitive": "signature"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 122}, "identifier": "DH", "metadata": {"keySize": 2048, "primitive": "keyexchange"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 158}, "identifier": "SHA-1", "metadata": {"primitive": "hash"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 163}, "identifier": "SHA-224", "metadata": {"primitive": "hash"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 168}, "identifier": "SHA-256", "metadata": {"primitive": "hash"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 173}, "identifier": "SHA-384", "metadata": {"primitive": "hash"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 178}, "identifier": "SHA-512", "metadata": {"primitive": "hash"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 51}, "identifier": "ChaCha20", "metadata": {"primitive": "symmetric"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 59}, "identifier": "Blowfish", "metadata": {"primitive": "symmetric"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 35}, "identifier": "AES-GCM", "metadata": {"keySize": 128, "primitive": "symmetric"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 40}, "identifier": "AES-GCM", "metadata": {"keySize": 256, "primitive": "symmetric"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 29}, "identifier": "AES-CBC", "metadata": {"keySize": 128, "primitive": "symmetric"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 32}, "identifier": "AES-CBC", "metadata": {"keySize": 256, "primitive": "symmetric"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 183}, "identifier": "SHA3-224", "metadata": {"primitive": "hash"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 188}, "identifier": "SHA3-256", "metadata": {"primitive": "hash"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 193}, "identifier": "SHA3-384", "metadata": {"primitive": "hash"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 198}, "identifier": "SHA3-512", "metadata": {"primitive": "hash"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 203}, "identifier": "BLAKE2b", "metadata": {"primitive": "hash"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 208}, "identifier": "BLAKE2s", "metadata": {"primitive": "hash"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 213}, "identifier": "MD5", "metadata": {"primitive": "hash"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 226}, "identifier": "PBKDF2", "metadata": {"iterations": 10000, "primitive": "kdf"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 158}, "identifier": "SHA-1", "metadata": {"primitive": "hash"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 163}, "identifier": "SHA-224", "metadata": {"primitive": "hash"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 12, "line": 137}, "identifier": "DSA", "metadata": {"primitive": "signature"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 29}, "identifier": "AES", "metadata": {"keySize": 128, "primitive": "symmetric"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 32}, "identifier": "AES", "metadata": {"keySize": 256, "primitive": "symmetric"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 226}, "identifier": "SHA-256", "metadata": {"primitive": "hash"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 234}, "identifier": "Scrypt", "metadata": {"N": 16384, "primitive": "kdf"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 26, "line": 238}, "identifier": "HKDF", "metadata": {"primitive": "kdf"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 239}, "identifier": "HKDF", "metadata": {"primitive": "kdf"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 240}, "identifier": "HKDF", "metadata": {"primitive": "kdf"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 173}, "identifier": "SHA-384", "metadata": {"primitive": "hash"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 106}, "identifier": "ECDSA-P384", "metadata": {"primitive": "signature"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 62}, "identifier": "RC4", "metadata": {"primitive": "symmetric"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 198}, "identifier": "SHA3-512", "metadata": {"primitive": "hash"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 183}, "identifier": "SHA3-224", "metadata": {"primitive": "hash"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 168}, "identifier": "SHA-256", "metadata": {"primitive": "hash"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 226}, "identifier": "SHA-256", "metadata": {"primitive": "hash"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 30, "line": 227}, "identifier": "SHA-256", "metadata": {"primitive": "hash"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 240}, "identifier": "SHA-256", "metadata": {"primitive": "hash"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 98}, "identifier": "ECDSA-P256", "metadata": {"primitive": "signature"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 188}, "identifier": "SHA3-256", "metadata": {"primitive": "hash"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 203}, "identifier": "BLAKE2b", "metadata": {"primitive": "hash"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 12, "line": 72}, "identifier": "RSA", "metadata": {"keySize": 2048, "primitive": "signature"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 12, "line": 80}, "identifier": "RSA", "metadata": {"keySize": 2048, "primitive": "signature"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 12, "line": 88}, "identifier": "RSA", "metadata": {"keySize": 2048, "primitive": "signature"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 29}, "identifier": "AES-CBC", "metadata": {"keySize": 128, "primitive": "symmetric"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 32}, "identifier": "AES-CBC", "metadata": {"keySize": 256, "primitive": "symmetric"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 35}, "identifier": "AES-GCM", "metadata": {"keySize": 128, "primitive": "symmetric"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 40}, "identifier": "AES-GCM", "metadata": {"keySize": 256, "primitive": "symmetric"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 45}, "identifier": "3DES", "metadata": {"primitive": "symmetric"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 122}, "identifier": "DH", "metadata": {"keySize": 2048, "primitive": "keyexchange"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 213}, "identifier": "MD5", "metadata": {"primitive": "hash"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 114}, "identifier": "ECDSA-P521", "metadata": {"primitive": "signature"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 54}, "identifier": "ChaCha20-Poly1305", "metadata": {"primitive": "symmetric"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 208}, "identifier": "BLAKE2s", "metadata": {"primitive": "hash"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 45}, "identifier": "DES", "metadata": {"primitive": "symmetric"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 48}, "identifier": "DES", "metadata": {"primitive": "symmetric"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 51}, "identifier": "ChaCha20", "metadata": {"primitive": "symmetric"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 54}, "identifier": "ChaCha20-Poly1305", "metadata": {"primitive": "symmetric"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 59}, "identifier": "Blowfish", "metadata": {"primitive": "symmetric"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 62}, "identifier": "RC4", "metadata": {"primitive": "symmetric"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 5, "line": 98}, "identifier": "ECDSA-P256", "metadata": {"primitive": "signature"}, "path": "FIXME"}
{"assetType": "library", "evidence": {"column": 1, "line": 19}, "identifier": "OpenSSL", "path": "FIXME"}
2 changes: 1 addition & 1 deletion fixtures/cpp/tink_aesgcm/expected.jsonl
@@ -1,3 +1,3 @@
{"assetType": "library", "evidence": {"column": 1, "line": 5}, "identifier": "Google Tink (C++)", "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 33, "line": 25}, "identifier": "AES-GCM", "metadata": {"primitive": "symmetric"}, "path": "FIXME"}
{"assetType": "algorithm", "evidence": {"column": 9, "line": 26}, "identifier": "AES-GCM", "metadata": {"primitive": "symmetric"}, "path": "FIXME"}
{"assetType": "library", "evidence": {"column": 1, "line": 5}, "identifier": "Google Tink (C++)", "path": "FIXME"}