An extremely fast duplicate file finder written in Rust, designed to give accurate results while minimizing disk I/O. Available for Linux and macOS, with Windows support planned.
Files are compared using cryptographically secure hashing to ensure accuracy.
Optionally, duplicate files can be replaced with hardlinks to save disk space.
- Features
- Installation
- Usage
- CLI Options
- Benchmarks
- How It Works
- Hardlinking
- Output Formats
- Limitations
- License
- Multi-stage filtering: size grouping -> partial hash (8KB) -> full hash
- Parallel processing with rayon
- BLAKE3 hashing (fast, cryptographically secure)
- Hardlink replacement with dry-run support
- Human-readable and JSON output formats
- Flexible include/exclude glob patterns for filtering files
# From crates.io
cargo install dedup-cli

# With Homebrew
brew install denizariyan/tap/dedup

# From source
cargo build --release

The binary will be at target/release/dedup.
See CLI Options for all available options.
# Scan current directory, report duplicates
dedup
# Scan specific directory
dedup /path/to/directory
# Output as JSON
dedup --format json
# Report duplicates with exit code
dedup --action report-exit-code
# Dry-run replacing duplicates with hardlinks
dedup --action hardlink --dry-run
# Skip files by pattern
dedup -e "*.log" -e "*.tmp" -e "node_modules"
# Use an exclude file (gitignore-style, one pattern per line)
dedup --exclude-file .gitignore
# Only scan image files
dedup --include "*.jpg" --include "*.png"
# Use an include file
dedup --include-file patterns.txt
# Scan all images, except those in backup folder - if a file matches both include and exclude, exclude takes precedence
dedup -i "*.jpg" -e "backup"

All options can be used in combination.
| Option | Short | Description |
|---|---|---|
| `--format <FORMAT>` | `-f` | Output format: `human` (default), `json`, or `quiet` |
| `--action <ACTION>` | `-a` | Action: `none` (default), `report-exit-code`, or `hardlink` |
| `--min-size <BYTES>` | `-s` | Skip files smaller than this size |
| `--max-size <BYTES>` | `-S` | Skip files larger than this size |
| `--exclude <PATTERN>` | `-e` | Glob pattern to exclude files or directories (can be used multiple times) |
| `--exclude-file <PATH>` | | File containing exclude patterns (gitignore-style) |
| `--include <PATTERN>` | `-i` | Glob pattern to include files (can be used multiple times). Has no effect on directories |
| `--include-file <PATH>` | | File containing include patterns |
| `--verbose` | `-v` | Show detailed output with file paths |
| `--jobs <N>` | `-j` | Number of threads to use (defaults to CPU core count) |
| `--dry-run` | | Preview hardlink changes without modifying files |
| `--no-progress` | | Disable progress bars |
Reference benchmark results for a 10GB dataset with various duplicate ratios and file size distributions can be found below. For more details, see benchmark docs.
In all tested scenarios, dedup outperforms the other duplicate file finders tested, especially on slower disks, where the multi-stage filtering and parallel processing pay off by minimizing time spent waiting on disk I/O.
The tool uses a multi-stage pipeline that minimizes disk I/O and, with it, runtime:
- Scan: Walk the directory tree, collecting file paths and sizes.
- Size grouping: Group files by size; a file with a unique size cannot have a duplicate and is dropped.
- Partial hash: For the remaining candidates, hash only the first 8KB and group by this partial hash.
- Full hash: For files with matching partial hashes, compute the full content hash to confirm duplicates.
This approach avoids reading entire file contents for most files.
Example:
1000 files
↓ size grouping
200 candidates (800 files with unique sizes skipped)
↓ partial hash (8KB each)
50 candidates (150 ruled out by differing first 8KB)
↓ full hash
20 confirmed duplicates
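For illustration, the core of the pipeline could be sketched in Rust as below. This is not the actual implementation: it hashes candidates sequentially (the real tool parallelizes the work with rayon), it assumes the `blake3` crate as a dependency, and the `find_duplicates` entry point and its `(path, size)` input are made up for the example.

```rust
use std::collections::HashMap;
use std::fs::File;
use std::io::Read;
use std::path::PathBuf;

const PARTIAL_LEN: u64 = 8 * 1024; // stage 2 reads only the first 8KB

/// Hash only the first 8KB of a file (stage 2).
fn partial_hash(path: &PathBuf) -> std::io::Result<[u8; 32]> {
    let mut buf = Vec::new();
    File::open(path)?.take(PARTIAL_LEN).read_to_end(&mut buf)?;
    Ok(*blake3::hash(&buf).as_bytes())
}

/// Hash the entire file contents (stage 3).
fn full_hash(path: &PathBuf) -> std::io::Result<[u8; 32]> {
    let mut hasher = blake3::Hasher::new();
    std::io::copy(&mut File::open(path)?, &mut hasher)?;
    Ok(*hasher.finalize().as_bytes())
}

/// Given scanned (path, size) pairs, return groups of confirmed duplicates.
fn find_duplicates(files: Vec<(PathBuf, u64)>) -> std::io::Result<Vec<Vec<PathBuf>>> {
    // Stage 1: group by size; a file with a unique size can have no duplicate.
    let mut by_size: HashMap<u64, Vec<PathBuf>> = HashMap::new();
    for (path, size) in files {
        by_size.entry(size).or_default().push(path);
    }

    let mut duplicates = Vec::new();
    for candidates in by_size.into_values().filter(|g| g.len() > 1) {
        // Stage 2: group the survivors by a hash of their first 8KB.
        let mut by_partial: HashMap<[u8; 32], Vec<PathBuf>> = HashMap::new();
        for path in candidates {
            by_partial.entry(partial_hash(&path)?).or_default().push(path);
        }
        for candidates in by_partial.into_values().filter(|g| g.len() > 1) {
            // Stage 3: confirm duplicates with a full-content hash.
            let mut by_full: HashMap<[u8; 32], Vec<PathBuf>> = HashMap::new();
            for path in candidates {
                by_full.entry(full_hash(&path)?).or_default().push(path);
            }
            duplicates.extend(by_full.into_values().filter(|g| g.len() > 1));
        }
    }
    Ok(duplicates)
}
```

Because each stage only runs on the groups that survived the previous one, the full-content hash (the only step that has to read every byte) touches a small fraction of the scanned files.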
When using --action hardlink, duplicate files are replaced with hardlinks to a single copy.
Note that hardlinking discards per-file metadata: ownership and permissions of the replaced duplicates are lost, since all links share the metadata of the single remaining copy. Other deduplication methods are planned, but hardlinking is the only option for now.
If you are packaging the deduplicated files later, consider using a hardlink-aware archiver like tar to benefit from space savings.
Use --dry-run --verbose first to preview what would change.
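Conceptually, replacing a duplicate means creating a second directory entry that points at the original's inode. The sketch below is not the tool's actual code; it links to a temporary name first and then renames over the duplicate so the duplicate path never disappears mid-operation, and the temporary naming scheme is made up for the example.

```rust
use std::fs;
use std::path::Path;

/// Replace `duplicate` with a hardlink to `original`.
fn replace_with_hardlink(original: &Path, duplicate: &Path) -> std::io::Result<()> {
    // Hypothetical temporary name next to the duplicate.
    let tmp = duplicate.with_extension("dedup-tmp");
    fs::hard_link(original, &tmp)?;
    if let Err(e) = fs::rename(&tmp, duplicate) {
        // Clean up the temporary link if the rename failed.
        let _ = fs::remove_file(&tmp);
        return Err(e);
    }
    Ok(())
}
```

On Unix, `fs::rename` atomically replaces the destination, which is why the final step is a rename rather than a remove followed by a link.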
Duplicate Report
Groups: 3
Total duplicate files: 12
Wasted space: 45.2 MB
The quiet format suppresses all output. It is useful for scripting in combination with --action report-exit-code.
{
"stats": {
"duplicate_groups": 3,
"duplicate_files": 12,
"wasted_bytes": 47412224
},
"groups": [
{
"size": 15804074,
"files": ["/path/to/file1.jpg", "/path/to/file2.jpg"]
}
]
}
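If you consume the JSON output from another Rust program, a minimal deserializer might look like the sketch below. It assumes serde (with the derive feature) and serde_json as dependencies; the field names are taken from the sample above, not from a published schema.

```rust
use serde::Deserialize;

// Structs mirroring the sample JSON output shown above.
#[derive(Deserialize)]
struct Report {
    stats: Stats,
    groups: Vec<Group>,
}

#[derive(Deserialize)]
struct Stats {
    duplicate_groups: u64,
    duplicate_files: u64,
    wasted_bytes: u64,
}

#[derive(Deserialize)]
struct Group {
    size: u64,
    files: Vec<String>,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // e.g. produced earlier with: dedup --format json > report.json
    let json = std::fs::read_to_string("report.json")?;
    let report: Report = serde_json::from_str(&json)?;
    println!(
        "{} duplicate files in {} groups, {} bytes wasted",
        report.stats.duplicate_files, report.stats.duplicate_groups, report.stats.wasted_bytes
    );
    for group in &report.groups {
        println!("{} bytes x {} copies", group.size, group.files.len());
    }
    Ok(())
}
```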
- Because hardlinks are the only deduplication method currently supported, only files within the same filesystem can be deduplicated
- Symlinks are ignored
MIT License. See LICENSE file for details.

