A high-performance, production-grade computer vision system for automated detection and counting of trout eggs using deep learning. This system leverages the Roboflow inference engine to identify and classify different egg states (eyed, blank, dead) with configurable confidence thresholds and advanced QR code masking capabilities.
- Overview
- Features
- System Architecture
- Installation
- Quick Start
- Usage
- Configuration
- Output Format
- Performance Optimization
- SLURM Deployment
- API Reference
- Troubleshooting
- Contributing
- License
- Citation
- Acknowledgments
Manual counting of trout eggs represents a significant bottleneck in aquaculture operations, requiring extensive labor hours and introducing human error into hatchery management. This automated computer vision system addresses these challenges by providing rapid, accurate, and reproducible egg counting across large image datasets.
The system processes images of trout eggs to detect three critical egg states:
- Eyed eggs: Fertilized eggs showing visible eye development
- Blank eggs: Unfertilized or early-stage eggs without eye development
- Dead eggs: Non-viable eggs requiring removal
Built on the Roboflow object detection platform, the system employs a trained neural network model that achieves high accuracy through:
- Configurable confidence thresholding for precision-recall optimization
- Intelligent image slicing for processing high-resolution photographs
- Advanced QR code detection and masking to prevent counting artifacts
- Parallel processing architecture for high-throughput analysis
- Multi-class Detection: Simultaneously identifies and counts eyed, blank, and dead eggs
- High-Resolution Processing: Handles large images through intelligent slicing (default 640×640 patches)
- QR Code Intelligence: Automatically detects and masks QR codes with configurable expansion factors
- Batch Processing: Processes entire directories with parallel execution
- Annotated Outputs: Generates visualizations with bounding boxes for quality control
- Flexible Configuration: JSON-based configuration system for easy parameter tuning
- SLURM Integration: Designed for high-performance computing environments
- Logging System: Comprehensive logging with configurable verbosity
- Error Handling: Robust error recovery with detailed traceback reporting
- Progress Tracking: Real-time progress bars for long-running operations
- CSV Export: Structured data output for downstream analysis
- Configurable Parallelism: Adjustable worker threads based on available resources
- Detection Confidence Scores: Per-detection confidence values for quality filtering
- Visual Verification: Annotated images enable manual validation of results
- Reproducible Results: Deterministic processing with documented parameters
- Extensible Architecture: Modular design supports easy customization
The system implements a multi-stage processing pipeline:
- Image Loading: Reads images from specified directory using OpenCV
- QR Code Processing: Detects QR codes, extracts identifiers, creates expansion masks
- Image Masking: Applies masks to prevent detection in QR code regions
- Inference Slicing: Divides images into overlapping tiles for processing
- Detection Aggregation: Combines detections across tiles with NMS
- Result Compilation: Generates CSV summaries and annotated visualizations
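The inference slicing stage can be illustrated with a small sketch. This is not the project's implementation: the function names `tile_starts` and `tile_boxes` and the 128-pixel overlap are assumptions made for illustration, and only the 640×640 tile size comes from the defaults documented here.

```python
from typing import List, Tuple


def tile_starts(length: int, tile: int, overlap: int) -> List[int]:
    """Start offsets along one axis so that the tiles cover [0, length)."""
    assert 0 <= overlap < tile, "overlap must be smaller than the tile size"
    if tile >= length:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile sits flush with the image edge
    return starts


def tile_boxes(
    width: int,
    height: int,
    tile: Tuple[int, int] = (640, 640),
    overlap: Tuple[int, int] = (128, 128),  # assumed value, not from the README
) -> List[Tuple[int, int, int, int]]:
    """Return (x1, y1, x2, y2) tile rectangles in row-major order."""
    tile_w, tile_h = tile
    over_w, over_h = overlap
    return [
        (x, y, min(x + tile_w, width), min(y + tile_h, height))
        for y in tile_starts(height, tile_h, over_h)
        for x in tile_starts(width, tile_w, over_w)
    ]


if __name__ == "__main__":
    for box in tile_boxes(1000, 640):
        print(box)
```

Because neighbouring tiles overlap, an egg on a tile border is usually seen whole in at least one tile, which is why the aggregation stage then has to remove duplicates.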
The detection system operates through a convolutional neural network trained on labeled trout egg images. Key mathematical operations include:
- Confidence Thresholding: Filters detections based on model confidence scores
- IoU-based NMS: Eliminates duplicate detections using intersection-over-union calculations
- Coordinate Transformation: Maps detections from slice coordinates to full image space
- Geometric Scaling: Expands QR code boundaries using configurable scaling factors
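As a rough illustration of the coordinate transformation and IoU-based NMS steps, the sketch below shifts tile-local boxes into full-image space and then applies a greedy, class-agnostic suppression. The helper names (`iou`, `to_full_image`, `nms`) are hypothetical, and the actual system may delegate these steps to the supervision library rather than implement them this way.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def to_full_image(box: Box, tile_origin: Tuple[float, float]) -> Box:
    """Shift a tile-local box by the tile's top-left corner."""
    ox, oy = tile_origin
    return (box[0] + ox, box[1] + oy, box[2] + ox, box[3] + oy)


def nms(boxes: List[Box], scores: List[float], iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    # The same egg seen in two overlapping tiles, plus one distinct egg.
    full = [
        to_full_image((600, 100, 640, 140), (0, 0)),
        to_full_image((240, 100, 280, 140), (360, 0)),
        to_full_image((0, 0, 40, 40), (360, 0)),
    ]
    print(nms(full, [0.9, 0.8, 0.7]))
```

The default of 0.5 mirrors the documented `iou_threshold`; raising it keeps more overlapping boxes, lowering it merges more aggressively.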
The system achieves high throughput through ThreadPoolExecutor-based parallelism:
- Concurrent image processing across multiple worker threads
- Progress tracking with tqdm for real-time monitoring
- Exception handling isolates failed images so that results from the remaining images are preserved
- Automatic CPU count detection for optimal worker allocation
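The pattern described above can be sketched with the standard library alone. The function `run_parallel` and its `process` callback are hypothetical names, not the project's API; in the real system the callback would wrap the per-image counting call, and the completion loop could be wrapped in tqdm for a progress bar.

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List, Optional, Tuple


def run_parallel(
    paths: List[str],
    process: Callable[[str], dict],
    max_workers: Optional[int] = None,
) -> Tuple[Dict[str, dict], Dict[str, str]]:
    """Process every path; return (results, failures) keyed by path."""
    workers = max_workers or os.cpu_count() or 1  # default to the CPU count
    results: Dict[str, dict] = {}
    failures: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(process, path): path for path in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # one bad image must not abort the batch
                failures[path] = repr(exc)
    return results, failures


if __name__ == "__main__":
    def fake_process(path: str) -> dict:
        # Stand-in for the real per-image work.
        if "bad" in path:
            raise ValueError(f"cannot read {path}")
        return {"total_eggs": len(path)}

    ok, failed = run_parallel(["a.jpg", "bad.jpg", "b.jpg"], fake_process, max_workers=2)
    print(sorted(ok), failed)
```

Collecting failures in a separate mapping keeps the successful counts usable and leaves a record of which images need to be re-run.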
- Python 3.8 or higher
- CUDA-capable GPU (optional, for accelerated inference)
- Sufficient RAM for image processing (minimum 8GB recommended)
# Clone the repository
git clone https://github.com/yourusername/trout-egg-counter.git
cd trout-egg-counter
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Load required modules
module load python/3.8
module load cuda/11.8 # Optional, for GPU acceleration
# Create virtual environment in scratch space
python -m venv $SCRATCH/trout-egg-env
source $SCRATCH/trout-egg-env/bin/activate
# Install dependencies
pip install --no-cache-dir -r requirements.txt
Core dependencies (see requirements.txt for complete list):
supervision>=0.16.0
inference>=0.9.0
opencv-python>=4.8.0
numpy>=1.24.0
pandas>=2.0.0
Pillow>=10.0.0
tqdm>=4.65.0
Process a directory of images with default settings:
python count_eggs_production.py \
--input-dir /path/to/images \
--output-dir /path/to/results
python count_eggs_production.py \
--input-dir /path/to/images \
--output-dir /path/to/results \
--config config.json \
--confidence 0.5 \
--max-workers 8
For debugging or memory-constrained environments:
python count_eggs_production.py \
--input-dir /path/to/images \
--output-dir /path/to/results \
--sequential \
--log-level DEBUG
The system provides a comprehensive command-line interface:
python count_eggs_production.py [OPTIONS]
- -i, --input-dir PATH: Directory containing input images
- -o, --output-dir PATH: Directory for output files
- --config PATH: Path to JSON configuration file
- --csv-output PATH: Custom path for CSV results (default: output_dir/egg_count_results.csv)
- --log-level {DEBUG,INFO,WARNING,ERROR}: Logging verbosity (default: INFO)
- --log-file PATH: Write logs to file
- --max-workers N: Maximum worker threads (default: CPU count)
- --confidence FLOAT: Detection confidence threshold (default: 0.45)
- --sequential: Process images sequentially instead of parallel
Import and use the system in Python code:
from count_eggs_production import EggCounter, EggCounterConfig
import logging
# Configure system
config = EggCounterConfig()
config.confidence_threshold = 0.5
config.max_workers = 4
# Initialize
logger = logging.getLogger("egg_counter")
counter = EggCounter(config, logger)
# Process single image
results_df = counter.count_eggs(
image_path="image.jpg",
output_dir="./results"
)
print(f"Total eggs detected: {results_df['total_eggs'].values[0]}")
Create a config.json file to customize system behavior:
{
"model_id": "egg_training-bi/1",
"api_key": "your_roboflow_api_key",
"confidence_threshold": 0.45,
"iou_threshold": 0.5,
"slice_size": [640, 640],
"qr_scale_x": 1.6,
"qr_scale_y": 2.25,
"output_image_size": [4000, 6000],
"max_workers": 8
}
- model_id: Roboflow model identifier (format: "project/version", as in "egg_training-bi/1")
- api_key: Your Roboflow API key for model access
- confidence_threshold: Minimum detection confidence (0.0-1.0, default: 0.45)
- iou_threshold: IoU threshold for non-maximum suppression (default: 0.5)
- slice_size: Dimensions for image slicing as [width, height] (default: [640, 640])
- max_workers: Number of parallel worker threads (default: CPU count)
- output_image_size: Resize dimensions for annotated images (default: [4000, 6000])
- qr_scale_x: Horizontal expansion factor for QR code masking (default: 1.6)
- qr_scale_y: Vertical expansion factor for QR code masking (default: 2.25)
The confidence threshold balances precision and recall:
- Higher values (0.6-0.8): Fewer false positives, may miss some eggs
- Lower values (0.3-0.4): Captures more eggs, increased false positives
- Recommended approach: Start at 0.45, adjust based on validation results
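To make the trade-off concrete, the sketch below counts detections per class at different thresholds. The function `count_by_class` is a hypothetical helper, the detections are made-up values for illustration, and treating the threshold as inclusive (`>=`) is an assumption about how the system compares scores.

```python
from collections import Counter
from typing import Dict, List, Tuple

CLASSES = ("blank", "dead", "eyed")


def count_by_class(
    detections: List[Tuple[str, float]], threshold: float = 0.45
) -> Dict[str, int]:
    """Count (label, confidence) detections at or above the threshold."""
    kept = Counter(label for label, conf in detections if conf >= threshold)
    counts = {name: kept.get(name, 0) for name in CLASSES}
    counts["total_eggs"] = sum(counts.values())
    return counts


if __name__ == "__main__":
    # Illustrative detections only, not real model output.
    sample = [("eyed", 0.90), ("eyed", 0.50), ("blank", 0.40), ("dead", 0.60)]
    for t in (0.35, 0.45, 0.55):
        print(t, count_by_class(sample, t))
```

Running a sweep like this on a hand-counted validation set is one way to pick a threshold before processing a full dataset.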
QR codes in images are automatically detected and masked to prevent false detections. The scaling factors control mask size:
- qr_scale_x = 1.6: Expands mask 60% beyond QR code horizontally
- qr_scale_y = 2.25: Expands mask 125% beyond QR code vertically
Lower these values if the mask hides eggs that lie close to the QR code; raise them if the region around the QR code still produces false detections.
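The effect of the scaling factors can be sketched as a box expansion. The helper `expand_box` is hypothetical, and scaling about the centre of the QR code with clamping to the image bounds is an assumption; the system's actual mask geometry may differ.

```python
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def expand_box(
    box: Box,
    scale_x: float = 1.6,
    scale_y: float = 2.25,
    image_size: Optional[Tuple[int, int]] = None,  # (width, height)
) -> Box:
    """Scale a box about its centre, optionally clamping to the image."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    half_w = (x2 - x1) * scale_x / 2
    half_h = (y2 - y1) * scale_y / 2
    nx1, ny1, nx2, ny2 = cx - half_w, cy - half_h, cx + half_w, cy + half_h
    if image_size is not None:
        width, height = image_size
        nx1, ny1 = max(0.0, nx1), max(0.0, ny1)
        nx2, ny2 = min(float(width), nx2), min(float(height), ny2)
    return (nx1, ny1, nx2, ny2)


if __name__ == "__main__":
    # A 100x100 QR code grows to 160x225 with the default factors.
    print(expand_box((100, 100, 200, 200)))
```

With the defaults, a 100×100 pixel QR code yields a 160×225 pixel mask, so the vertical factor dominates how much of the image is excluded from counting.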
Processing creates the following output structure:
output_dir/
├── egg_count_results.csv # Detection summary
├── annotated_images/ # Visual outputs
│ ├── [identifier_1].png
│ ├── [identifier_2].png
│ └── ...
└── processing.log # Detailed logs (if --log-file specified)
The results CSV contains detection counts by egg type:
,blank,dead,eyed,total_eggs
QR123,45,12,203,260
QR124,38,8,198,244
QR125,52,15,210,277
Columns:
- Index: Image identifier (from QR code or filename)
- blank: Count of blank/unfertilized eggs
- dead: Count of dead/non-viable eggs
- eyed: Count of eyed/fertilized eggs
- total_eggs: Total detections across all classes
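A quick consistency check on the results file can catch truncated or hand-edited rows. The sketch below assumes the column layout shown above, with the identifier in the unnamed first column; the function name `inconsistent_rows` is made up for this example.

```python
import csv
from typing import Iterable, List

CLASS_COLUMNS = ("blank", "dead", "eyed")


def inconsistent_rows(lines: Iterable[str]) -> List[str]:
    """Return identifiers whose class counts do not sum to total_eggs."""
    reader = csv.DictReader(lines)
    bad: List[str] = []
    for row in reader:
        identifier = row[""]  # unnamed first column holds the image identifier
        class_sum = sum(int(row[name]) for name in CLASS_COLUMNS)
        if class_sum != int(row["total_eggs"]):
            bad.append(identifier)
    return bad


if __name__ == "__main__":
    sample = [
        ",blank,dead,eyed,total_eggs",
        "QR123,45,12,203,260",
        "QR124,38,8,198,244",
        "QR125,52,15,210,277",
    ]
    print(inconsistent_rows(sample))
```

The function accepts any iterable of lines, so an open file handle for egg_count_results.csv can be passed in place of the sample list.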
Each annotated image shows:
- Bounding boxes around detected eggs
- Color-coded by egg type (if supported by supervision library)
- Original image with detection overlays
- Resized to configured output dimensions
Maximize throughput by tuning worker count:
# Use all available CPUs
python count_eggs_production.py -i images/ -o results/
# Limit to 4 workers for memory-constrained systems
python count_eggs_production.py -i images/ -o results/ --max-workers 4
For large images or limited RAM:
- Reduce slice size: Smaller slices reduce memory per operation
- Sequential processing: Use the --sequential flag to process one image at a time
- Adjust output size: Smaller annotated images reduce memory requirements
config = EggCounterConfig()
config.slice_size = (512, 512) # Smaller slices
config.output_image_size = (2000, 3000) # Smaller outputs
The Roboflow inference engine automatically uses GPU if available:
# Verify CUDA availability (this check requires PyTorch to be installed)
python -c "import torch; print(torch.cuda.is_available())"
# Process with GPU acceleration (automatic)
python count_eggs_production.py -i images/ -o results/
For very large datasets:
- Divide into batches: Process 100-500 images per batch
- Monitor system resources: Adjust worker count based on CPU/memory usage
- Checkpoint progress: Process subdirectories separately for fault tolerance
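One simple way to follow the batching advice is to split the sorted list of image paths into fixed-size chunks and process each chunk as its own run. The generator below is a minimal sketch; the name `batches` and the default size of 250 are illustrative choices within the 100-500 range suggested above.

```python
from typing import Iterator, List, Sequence


def batches(items: Sequence[str], batch_size: int = 250) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


if __name__ == "__main__":
    paths = [f"img_{i:04d}.jpg" for i in range(1050)]
    for number, chunk in enumerate(batches(paths, 500), start=1):
        print(f"batch {number}: {len(chunk)} images, first = {chunk[0]}")
```

Writing each batch to its own output directory means a failure only requires re-running the affected batch.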
Create run_egg_counting.sh:
#!/bin/bash
#SBATCH --job-name=egg_count
#SBATCH --output=egg_count_%j.out
#SBATCH --error=egg_count_%j.err
#SBATCH --time=4:00:00
#SBATCH --cpus-per-task=16
#SBATCH --mem=32G
#SBATCH --partition=general
# Load modules
module load python/3.8
module load cuda/11.8
# Activate environment
source $SCRATCH/trout-egg-env/bin/activate
# Run processing
python count_eggs_production.py \
--input-dir $SCRATCH/trout_images \
--output-dir $SCRATCH/results \
--log-file $SCRATCH/results/processing.log \
--max-workers 16 \
--confidence 0.45
# Copy results to permanent storage
cp -r $SCRATCH/results $HOME/trout_analysis/
Submit the job and monitor it:
sbatch run_egg_counting.sh
# Check job status
squeue -u $USER
# View output logs
tail -f egg_count_JOBID.out
# Monitor resource usage
sstat -j JOBID --format=JobID,MaxRSS,AveCPU
EggCounterConfig: Configuration container for system parameters.
config = EggCounterConfig(config_path="config.json")
Methods:
- load_from_file(config_path): Load configuration from JSON
- save_to_file(config_path): Save configuration to JSON
Attributes:
- model_id: Roboflow model identifier
- api_key: Roboflow API key
- confidence_threshold: Detection confidence threshold
- iou_threshold: NMS IoU threshold
- slice_size: Image slice dimensions
- qr_scale_x: QR mask horizontal scaling
- qr_scale_y: QR mask vertical scaling
- output_image_size: Annotated image dimensions
- max_workers: Maximum parallel workers
EggCounter: Main processing class for egg detection and counting.
counter = EggCounter(config, logger)
Methods:
count_eggs(image_path, output_dir): Process a single image and return detection results.
Parameters:
- image_path (str): Path to input image
- output_dir (str): Directory for output files
Returns:
pandas.DataFrame: Detection counts by egg type
Example:
results = counter.count_eggs("image.jpg", "./output")
print(results)
Configure logging system.
Parameters:
- log_level (str): Logging level (DEBUG/INFO/WARNING/ERROR)
- log_file (str, optional): Path to log file
Returns:
logging.Logger: Configured logger instance
Retrieve all image paths from directory.
Parameters:
- input_dir (str): Directory containing images
- extensions (list, optional): Image file extensions to include
Returns:
list: Sorted list of image paths
Process multiple images in parallel.
Parameters:
- egg_counter (EggCounter): Configured counter instance
- image_paths (list): List of image paths
- output_dir (str): Output directory
- max_workers (int, optional): Number of workers
Returns:
list: List of result DataFrames
Problem: Failed to load model: Authentication error
Solution: Verify your Roboflow API key is correct:
python count_eggs_production.py --config config.json ...
Ensure api_key in configuration matches your Roboflow account.
Problem: MemoryError or system slowdown during processing
Solutions:
- Reduce worker count: --max-workers 2
- Use sequential processing: --sequential
- Reduce slice size in configuration
- Process smaller batches of images
Problem: Images processed but QR codes not recognized
Solutions:
- Ensure QR codes have sufficient contrast and size
- Verify QR codes are not damaged or obscured
- Check that images contain valid QR code formats
- Image filenames will be used as identifiers if QR detection fails
Problem: System misses eggs or reports too many false positives
Solutions:
- Adjust confidence threshold:
  - Lower for more detections: --confidence 0.35
  - Higher for fewer false positives: --confidence 0.55
- Verify image quality meets model requirements
- Check that QR code masking is not too aggressive
- Consider retraining model with additional examples
Enable detailed logging for troubleshooting:
python count_eggs_production.py \
--input-dir images/ \
--output-dir results/ \
--log-level DEBUG \
--log-file debug.log
If issues persist:
- Check the log files for detailed error messages
- Verify all dependencies are correctly installed
- Test with a small subset of images first
- Open an issue on GitHub with:
- Complete error message
- System configuration
- Sample images (if possible)
- Steps to reproduce
We welcome contributions to improve the Trout Egg Counter system.
# Clone repository
git clone https://github.com/yourusername/trout-egg-counter.git
cd trout-egg-counter
# Create development environment
python -m venv dev-env
source dev-env/bin/activate
# Install development dependencies
pip install -r requirements-dev.txt
# Install pre-commit hooks
pre-commit install
- Follow PEP 8 style guidelines
- Use type hints for function signatures
- Add docstrings for all public functions and classes
- Maintain test coverage above 80%
- Run black formatter before committing
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes with clear commit messages
- Add tests for new functionality
- Update documentation as needed
- Push to your fork and submit a pull request
Run the test suite:
# Run all tests
pytest tests/
# Run with coverage
pytest --cov=count_eggs_production tests/
# Run specific test file
pytest tests/test_egg_counter.py
This project is licensed under the MIT License - see the LICENSE file for details.
Permission is granted to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the software, subject to including the copyright notice and permission notice in all copies or substantial portions of the software.
The software is provided "as is", without warranty of any kind.
If you use this system in your research, please cite:
@software{trout_egg_counter,
title={Trout Egg Counter: Production-Ready Computer Vision for Aquaculture},
author={[Your Name]},
year={2025},
url={https://github.com/yourusername/trout-egg-counter}
}
This project builds upon several excellent open-source tools:
- Roboflow: Computer vision platform and inference engine
- Supervision: Detection utilities and visualization tools
- OpenCV: Computer vision and image processing
- NumPy/Pandas: Numerical computing and data analysis
Special thanks to the aquaculture research community for providing the biological context and validation datasets that made this system possible.
For questions, suggestions, or collaborations:
- Email: aja294@cornell.edu
- Email: edr24@cornell.edu
- GitHub Issues: Report issues
- Documentation: Full documentation
Note: This system is designed for research purposes. For production aquaculture deployment, consult with domain experts to validate accuracy for your specific trout species and imaging conditions.