PyEED Embedding Architecture Report
Executive Summary
The PyEED embedding system has undergone a significant architectural transformation to provide a more robust, scalable, and user-friendly interface for protein embedding calculations. This report details the new structure, legacy support mechanisms, and the design philosophy behind these changes.
New Architecture Overview
Core Components
The new embedding system is built around several key components:
1. EmbeddingProcessor (Main Interface)
src/pyeed/embeddings/processor.py
2. BaseEmbeddingModel (Abstract Base)
src/pyeed/embeddings/base.py
3. ModelFactory (Creation Pattern)
src/pyeed/embeddings/factory.py
4. Specialized Model Classes
src/pyeed/embeddings/models/
Architectural Patterns
Factory Pattern
Singleton Pattern
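A minimal sketch of the singleton idea, assuming it is used so that every caller shares one processor instance and its loaded models. `SharedProcessor` and `loaded_models` are hypothetical names chosen for illustration, not PyEED's actual class or attribute:

```python
import threading


class SharedProcessor:
    """Hypothetical stand-in for a processor that should exist only once."""

    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        # Double-checked locking: only the first caller creates the instance.
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:
                    instance = super().__new__(cls)
                    instance.loaded_models = {}  # cache shared by every caller
                    cls._instance = instance
        return cls._instance


first = SharedProcessor()
second = SharedProcessor()
first.loaded_models["demo"] = object()
```

Both names refer to the same object, so a model cached through `first` is also visible through `second`.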
Complete Module Structure
File Organization
Public API Structure
The module exports a clean public API through __init__.py:
Core Classes
Utility Functions
Backward Compatibility
Base Class Interface
The BaseEmbeddingModel defines the contract that all model implementations must follow:
Model Type Detection
The system automatically detects model types based on naming conventions:
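Name-based detection can be sketched as a small ordered rule table. The substrings and family labels below are illustrative assumptions, and `detect_model_type` is a hypothetical helper, not the rule set PyEED applies:

```python
def detect_model_type(model_name: str) -> str:
    """Map a model name to a coarse model family using substring rules.

    The rules are assumptions made for this sketch only.
    """
    name = model_name.lower()
    rules = [
        ("esmc", "esmc"),
        ("esm3", "esm3"),
        ("esm2", "esm2"),
        ("prot_t5", "prott5"),
    ]
    # The first matching rule wins, so more specific needles should come first.
    for needle, family in rules:
        if needle in name:
            return family
    raise ValueError(f"Cannot infer model type from name: {model_name!r}")
```

Raising on an unknown name, rather than silently picking a default, keeps a typo in the model name from loading the wrong model family.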
Legacy Support Strategy
Philosophy
The new architecture maintains 100% backward compatibility while encouraging migration to the improved interfaces. This is achieved through:
Legacy Method Categories
1. Batch Processing Legacy Methods
Modern Equivalent:
2. Single Embedding Legacy Methods
3. Model-Specific Legacy Methods
Migration Safety Net
The legacy support includes:
Backward Compatibility Implementation
The __init__.py file provides comprehensive backward compatibility:
Design Philosophy and Benefits
Core Principles
1. Separation of Concerns
2. Unified Interface
3. Automatic Resource Management
4. Extensibility
Key Benefits
Performance Improvements
Developer Experience
Reliability
Usage Guide
Basic Usage
Single Embedding Calculation
Batch Processing
Database Integration
Advanced Configuration
Multi-GPU Processing
Embedding Type Selection
Device Management
Custom Model Implementations
To add a new model type, inherit from BaseEmbeddingModel:
Migration Path
Phase 1: Immediate (No Code Changes Required)
Phase 2: Gradual Migration (Recommended)
Phase 3: Full Adoption (Optional)
Migration Examples
Batch Processing Migration
Single Embedding Migration
Technical Implementation Details
Model Factory Pattern
The factory pattern centralizes model creation logic:
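A minimal sketch of such a factory, assuming models are selected by a name prefix. `ToyBaseModel`, `LengthModel`, and `ToyModelFactory` are hypothetical stand-ins and do not mirror the real BaseEmbeddingModel or ModelFactory signatures:

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class ToyBaseModel(ABC):
    """Hypothetical minimal contract that every model must satisfy."""

    def __init__(self, model_name: str):
        self.model_name = model_name

    @abstractmethod
    def embed(self, sequence: str) -> List[float]:
        """Return an embedding vector for one sequence."""


class LengthModel(ToyBaseModel):
    """Toy model that 'embeds' a sequence as its length."""

    def embed(self, sequence: str) -> List[float]:
        return [float(len(sequence))]


class ToyModelFactory:
    """Single place that decides which class handles which model name."""

    _registry: Dict[str, Type[ToyBaseModel]] = {}

    @classmethod
    def register(cls, prefix: str, model_cls: Type[ToyBaseModel]) -> None:
        cls._registry[prefix] = model_cls

    @classmethod
    def create(cls, model_name: str) -> ToyBaseModel:
        for prefix, model_cls in cls._registry.items():
            if model_name.startswith(prefix):
                return model_cls(model_name)
        raise ValueError(f"No model registered for {model_name!r}")


ToyModelFactory.register("toy-", LengthModel)
model = ToyModelFactory.create("toy-small")
```

Callers depend only on the factory and the base contract, so adding a model family means registering one more class rather than editing every call site.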
Device Management
Automatic device detection and management:
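A minimal sketch of device selection, assuming a PyTorch backend. `pick_device` is a hypothetical helper; it falls back to the CPU when PyTorch is not installed or no CUDA device is visible:

```python
from typing import Optional


def pick_device(preferred: Optional[str] = None) -> str:
    """Return a device string, honouring an explicit choice when given."""
    if preferred is not None:
        return preferred
    try:
        import torch  # assumed backend; imported lazily so the sketch runs without it
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"
```

An explicit `preferred` value always wins, which keeps automatic detection from overriding a deliberate choice such as pinning work to the CPU during debugging.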
Memory Management
Automatic cleanup and error recovery:
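One way to guarantee cleanup is a context manager whose `finally` clause runs even when the body raises. This is a sketch under the assumption of a PyTorch backend; `released_afterwards` is a hypothetical name:

```python
import gc
from contextlib import contextmanager


@contextmanager
def released_afterwards(resources: dict):
    """Yield `resources`, then drop the references even if the body raises."""
    try:
        yield resources
    finally:
        resources.clear()  # drop references to models and tensors
        gc.collect()       # let Python reclaim them promptly
        try:
            import torch  # assumed backend; skipped when not installed
            if torch.cuda.is_available():
                torch.cuda.empty_cache()  # return cached GPU memory to the driver
        except ImportError:
            pass


cache = {"model": object()}
try:
    with released_afterwards(cache):
        raise RuntimeError("simulated failure during embedding")
except RuntimeError:
    pass
```

After the failed block, `cache` is empty, so a retry starts from a clean state instead of holding on to the failed run's model.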
Legacy Compatibility
Method wrapping preserves old interfaces:
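A minimal sketch of method wrapping, assuming the old entry point simply forwards to the new one. The function names are hypothetical, and whether the real wrappers emit a deprecation warning is an assumption:

```python
import functools
import warnings


def legacy_alias(new_func, old_name: str):
    """Build a callable with the old name that forwards to `new_func`."""

    @functools.wraps(new_func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_name} is deprecated; use {new_func.__name__} instead.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not the wrapper
        )
        return new_func(*args, **kwargs)

    wrapper.__name__ = old_name
    return wrapper


def calculate_embedding(sequence: str):
    """Hypothetical new-style function; returns a toy one-element embedding."""
    return [float(len(sequence))]


get_embedding_legacy = legacy_alias(calculate_embedding, "get_embedding_legacy")
```

Old call sites keep working unchanged, while the warning points each caller at the replacement.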
Utility Functions
The system includes comprehensive utility functions:
Error Handling and Resilience
The system includes robust error handling:
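One common resilience strategy for batch embedding is to retry with a smaller batch when memory runs out. The sketch below assumes that strategy and uses Python's built-in `MemoryError` as a stand-in for a GPU out-of-memory error; `process_with_backoff` and `embed_batch` are hypothetical names:

```python
def process_with_backoff(items, embed_batch, batch_size=8):
    """Embed `items` in batches, halving the batch size on MemoryError."""
    results = []
    start = 0
    while start < len(items):
        batch = items[start:start + batch_size]
        try:
            results.extend(embed_batch(batch))
        except MemoryError:
            if batch_size == 1:
                raise  # a single item does not fit; nothing left to shrink
            batch_size = max(1, batch_size // 2)
            continue  # retry the same position with a smaller batch
        start += len(batch)
    return results
```

Items are never skipped: after a failure the loop retries from the same position, so the output order matches the input order.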
Conclusion
The new embedding architecture represents a significant improvement in:
The legacy support ensures a smooth transition path, allowing teams to migrate at their own pace while immediately benefiting from the improved underlying implementation.
Recommendations
The architecture is designed to grow with your needs while maintaining stability and backward compatibility.