Embedding fixes #154
Pull Request Overview
This PR introduces a normalize flag throughout embedding calculation routines, standardizes Neo4j queries to use elementId instead of id, and updates several dependencies and logging patterns.
- Adds `normalize: bool = True` parameter to all embedding methods and propagates it through the processor and model implementations
- Replaces `id(r)` with `elementId(r)` in standard numbering and mutation detection queries
- Improves logging in sequence alignment and ontology loading, and updates project dependencies
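
To illustrate what a `normalize` flag typically controls, here is a minimal sketch. It assumes the flag means scaling each embedding to unit L2 norm, which this summary does not confirm; `finalize_embedding` is a hypothetical helper, not a function from pyeed.

```python
import math
from typing import List


def finalize_embedding(vector: List[float], normalize: bool = True) -> List[float]:
    """Return a copy of the embedding, optionally scaled to unit L2 norm.

    Hypothetical helper; assumes `normalize` means L2 normalization.
    """
    if not normalize:
        return list(vector)
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        # A zero vector has no direction, so return it unchanged
        # instead of dividing by zero.
        return list(vector)
    return [x / norm for x in vector]


raw = [3.0, 4.0]
unit = finalize_embedding(raw)                    # scaled to length 1
unchanged = finalize_embedding(raw, normalize=False)
```

Defaulting the flag to `True` changes the output for callers that previously received raw vectors, so downstream distance calculations may need to be rechecked.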
Reviewed Changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/pyeed/embeddings/processor.py | Added normalize parameter to public/legacy embedding APIs |
| src/pyeed/embeddings/models/prott5.py | Propagated normalize and pooling changes for ProtT5 |
| src/pyeed/embeddings/models/esmc.py | Added normalize flag to ESMC batch/single embedding |
| src/pyeed/embeddings/models/esm3.py | Added normalize flag to ESM3 batch/single embedding |
| src/pyeed/embeddings/models/esm2.py | Added normalize flag to ESM2 batch/single embedding |
| src/pyeed/embeddings/base.py | Updated abstract methods to include normalize parameter |
| src/pyeed/analysis/standard_numbering.py | Switched to elementId and refined logging |
| src/pyeed/analysis/sequence_alignment.py | Replaced print with logger, fixed query patterns |
| src/pyeed/analysis/ontology_loading.py | Improved OWL restriction handling and relationship logic |
| src/pyeed/analysis/mutation_detection.py | Updated region_ids_neo4j type and elementId queries |
| pyproject.toml | Removed old numpy constraint; replaced umap with umap-learn |
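
The `id` to `elementId` switch listed above might look roughly like the following sketch. The node labels, relationship type, property name, and the `build_region_params` helper are hypothetical, since the actual queries are not shown in this summary. What is known is that Neo4j's `elementId()` returns a string while the deprecated `id()` returns an integer, so parameters compared against it need to be strings, which may be why the `region_ids_neo4j` type was updated.

```python
from typing import Dict, List

# Hypothetical labels, relationship type, and property name, for illustration only.
LEGACY_QUERY = """
MATCH (:Protein)-[r:HAS_REGION]->(:Region)
WHERE id(r) IN $region_ids
RETURN id(r) AS rel_id, r.start AS start
"""

UPDATED_QUERY = """
MATCH (:Protein)-[r:HAS_REGION]->(:Region)
WHERE elementId(r) IN $region_ids
RETURN elementId(r) AS rel_id, r.start AS start
"""


def build_region_params(region_ids: List[str]) -> Dict[str, List[str]]:
    """Check that every identifier is a string before building query parameters."""
    for region_id in region_ids:
        if not isinstance(region_id, str):
            raise TypeError(
                f"element ids must be str, got {type(region_id).__name__}"
            )
    return {"region_ids": list(region_ids)}


# Placeholder identifiers; real element ids are produced by the database.
params = build_region_params(["4:example-db:17", "4:example-db:42"])
```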
Comments suppressed due to low confidence (1)
src/pyeed/embeddings/models/prott5.py:145
- The variable `attention_mask` is not defined in this scope. You need to obtain it from the model outputs (e.g., `outputs.attention_mask`) or pass it into the method.
seq_len = attention_mask.cpu().numpy().sum()
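
One way to address the comment is to pass the mask into the pooling step explicitly. The sketch below uses plain Python lists instead of tensors so it runs without a model; `masked_mean_pool` is a hypothetical helper, not the code in `prott5.py`. In Hugging Face Transformers the mask usually comes from the tokenizer output rather than the model outputs, so the actual fix may differ.

```python
from typing import List


def masked_mean_pool(
    token_embeddings: List[List[float]], attention_mask: List[int]
) -> List[float]:
    """Average token vectors over positions where the mask is 1 (hypothetical helper)."""
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("embeddings and mask must have the same length")
    # Number of real (non-padding) tokens, mirroring the seq_len line above.
    seq_len = sum(attention_mask)
    if seq_len == 0:
        raise ValueError("attention mask selects no tokens")
    dim = len(token_embeddings[0])
    pooled = [0.0] * dim
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, value in enumerate(vector):
                pooled[i] += value
    return [value / seq_len for value in pooled]


# The third row stands in for a padding token and is excluded by the mask.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
pooled = masked_mean_pool(tokens, mask)
```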
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Added small fixes to embedding, mutation detection, and standard numbering.