"Repurposing Life through Geometric Deep Learning."
BioSyn AI is an end-to-end generative pipeline designed to imagine novel drug candidates (ligands) that bind to specific protein targets. It combines E(n)-Equivariant Graph Neural Networks (GNNs) for protein structure encoding with 3D Denoising Diffusion Probabilistic Models (DDPMs) for molecule generation.
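For readers new to diffusion models, the forward (noising) half of a DDPM can be sketched in a few lines. This is a minimal illustration in plain Python, not BioSyn's implementation: the linear schedule, the 1000 steps, the beta range, and every function name below are assumptions chosen for the example.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(coords, alpha_bar_t, rng):
    """Sample x_t from N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I)."""
    signal = math.sqrt(alpha_bar_t)
    sigma = math.sqrt(1.0 - alpha_bar_t)
    return [
        tuple(signal * c + sigma * rng.gauss(0.0, 1.0) for c in atom)
        for atom in coords
    ]


# Three hypothetical atom positions in angstroms.
x0 = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.2, 1.3, 0.0)]
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
rng = random.Random(0)
x_early = add_noise(x0, alpha_bars[9], rng)   # step 10: still close to x0
x_late = add_noise(x0, alpha_bars[-1], rng)   # step 1000: almost pure noise
```

A diffusion decoder is trained to undo this corruption. At inference time it starts from Gaussian noise and denoises step by step, conditioned on the protein embedding.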
The pipeline follows a closed-loop generative process:
graph LR
A[Protein PDB] -->|Ingestion Engine| B(Geometric Graph)
B -->|GNN Encoder| C[Context Embedding]
D[Gaussian Noise] -->|Diffusion Model| E{Reverse Process}
C --> E
E -->|Denoising| F[3D Atom Cloud]
F -->|KNN Builder| G[SMILES Candidate]
- Ingestion: A TypeScript engine fetches raw PDB/SDF files from biological databases.
- Encoder: A GNN extracts geometric features (invariant to rotation/translation) from the protein pocket.
- Decoder: A Diffusion model iteratively refines random noise into stable 3D molecular structures conditioned on the protein embedding.
- Inference: A robust `MoleculeBuilder` reconstructs valid chemical graphs from 3D point clouds using K-Nearest Neighbors (KNN) logic.
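The KNN step in the last bullet can be illustrated with a small, dependency-free sketch. It is not the repository's MoleculeBuilder: the function name `knn_bonds`, the default `k=4`, and the 1.9 Å distance cutoff are hypothetical choices made for this example.

```python
import math


def knn_bonds(coords, k=4, max_dist=1.9):
    """Return sorted (i, j) index pairs, i < j, linking each atom to its
    k nearest neighbours that lie within max_dist angstroms."""
    bonds = set()
    for i, a in enumerate(coords):
        # Distances from atom i to every other atom, nearest first.
        neighbours = sorted(
            (math.dist(a, b), j) for j, b in enumerate(coords) if j != i
        )
        for dist, j in neighbours[:k]:
            if dist <= max_dist:
                bonds.add((min(i, j), max(i, j)))
    return sorted(bonds)


# Hypothetical heavy-atom positions for a three-atom chain (angstroms).
atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.3, 1.2, 0.0)]
print(knn_bonds(atoms))  # [(0, 1), (1, 2)]
```

Atoms 0 and 2 are about 2.6 Å apart, so the cutoff rejects that pair even though each is among the other's nearest neighbours. Turning such a bond list into a SMILES string would additionally require atom types and bond orders, which a cheminformatics toolkit such as RDKit is typically used for.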
- Python 3.10+
- Node.js (v16+)
- CUDA-enabled GPU (Recommended)
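To confirm the prerequisites above before installing, a quick check along these lines may help. It is a hypothetical helper, not a script shipped with the repository. It assumes PyTorch is the deep learning backend and only verifies that a `node` executable is on the PATH, not its version.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 10)):
    """Report whether the interpreter and Node.js meet the listed prerequisites."""
    report = {
        "python_ok": sys.version_info >= min_python,
        "node_found": shutil.which("node") is not None,
    }
    try:
        import torch  # optional: only present once dependencies are installed
        report["cuda_available"] = torch.cuda.is_available()
    except ImportError:
        report["cuda_available"] = None  # unknown until PyTorch is installed
    return report


if __name__ == "__main__":
    for name, status in check_prerequisites().items():
        print(f"{name}: {status}")
```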
Clone the repository and set up the hybrid environment.
# Clone the repo
git clone https://github.com/zumermalik/BioSyn-AI-Repurposing-Life.git
cd BioSyn-AI-Repurposing-Life
# Set up Python Environment (Conda recommended for RDKit compatibility)
conda create -n biosyn python=3.10 -y
conda activate biosyn
# Install Core Dependencies
pip install -r requirements.txt
# Install Ingestion Engine (TypeScript)
npm install
You can run the entire inference stack with a single command. This will load the pre-trained checkpoint and generate candidates for the target protein 5R82.
# Run Inference
python src/pipeline/inference_pipeline.py
Expected Output:
🧪 Starting BioSyn Inference on cuda...
>> Target Protein: 5R82.pdb
>> Loading checkpoint: checkpoints/biosyn_epoch_5.pt
>> Generating 5 drug candidates...
🔹 Candidate 1: CC(=O)Nc1ccc(O)cc1
🔹 Candidate 2: CN1C=NC2=C1C(=O)N(C(=O)N2C)C
✅ Generation Complete. 5 candidates saved to results/
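If you want to post-process the candidates, one option is to read them back from the console output shown above. The sketch below assumes the `Candidate N: SMILES` line format stays as printed. The helper name `parse_candidates` is hypothetical, and the format of the files written to results/ is not covered here.

```python
import re

# Matches lines such as "Candidate 1: CC(=O)Nc1ccc(O)cc1".
CANDIDATE_LINE = re.compile(r"Candidate\s+(\d+):\s*(\S+)")


def parse_candidates(log_text):
    """Map candidate number -> SMILES string from console output."""
    return {int(n): smiles for n, smiles in CANDIDATE_LINE.findall(log_text)}


sample_log = """\
>> Generating 5 drug candidates...
🔹 Candidate 1: CC(=O)Nc1ccc(O)cc1
🔹 Candidate 2: CN1C=NC2=C1C(=O)N(C(=O)N2C)C
"""
print(parse_candidates(sample_log))
# {1: 'CC(=O)Nc1ccc(O)cc1', 2: 'CN1C=NC2=C1C(=O)N(C(=O)N2C)C'}
```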
BioSyn-AI-Repurposing-Life/
├── configs/ # Hyperparameter Configuration
├── data/ # Data Storage
│ ├── external/ # External Databases (PDBBind/CrossDocked)
│ ├── processed/ # PyTorch Geometric Tensors
│ └── raw/ # Original PDB/SDF Files
├── notebooks/ # Jupyter Prototyping Environments
├── src/ # Source Code
│ ├── chemistry/ # RDKit Logic & Molecule Builders
│ ├── ingestion/ # TypeScript/Python Data Fetchers
│ ├── models/ # GNN Encoder & Diffusion Decoder
│ ├── pipeline/ # Training & Inference Orchestration
│ ├── utils/ # Utility Functions
│ ├── __init__.py # Package Initialization
│ └── main.py # Main Application Entry Point
├── tests/ # Unit Tests
├── environment.yml # Conda Environment Definition
├── LICENSE # Apache 2.0 License
├── package.json # Node.js Dependencies
├── pyproject.toml # Python Packaging Configuration
├── README.md # Project Documentation
├── requirements.txt # Python Dependencies
├── ROADMAP.md # Future Development Plans
└── tsconfig.json # TypeScript Configuration
We use pytest for unit testing the geometric logic and chemical validity.
# Run the full test suite
pytest tests/
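As an illustration of what a geometric-logic test might look like, the sketch below checks that inter-atomic distances survive a rotation and translation, which is the property the encoder's invariant features rely on. It is a hypothetical example in pytest style, not a file from tests/, and the helper names are made up for the illustration.

```python
import math


def pairwise_distances(coords):
    """All inter-atomic distances, in a fixed (i < j) order."""
    return [
        math.dist(coords[i], coords[j])
        for i in range(len(coords))
        for j in range(i + 1, len(coords))
    ]


def rotate_z_and_shift(coords, angle, shift):
    """Rotate points about the z-axis by `angle` radians, then translate."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
        for x, y, z in coords
    ]


def test_distances_are_rigid_motion_invariant():
    coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.3, 1.2, 0.7)]
    moved = rotate_z_and_shift(coords, angle=1.1, shift=(4.0, -2.0, 0.5))
    for before, after in zip(pairwise_distances(coords), pairwise_distances(moved)):
        assert math.isclose(before, after, rel_tol=1e-9, abs_tol=1e-9)
```

pytest collects any function whose name starts with `test_`, so saving this in a file such as tests/test_invariance.py (a hypothetical path) would let `pytest tests/` pick it up.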
Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
- Code Style: Please use `black` for Python formatting.
- Testing: Ensure all new modules have accompanying tests in `tests/`.
- Data: Do not commit large datasets (PDB/SDF files) to Git. Use the `data/` folder.
This project is licensed under the Apache 2.0 License. If you use this architecture in your research, please link back to this repository.
Maintained by the Builders.