BioSyn AI is an end-to-end generative pipeline that uses E(n)-Equivariant Graph Neural Networks and 3D Diffusion Models to hallucinate novel drug candidates for specific protein targets.

zumermalik/BioSyn-AI-Repurposing-Life

BioSyn AI: Repurposing Life

Python 3.10+ | PyTorch | License: Apache 2.0

"Repurposing Life through Geometric Deep Learning."

BioSyn AI is an end-to-end generative pipeline designed to imagine novel drug candidates (ligands) that bind to specific protein targets. It combines E(n)-Equivariant Graph Neural Networks (GNNs) for protein structure encoding with 3D Denoising Diffusion Probabilistic Models (DDPMs) for molecule generation.
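
To make the diffusion half of that description concrete, here is a minimal sketch of the forward (noising) process that a DDPM is trained to reverse, applied to 3D atom coordinates. It is not taken from this repository: the function names, the linear schedule, and the values (1000 steps, betas from 1e-4 to 0.02, which are common DDPM defaults) are assumptions for illustration only.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances beta_1..beta_T, linearly spaced (assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def noise_coordinates(coords, alpha_bar, rng):
    """Sample x_t ~ q(x_t | x_0) for a list of (x, y, z) atom positions."""
    signal = math.sqrt(alpha_bar)
    sigma = math.sqrt(1.0 - alpha_bar)
    return [
        tuple(signal * c + sigma * rng.gauss(0.0, 1.0) for c in atom)
        for atom in coords
    ]

betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)

rng = random.Random(0)
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.4, 0.0)]  # toy coordinates
slightly_noisy = noise_coordinates(ligand, alpha_bars[0], rng)
almost_pure_noise = noise_coordinates(ligand, alpha_bars[-1], rng)
```

At the first step the coordinates are barely perturbed; at the last step almost no signal remains, which is why generation can start from Gaussian noise and denoise step by step.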


🧬 Architecture

The pipeline turns a protein structure into candidate SMILES strings in four stages:

graph LR
    A[Protein PDB] -->|Ingestion Engine| B(Geometric Graph)
    B -->|GNN Encoder| C[Context Embedding]
    D[Gaussian Noise] -->|Diffusion Model| E{Reverse Process}
    C --> E
    E -->|Denoising| F[3D Atom Cloud]
    F -->|KNN Builder| G[SMILES Candidate]
  1. Ingestion: A TypeScript engine fetches raw PDB/SDF files from biological databases.
  2. Encoder: A GNN extracts geometric features (invariant to rotation/translation) from the protein pocket.
  3. Decoder: A Diffusion model iteratively refines random noise into stable 3D molecular structures, conditioned on the protein embedding.
  4. Inference: A MoleculeBuilder reconstructs chemical graphs from the 3D point clouds using K-Nearest Neighbors (KNN) logic and emits them as SMILES candidates.
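
The KNN idea in step 4 can be sketched as follows. This is a hypothetical illustration, not the repository's MoleculeBuilder: the function name, `k=3`, and the 1.9 angstrom cutoff are assumptions, and a real builder would also need to assign elements and bond orders and check chemical validity (for example with RDKit).

```python
import math

def knn_edges(coords, k=3, max_dist=1.9):
    """Connect each atom to its k nearest neighbours within max_dist angstroms.

    Returns a sorted list of undirected edges (i, j) with i < j.
    """
    edges = set()
    for i, a in enumerate(coords):
        # Distances from atom i to every other atom, closest first.
        neighbours = sorted(
            (math.dist(a, b), j) for j, b in enumerate(coords) if j != i
        )
        for dist, j in neighbours[:k]:
            if dist <= max_dist:
                edges.add((min(i, j), max(i, j)))
    return sorted(edges)

# Toy point cloud: a three-atom chain plus one far-away atom (made-up data).
cloud = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
print(knn_edges(cloud))  # prints [(0, 1), (1, 2)]
```

The distance cutoff keeps the isolated atom unbonded even though it still has nearest neighbours, which is the main reason to combine KNN with a threshold.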

⚡ Quick Start

Prerequisites

  • Python 3.10+
  • Node.js (v16+)
  • CUDA-enabled GPU (Recommended)

1. Installation

Clone the repository and set up the hybrid environment.

# Clone the repo
git clone https://github.com/zumermalik/BioSyn-AI-Repurposing-Life.git
cd BioSyn-AI-Repurposing-Life

# Set up Python Environment (Conda recommended for RDKit compatibility)
conda create -n biosyn python=3.10 -y
conda activate biosyn

# Install Core Dependencies
pip install -r requirements.txt

# Install Ingestion Engine (TypeScript)
npm install
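
After installing, a quick import check can confirm that the Python side of the environment is usable. This is an optional sketch, not a script shipped with the repository; the module names below are assumed from the tools this README mentions (PyTorch, PyTorch Geometric, RDKit) and may not match requirements.txt exactly.

```python
import importlib.util

def check_modules(names):
    """Map each top-level module name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

# Assumed module names; adjust to whatever requirements.txt actually installs.
status = check_modules(["torch", "torch_geometric", "rdkit"])
for name, found in status.items():
    print(f"{name}: {'ok' if found else 'MISSING'}")
```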

2. Run the Pipeline (Zero to Hero)

You can run the entire inference stack with a single command. This will load the pre-trained checkpoint and generate candidates for the target protein 5R82.

# Run Inference
python src/pipeline/inference_pipeline.py

Expected Output:

🧪 Starting BioSyn Inference on cuda...
   >> Target Protein: 5R82.pdb
   >> Loading checkpoint: checkpoints/biosyn_epoch_5.pt
   >> Generating 5 drug candidates...
      🔹 Candidate 1: CC(=O)Nc1ccc(O)cc1
      🔹 Candidate 2: CN1C=NC2=C1C(=O)N(C(=O)N2C)C
✅ Generation Complete. 5 candidates saved to results/


📂 Project Structure

BioSyn-AI-Repurposing-Life/
├── configs/              # Hyperparameter Configuration
├── data/                 # Data Storage
│   ├── external/         # External Databases (PDBBind/CrossDocked)
│   ├── processed/        # PyTorch Geometric Tensors
│   └── raw/              # Original PDB/SDF Files
├── notebooks/            # Jupyter Prototyping Environments
├── src/                  # Source Code
│   ├── chemistry/        # RDKit Logic & Molecule Builders
│   ├── ingestion/        # TypeScript/Python Data Fetchers
│   ├── models/           # GNN Encoder & Diffusion Decoder
│   ├── pipeline/         # Training & Inference Orchestration
│   ├── utils/            # Utility Functions
│   ├── __init__.py       # Package Initialization
│   └── main.py           # Main Application Entry Point
├── tests/                # Unit Tests
├── environment.yml       # Conda Environment Definition
├── LICENSE               # Apache 2.0 License
├── package.json          # Node.js Dependencies
├── pyproject.toml        # Python Packaging Configuration
├── README.md             # Project Documentation
├── requirements.txt      # Python Dependencies
├── ROADMAP.md            # Future Development Plans
└── tsconfig.json         # TypeScript Configuration

🛠️ Development & Testing

We use pytest for unit testing the geometric logic and chemical validity.

# Run the full test suite
pytest tests/
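
As an illustration of what a geometric test might look like, the sketch below checks that inter-atomic distances are unchanged by a rotation plus translation, the property the encoder relies on. The helper names and coordinates are hypothetical and do not come from the repository's tests/ directory.

```python
import math

def pairwise_distances(coords):
    """Sorted list of all inter-atomic distances; depends only on geometry."""
    n = len(coords)
    return sorted(
        math.dist(coords[i], coords[j]) for i in range(n) for j in range(i + 1, n)
    )

def rigid_transform(coords, angle, shift):
    """Rotate points about the z-axis by `angle` radians, then translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
        for x, y, z in coords
    ]

def test_distances_are_rigid_motion_invariant():
    # Made-up pocket coordinates, used only to exercise the helpers.
    pocket = [(0.0, 0.0, 0.0), (1.2, 0.3, -0.5), (-0.7, 2.1, 0.9), (3.3, -1.0, 0.4)]
    moved = rigid_transform(pocket, angle=0.8, shift=(5.0, -2.0, 1.5))
    for before, after in zip(pairwise_distances(pocket), pairwise_distances(moved)):
        assert math.isclose(before, after, rel_tol=1e-9, abs_tol=1e-9)
```

Saved in a file whose name starts with `test_`, this function would be collected by pytest automatically.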

🤝 Contributing

Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Contribution Standards

  • Code Style: Please use black for Python formatting.
  • Testing: Ensure all new modules have accompanying tests in tests/.
  • Data: Do not commit large datasets (PDB/SDF files) to Git. Store them locally in the data/ folder instead.

📜 Citation & License

This project is licensed under the Apache 2.0 License. If you use this architecture in your research, please link back to this repository.


Maintained by the Builders.
