DNA Sequence Optimizer

This tool optimizes overlapping gene sequences. It was inspired by a project in "How to Grow Almost Anything" (HTGAA), where we needed to update a protein sequence using AI, but the gene encoding that protein overlaps with another gene.

Problem

When two genes overlap in DNA, changing one sequence can break the other. This tool chooses codons so that the base sequence still encodes the same protein, while the new sequence differs as little as possible from the protein you asked for.
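A minimal Python sketch (illustrative, not code from this repository; `CODON_TABLE` and `translate` are hypothetical helper names) makes the constraint concrete. It reads the first nine bases of sequence A from the example below in two frames. Changing GAT to GAC is silent for gene A (both encode aspartate), but that same base is the second base of gene B's first codon, so B's ATG (methionine) becomes ACG (threonine).

```python
# Standard genetic code, built from the usual TCAG ordering of bases.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate complete codons from the start of `dna`; '*' marks a stop codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


original = "GATGGAAAC"                       # first nine bases of sequence A
mutated = original[:2] + "C" + original[3:]  # GAT -> GAC, silent for gene A

# Gene A is read from base 0, gene B from base 1 (offset by 1).
print(translate(original), translate(original[1:]))  # DGN ME
print(translate(mutated), translate(mutated[1:]))    # DGN TE
```

One base change, invisible in gene A's protein, still changes gene B's protein; this is why codons have to be chosen for both frames at once.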

Simple Demonstration

Suppose your base sequence contains AAT and the new sequence you have created contains ACA, and the two sequences are offset by 1, so the AT of AAT overlaps the AC of ACA. This would not work, since AT and AC are different. However, the algorithm determines that AAT encodes asparagine and ACA encodes threonine. It then searches all combinations of asparagine and threonine codons, offset by 1, for a pair that matches on the overlapping bases. It finds that if it switches the base sequence to AAC and the new sequence to ACT, both sequences stay the same in protein space, and the resulting sequence is AACT.

Additional Features

Similar switches: if no combination of codons keeps both sequences the same in protein space, the algorithm falls back to other amino acids of the same "type". For instance, if the new sequence has an alanine and none of its codons (GCT, GCC, GCA, GCG) can be aligned with the original sequence, it tries all the other hydrophobic amino acids.
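A possible shape for this fallback is sketched below. The amino-acid groups in `GROUPS`, the alphabetical order in which substitutes are tried, and taking the first matching pair are all assumptions made for illustration; the repository's own grouping may differ. Only the new sequence's amino acid is substituted, because the base protein must not change. The sketch repeats a compact exact-overlap search so it runs on its own.

```python
from collections import defaultdict
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
SYNONYMS = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    SYNONYMS[aa].append(codon)

# Hypothetical grouping by side-chain type (assumption, not from the repository).
GROUPS = {
    "hydrophobic": "AVLIMFW",
    "polar": "STNQCY",
    "positive": "KRH",
    "negative": "DE",
    "special": "GP",
}


def overlapping_pairs(aa_base, aa_new, offset=1):
    """Synonymous codon pairs whose shared bases agree at `offset`."""
    shared = 3 - offset
    return [
        (base, new, base + new[shared:])
        for base, new in product(SYNONYMS[aa_base], SYNONYMS[aa_new])
        if base[offset:] == new[:shared]
    ]


def best_match(aa_base, aa_new, offset=1):
    """Try `aa_new` first, then the rest of its assumed group alphabetically.
    Returns (amino_acid_used, (base_codon, new_codon, merged)) or None."""
    group = next((set(m) for m in GROUPS.values() if aa_new in m), {aa_new})
    for candidate in [aa_new] + sorted(group - {aa_new}):
        pairs = overlapping_pairs(aa_base, candidate, offset)
        if pairs:
            return candidate, pairs[0]
    return None


print(best_match("N", "A"))  # ('I', ('AAT', 'ATT', 'AATT'))
```

Here alanine cannot follow asparagine at offset 1 (every alanine codon starts with GC), so the sketch settles on isoleucine, the first hydrophobic substitute that fits.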

Example Usage

For the HTGAA final project, I was looking to bind the lysis protein to another protein (as opposed to DnaJ): https://spotless-bongo-449.notion.site/Group-Project-1af905f53b4b8071864cc252b8b00a7e?pvs=4 This meant updating the N terminus of the lysis protein. I used PepMLM https://colab.research.google.com/drive/1u0i-LBog_lvQ5YRKs7QLKh_RtI-tV8qM?usp=sharing#scrollTo=VtfbXYndhyle to generate binders, which can be found in test.csv. I then ran all the binders through the optimizer and selected the peptide with the fewest errors.
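The selection step can be sketched as below, assuming that an "error" is a residue where the protein the optimizer could actually encode differs from the requested binder. That definition, the functions `count_errors` and `pick_best`, and the toy peptides are all hypothetical; none of them come from test.csv.

```python
def count_errors(requested: str, achieved: str) -> int:
    """Count residues where the achieved protein differs from the requested one.
    Extra or missing residues at the end also count as errors."""
    mismatches = sum(r != a for r, a in zip(requested, achieved))
    return mismatches + abs(len(requested) - len(achieved))


def pick_best(results: dict[str, str]) -> str:
    """`results` maps each requested binder to the peptide the optimizer could
    encode; return the requested binder with the fewest errors."""
    return min(results, key=lambda peptide: count_errors(peptide, results[peptide]))


# Toy values for illustration only; these are not from test.csv.
toy_results = {
    "MKTAYIA": "MKSAYIA",  # one substitution
    "MLKVAGE": "MLKVAGE",  # encoded exactly
}
print(pick_best(toy_results))  # MLKVAGE
```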

Usage

Run the program interactively:

python main.py

You'll be prompted for:

DNA sequence: The original DNA sequence
Gene A start position: Where the first gene starts
Gene A end position: Where the first gene ends
Gene B start position: Where the second gene starts
Gene B end position: Where the second gene ends
New Gene B sequence: The new sequence you want for Gene B
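The four positions determine how the genes overlap. The sketch below shows one way to derive the reading-frame offset and the number of shared bases from them, assuming 0-based, end-exclusive coordinates; the README does not say which convention main.py uses, and `overlap_info` is a hypothetical name.

```python
def overlap_info(a_start: int, a_end: int, b_start: int, b_end: int):
    """Assumes 0-based, end-exclusive coordinates.
    Returns (frame offset of gene B relative to gene A, number of shared bases)."""
    frame_offset = (b_start - a_start) % 3
    shared = max(0, min(a_end, b_end) - max(a_start, b_start))
    return frame_offset, shared


# Layout of the example: gene A is 51 bases starting at 0, gene B starts at 1.
# B's end only needs to lie past A's end here; 229 is illustrative.
print(overlap_info(0, 51, 1, 229))  # (1, 50)
```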

Example

From our test case:

Original sequence A:

DNA: GATGGAAACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACTAA
protein: DGNPIPSAIAANSGIY

New sequence B:

DNA: ATGGCGTGGACCAGCATTTATGAACTGGATGCGCTGAACAACTGCCGTAAAGGTCAGCGCCAGGCCGTGGGCAGCAGCCGCCGCTGCCGCCGCCAGCAGCGTAGCAGCACCCTGTACGTGCTGATTTTTCTGGCGATTTTTCTGAGCAAATTTACCAACCAGCTGCTGCTGAGCCTGCTGGAAGCGGTGATTCGCACCGTGACCACCCTGCAGCAGCTGCTGACCTGA
protein: MAWTSIYELDALNNCRKGQRQAVGSSRRCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

The sequences are offset by 1: gene B starts one base after gene A.

Result:

DNA: GATGGCAACCCGATCCCCTCAGCAATTGCAGCAAACTCCGGCATCTACTAAGGGTCAGCGCCAGGCCGTGGGCAGCAGCCGCCGCTGCCGCCGCCAGCAGCGTAGCAGCACCCTGTACGTGCTGATTTTTCTGGCGATTTTTCTGAGCAAATTTACCAACCAGCTGCTGCTGAGCCTGCTGGAAGCGGTGATTCGCACCGTGACCACCCTGCAGCAGCTGCTGACCTGA
Protein: DGNPIPSAIAANSGIYGSAPGRGQQPPLPPPAAQHPVRADFSGDFSEQIYQPAAAEPAGSGDSHRDHPAAAADL
Protein offset by 1: MATRSPQQLQQTPASTKGQRQAVGSSRRCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT
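The result can be checked independently by translating its first 52 bases in both frames. The sketch below is illustrative, not part of the repository, and it counts "errors" as amino-acid mismatches against the first 17 residues of the requested gene B, which is an assumption about how the tool scores them.

```python
# Standard genetic code, built from the usual TCAG ordering of bases.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate complete codons from the start of `dna`; '*' marks a stop codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# First 52 bases of the result: all of gene A plus the base that completes
# gene B's 17th codon.
result_prefix = "GATGGCAACCCGATCCCCTCAGCAATTGCAGCAAACTCCGGCATCTACTAAG"
requested_b = "MAWTSIYELDALNNCRK"  # first 17 residues of the requested gene B

protein_a = translate(result_prefix)      # frame 0 (gene A)
protein_b = translate(result_prefix[1:])  # frame 1 (gene B, offset by 1)
mismatches = sum(r != a for r, a in zip(requested_b, protein_b))

print(protein_a)   # DGNPIPSAIAANSGIY*
print(protein_b)   # MATRSPQQLQQTPASTK
print(mismatches)  # 12
```

Gene A's protein comes out unchanged, while 12 of the 17 overlapped residues of gene B differ from the request; outside the overlap, gene B is free to keep the requested codons.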

As you can see, sequence A's protein has not changed, but three of its codons have been swapped for synonymous ones (GGA to GGC, ATT to ATC, and ATC to ATT) to minimize the errors in the new sequence:

Sequence A protein: DGNPIPSAIAANSGIY
Sequence A protein in result: DGNPIPSAIAANSGIY
Sequence A DNA: GATGGAAACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACTAA
Sequence A DNA in result: GATGGCAACCCGATCCCCTCAGCAATTGCAGCAAACTCCGGCATCTACTAA
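The codon swaps in sequence A can be listed with a short comparison. This is a hypothetical helper written for illustration, not a function from the repository; codon indices are 0-based.

```python
# Standard genetic code, built from the usual TCAG ordering of bases.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codons(dna: str) -> list[str]:
    """Split `dna` into complete codons, dropping any trailing partial codon."""
    return [dna[i:i + 3] for i in range(0, len(dna) - 2, 3)]


def codon_changes(before: str, after: str):
    """(index, old codon, new codon, old amino acid, new amino acid) for each change."""
    return [
        (i, old, new, CODON_TABLE[old], CODON_TABLE[new])
        for i, (old, new) in enumerate(zip(codons(before), codons(after)))
        if old != new
    ]


original_a = "GATGGAAACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACTAA"
result_a = "GATGGCAACCCGATCCCCTCAGCAATTGCAGCAAACTCCGGCATCTACTAA"

changes = codon_changes(original_a, result_a)
for i, old, new, aa_old, aa_new in changes:
    print(f"codon {i}: {old} ({aa_old}) -> {new} ({aa_new})")
```

It reports three changes (GGA to GGC at codon 1, ATT to ATC at codon 4, ATC to ATT at codon 8), and each keeps the same amino acid.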

Repository: ethanitovitch/protein-optimizer
