
YieldFCP

Data and code for the paper "YieldFCP: Reaction Yield Prediction with Fine-grained 3D Cross-Modal Pre-training to Enhance Generalization".

Requirements

We implement our model with Python 3.9.19. The following packages are the main dependencies; an installation sketch follows the list:

rdkit                2024.3.3
torch                2.3.1
tensorboard          2.17.0
lightning            2.3.3
pytorch-lightning    2.3.3
salesforce-lavis     1.0.2
unicore              0.0.1
unimol_tools         0.1.0.post1
rxnfp                0.1.0
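
Assuming a standard pip environment, the pinned versions can be installed along these lines; note that unicore here refers to Uni-Core, which is usually installed from its own repository rather than PyPI, so verify each package's source first:

pip install rdkit==2024.3.3 torch==2.3.1 tensorboard==2.17.0 lightning==2.3.3 pytorch-lightning==2.3.3 salesforce-lavis==1.0.2 unimol_tools==0.1.0.post1 rxnfp==0.1.0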

Datasets

get_coor.py is an example script for generating 3D coordinates of molecules.
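
For reference, this kind of conformer generation can be done with RDKit alone. The sketch below (the function name and return convention are illustrative, not the script's actual interface) embeds a molecule with ETKDG and refines it with the MMFF94 force field:

from rdkit import Chem
from rdkit.Chem import AllChem

def embed_molecule(smiles):
    # Illustrative helper: returns a (num_atoms, 3) coordinate array, or None.
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    mol = Chem.AddHs(mol)                           # add explicit hydrogens before embedding
    if AllChem.EmbedMolecule(mol, AllChem.ETKDGv3()) != 0:
        return None                                 # embedding failed
    AllChem.MMFFOptimizeMolecule(mol)               # refine geometry with MMFF94
    return mol.GetConformer().GetPositions()

coords = embed_molecule("CCO")                      # ethanol as a toy input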

Pre-training dataset

We use filtered reactions from USPTO and CJHIF. You can download USPTO from https://figshare.com/articles/dataset/Chemical_reactions_from_US_patents_1976-Sep2016_/5104873 and CJHIF from https://github.com/jshmjs45/data_for_chem, then place the final reaction files in data/pretraining.
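
The exact filtering criteria are described in the paper; as a purely illustrative first pass, one could at least drop reactions whose SMILES fail to parse (the file name raw_reactions.txt is a placeholder):

from rdkit import Chem

def is_parsable(rxn_smiles):
    # Reaction SMILES: reactants > agents > products; any field may be empty.
    parts = rxn_smiles.split(">")
    mols = [m for part in parts if part for m in part.split(".")]
    return all(Chem.MolFromSmiles(m) is not None for m in mols)

with open("data/pretraining/raw_reactions.txt") as f:
    reactions = [line.strip() for line in f if is_parsable(line.strip())]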

Downstream dataset

We fine-tune our model on three publicly available downstream datasets. The data and splits for the HTE datasets (the Buchwald-Hartwig and Suzuki-Miyaura reactions) and for the real-world ELN dataset are stored in data/downstream/BH, data/downstream/SM, and data/downstream/ELN, respectively.
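
The splits are plain tabular files; a quick way to inspect one with pandas (the file name split_0.csv is a placeholder, so list each directory for the real layout) is:

import pandas as pd

# Placeholder file name; check data/downstream/BH for the actual split files.
df = pd.read_csv("data/downstream/BH/split_0.csv")
print(df.head())   # reaction components and the measured yield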

Experiments

Pre-training

Run pretraining.py to pre-train YieldFCP. For example,

python pretraining.py --max_epochs 10 --batch_size 8 --weight_decay 0.05 --init_lr 1e-4 --min_lr 5e-6

python pretraining.py --max_epochs 10 --batch_size 8 --cls 0 --lm --gtm --strategy_name ddp_find_unused_parameters_true

Models pre-trained with the full set of combinations of the CSC, CSM, and SG losses will be provided in checkpoint.

Fine-tuning

Run finetuning.py to fine-tune YieldFCP on a given downstream dataset. For example,

python finetuning.py --devices 0, --batch_size 128 --ds BH --repeat 10 --max_epochs 150 --ft_type conformer --dropout 0.2 --weight_decay 1e-4 --init_lr 1e-4 --min_lr 1e-5 --check_val_every_n_epoch 1 --warmup_steps 0 --load_model_path checkpoint/True_True/pretraining_epoch=09-step=00570000.ckpt

Attention weights

Run get_attention.py to obtain the fine-grained cross-modal attention weights on the BH dataset. The checkpoint of the fine-tuned model should be stored in res/BH.

python get_attention.py
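
The script's output format is not documented here; assuming it yields a token-by-atom attention matrix per reaction, a minimal heatmap sketch with matplotlib (the file name and axis labels are assumptions) could look like:

import numpy as np
import matplotlib.pyplot as plt

attn = np.load("res/BH/attention_example.npy")   # assumed (num_tokens, num_atoms) matrix

fig, ax = plt.subplots()
im = ax.imshow(attn, aspect="auto", cmap="viridis")
ax.set_xlabel("atom index (3D structure)")
ax.set_ylabel("token index (SMILES)")
fig.colorbar(im, label="attention weight")
fig.savefig("attention_heatmap.png", dpi=200)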

Citation

@article{shi2025yieldfcp,
  author  = {Shi, Runhan and Yu, Gufeng and Chen, Letian and Yang, Yang},
  title   = {YieldFCP: Enhancing Reaction Yield Prediction via Fine-grained Cross-modal Pre-training},
  journal = {Artificial Intelligence Chemistry},
  volume  = {3},
  number  = {1},
  pages   = {100085},
  year    = {2025},
  issn    = {2949-7477},
  doi     = {10.1016/j.aichem.2025.100085},
}
