Code for paper: "Deep Learning Approaches to Molecular Classification Using Voxel-Based Representations"

daergoth/MoleculeVoxelCNN

Molecule Voxel CNN

Repository for the paper "Deep Learning Approaches to Molecular Classification Using Voxel-Based Representations" by István Lakatos, András Hajdu, and Balázs Harangi.

Abstract:

Accurate molecular classification is a key step in computational drug discovery, toxicological risk assessment, and high-throughput screening. While most machine learning approaches rely on 2D molecular fingerprints, these projections often fail to capture stereochemistry and 3D spatial interactions that critically influence molecular activity. In this study, we propose a voxel-based 3D molecular representation combined with convolutional neural networks (CNNs) for end-to-end molecular classification on the Tox21 toxicity dataset. We systematically compare three architectures: a 2D CNN operating on molecular images, a dense 3D CNN using volumetric grids, and a sparse 3D CNN implemented with the TensorFlow 3D framework. All models are trained under consistent preprocessing and multi-task settings to isolate the effects of molecular representation and network design. The 2D CNN achieves the highest mean ROC–AUC score, followed by the dense 3D CNN and the sparse 3D CNN, indicating that simple voxel occupancy grids provide limited benefit over 2D projections. Moreover, the sparse 3D CNN does not provide substantial computational savings relative to the dense 3D model, despite reducing the number of processed voxels. These results suggest that, while voxel-based CNNs remain viable for toxicity prediction, traditional 2D approaches currently offer a more favorable balance between predictive accuracy and resource efficiency.

Requirements

Install the Conda environments. The environment.yml files can be found in the envs folders. Each step below names the environment it runs in.

2D method

  1. Generate 2D images

    Environment: jmol-scripts

    • twodim/dataset_generator/generate25d-universal.py
  2. Pack images into TFRecords

    Environment: molecule36-tf21

    • twodim/tensorflow/pack-data-tfrecords.py
    • twodim/tensorflow/pack-data-tfrecords-multitask.py

    Optionally verify TFRecord files:

    • twodim/tensorflow/check-tfrecord-files.py
    • twodim/tensorflow/check-tfrecord-files-multitask.py
  3. Train 2D CNN

    Environment: molecule39-tf210

    • twodim/tensorflow/Molecule25D-train-small.py
    • twodim/tensorflow/Molecule25D-train-small-multitask.py
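
Because each step runs in a different Conda environment, it can help to see the whole 2D pipeline as a sequence of commands. The following is only a sketch of how the multi-task variant might be chained with `conda run`. It assumes that the environments have already been created under the names listed above, that the scripts are started from the repository root, and that they need no command-line arguments (the README does not say whether they do). `STEPS_2D` and `build_commands` are hypothetical helper names, not part of the repository.

```python
from typing import List, Tuple

# (environment, script) pairs copied from the 2D steps listed above,
# multi-task variant. The list name itself is a hypothetical helper.
STEPS_2D: List[Tuple[str, str]] = [
    ("jmol-scripts", "twodim/dataset_generator/generate25d-universal.py"),
    ("molecule36-tf21", "twodim/tensorflow/pack-data-tfrecords-multitask.py"),
    ("molecule36-tf21", "twodim/tensorflow/check-tfrecord-files-multitask.py"),
    ("molecule39-tf210", "twodim/tensorflow/Molecule25D-train-small-multitask.py"),
]


def build_commands(steps: List[Tuple[str, str]]) -> List[List[str]]:
    """Turn (environment, script) pairs into `conda run` argument lists.

    Assumption: the scripts take no arguments; add any they require.
    """
    return [["conda", "run", "-n", env, "python", script] for env, script in steps]


if __name__ == "__main__":
    # Only print the commands; pass each list to subprocess.run(cmd, check=True)
    # to actually execute the steps in order.
    for cmd in build_commands(STEPS_2D):
        print(" ".join(cmd))
```

Printing the commands first makes it easy to confirm the environment and script pairing before starting a long training run.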

3D method

  1. Preprocess data

    Environment: molecule-threedim

    • threedim/moleculenet-tox21-task-preprocess.py
    • threedim/tox21-smiles-to-inchi.py
  2. Generate voxelboxes

    Environment: molecule-threedim

    • threedim/dataset_generator/dataset-generator.py
  3. Convert 3D voxelboxes to 2D images

    Environment: molecule-threedim

    • threedim/dataset_generator/voxel-to-2d-converter.py
  4. Convert 3D voxelboxes to sparse voxelboxes

    Environment: molecule-threedim

    • threedim/dataset_sparse_converter.py
  5. Train 3D CNN

    Environment: molecule39-tf210

    • threedim/tensorflow/alexnet3d_keras_res294_multitask_training_regularization_binary_multichannel.py
    • threedim/tensorflow/alexnet3d_keras_res294_singletask_training_regularization_binary_multichannel.py
  6. Train Sparse 3D CNN

    Environment: wsl-molecule37-tf23-tf3d

    • threedim/tensorflow/alexnet3d_keras_res294_multitask_training_regularization_binary_multichannel_sparse.py
    • threedim/tensorflow/alexnet3d_keras_res294_singletask_training_regularization_binary_multichannel_sparse.py
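
To make the "Generate voxelboxes" and "Convert 3D voxelboxes to sparse voxelboxes" steps more concrete, here is a minimal, dependency-free sketch of the general idea: atom coordinates are binned into a binary occupancy grid, and the dense grid is then reduced to the list of occupied voxel indices. It is not the repository's implementation. The function names, the grid size, the voxel edge length, and the origin-centred layout are all assumptions chosen for illustration, and the real scripts may use several channels and a different resolution.

```python
from typing import List, Tuple

Coord = Tuple[float, float, float]
Index = Tuple[int, int, int]
Grid = List[List[List[int]]]


def voxelize(atoms: List[Coord], grid_size: int, voxel_len: float) -> Grid:
    """Build a binary occupancy grid centred on the origin.

    Hypothetical layout: a cube of grid_size**3 voxels, each voxel_len wide.
    Atoms that fall outside the cube are skipped.
    """
    grid = [[[0] * grid_size for _ in range(grid_size)] for _ in range(grid_size)]
    half = grid_size * voxel_len / 2.0  # distance from the centre to a cube face
    for atom in atoms:
        i, j, k = (int((c + half) // voxel_len) for c in atom)
        if all(0 <= n < grid_size for n in (i, j, k)):
            grid[i][j][k] = 1
    return grid


def to_sparse(grid: Grid) -> List[Index]:
    """Return the indices of occupied voxels, the core of a sparse encoding."""
    return [
        (i, j, k)
        for i, plane in enumerate(grid)
        for j, row in enumerate(plane)
        for k, value in enumerate(row)
        if value
    ]


if __name__ == "__main__":
    # Three made-up atom positions plus one that lies outside the grid.
    atoms = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0), (-0.5, 0.9, 0.0), (10.0, 0.0, 0.0)]
    dense = voxelize(atoms, grid_size=8, voxel_len=0.5)
    occupied = to_sparse(dense)
    print(f"{len(occupied)} of {8 ** 3} voxels occupied: {occupied}")
```

The sparse form stores only the occupied indices, which is why it reduces the number of processed voxels. As the abstract notes, that reduction did not translate into substantial computational savings in the authors' experiments.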
