Code and data for EssSubgraph: improves performance and generalizability of mammalian essential gene prediction with large networks.

wenmm/EssSubgraph

EssSubgraph: An inductive representation learning method that integrates graph-structured network data with omics features

License: GPL v3

EssSubgraph is a predictive framework designed to identify essential genes in mammals by integrating gene expression data with large-scale biological networks. The core idea is to extract subgraphs related to gene essentiality from multi-layer interaction networks and apply graph neural networks to learn informative representations for prediction.

The figure below gives a broad overview of the EssSubgraph method.

[Figure: Overview of the EssSubgraph method]

Installation & Dependencies

The code is written in Python 3 and was tested mainly on Python 3.8 under Linux, but it should run on any OS that supports Python and pip. Training is faster on a GPU.

EssSubgraph has the following dependencies:

  • Numpy
  • Pandas
  • torch
  • Networkx
  • scipy
  • seaborn
  • scikit-learn
  • torch-geometric

Build conda environment

conda create --name py38 -c conda-forge python=3.8
conda activate py38

Dependencies can be installed using the following command:

pip install -r requirements.txt
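
After installation, a quick check like the following can confirm that the packages listed above are importable. It is only a minimal sketch: `REQUIRED` and `missing_packages` are illustrative names that do not come from the repository, and the check looks up import names (scikit-learn imports as `sklearn`, torch-geometric as `torch_geometric`) without verifying versions.

```python
import importlib.util

# Import names for the dependencies listed above (assumed mapping:
# scikit-learn -> sklearn, torch-geometric -> torch_geometric).
REQUIRED = [
    "numpy", "pandas", "torch", "networkx",
    "scipy", "seaborn", "sklearn", "torch_geometric",
]


def missing_packages(names=REQUIRED):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All dependencies found.")
```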

Reproducibility

Network Preprocessing

EssSubgraph was tested with 7 different protein-protein interaction (PPI) networks.

The networks were constructed by following the tutorial from Network Evaluation Tools.

Node feature Preprocessing

The gene expression data (TCGA RNA-Seq normalized RSEM data) was obtained from Albino Bacolla.

python generate_pca.py
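
The feature file used in the dataset build step is named cancer_full_expression_pc50.csv, which suggests that the expression matrix is reduced to 50 principal components per gene. The sketch below illustrates that kind of reduction with plain NumPy. It is an assumption about the general technique, not a copy of generate_pca.py: the function name `pca_reduce`, the genes-by-samples layout, and the absence of any scaling step are all hypothetical.

```python
import numpy as np


def pca_reduce(expression, n_components=50):
    """Project a genes x samples matrix onto its top principal components.

    Each row (gene) is treated as one observation, each column (sample)
    as one variable. Returns an array of shape (genes, k), where k is
    n_components capped at the number of available components.
    """
    X = np.asarray(expression, dtype=float)
    # Center every sample column so the components capture variance, not the mean.
    X = X - X.mean(axis=0, keepdims=True)
    # Thin SVD: the principal component scores are U * S.
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    k = min(n_components, S.shape[0])
    return U[:, :k] * S[:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    toy = rng.normal(size=(200, 80))   # 200 hypothetical genes, 80 samples
    features = pca_reduce(toy, n_components=50)
    print(features.shape)
```

The resulting rows could then be written to a CSV indexed by gene identifier, which is the shape of input the `--features` argument appears to expect.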

Dataset build

python build_dataset_container.py \
    --network ./data/string_net.txt \
    --essential ./data/Essential_genes \
    --nonessential ./data/Non_essential_genes \
    --features ./data/cancer_full_expression_pc50.csv \
    --output esssubgraph_human_pc50_string.pkl

The arguments are described below:

| Parameter name | Description |
| --- | --- |
| --network | Path to the network file (e.g., /path/to/string_net.txt). Specifies the gene interaction network to process. |
| --essential | Path to the essential genes file (e.g., ../data/Essential_genes). Lists genes critical for cell survival. |
| --nonessential | Path to the non-essential genes file (e.g., ../data/Non_essential_genes). Lists non-critical genes. |
| --features | Path to the gene feature CSV file. Contains node feature data (e.g., gene expression PC50 features). |
| --output | Output pickle file name for the PyTorch Geometric dataset. The network name is appended (e.g., esssubgraph_human_pc50_string.pkl). |
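
To make the role of the network and gene-list inputs concrete, the sketch below shows one way node labels could be derived from them. It is an illustration under stated assumptions, not the logic of build_dataset_container.py: it assumes the network file is a whitespace-separated edge list without a header line and that each gene-list file holds one identifier per line. All function names here are hypothetical.

```python
def read_gene_list(path):
    """Read one gene identifier per line, skipping blank lines."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def read_network_nodes(path):
    """Collect node names from the first two columns of an edge list."""
    nodes = set()
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 2:
                nodes.update(fields[:2])
    return nodes


def label_nodes(nodes, essential, nonessential):
    """Map each labeled network node to 1 (essential) or 0 (non-essential).

    Genes absent from both lists, or present in both, are left unlabeled.
    """
    labels = {}
    for gene in nodes:
        if gene in essential and gene not in nonessential:
            labels[gene] = 1
        elif gene in nonessential and gene not in essential:
            labels[gene] = 0
    return labels
```

Only genes that appear in the network can receive a label in this sketch, so the size of the labeled set depends on how well the gene lists overlap with the chosen network.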

Usage

python EssSubgraph.py --epochs 200 --device 0 --dataset ./data/esssubgraph_human_pc50_string.pkl

The arguments are described below:

| Parameter name | Description |
| --- | --- |
| --dataset | Path of the input pkl file |
| --epochs | Number of epochs to train the model (defaults to 200) |
| --device | Device id of the GPU to use (defaults to 0) |
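
The documented defaults can be summarized with a small argparse sketch. This is a hypothetical stand-in for the command line of EssSubgraph.py, written only from the table above: the flag names and defaults follow the table, while treating `--dataset` as required and the integer types are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented EssSubgraph.py arguments (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Hypothetical stand-in for the EssSubgraph.py command line"
    )
    # Assumed to be required; the README gives no default for it.
    parser.add_argument("--dataset", required=True, help="path of the input pkl file")
    parser.add_argument("--epochs", type=int, default=200, help="number of training epochs")
    parser.add_argument("--device", type=int, default=0, help="GPU device id")
    return parser


# Only --dataset is given, so the documented defaults apply.
args = build_parser().parse_args(["--dataset", "./data/esssubgraph_human_pc50_string.pkl"])
print(args.epochs, args.device)  # prints: 200 0
```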

Docker Setup

To ensure reproducibility, build and run the project with Docker:

# Build the Docker image
docker build -t esssubgraph .

# Run the container, mounting the current directory at /app
docker run -it -v "$(pwd)":/app esssubgraph

Benchmark Models

To reproduce the performance comparisons with other models, use the scripts under /baseline.

License

GNU General Public License v3.0 (see LICENSE).

Contact

If you have any questions, feel free to contact me by email (dal462929@utdallas.edu) or through GitHub issues. Pull requests are highly welcome!
