EssSubgraph: An inductive representation learning method that integrates graph-structured network data with omics features

EssSubgraph is a predictive framework designed to identify essential genes in mammals by integrating gene expression data with large-scale biological networks. The core idea is to extract subgraphs related to gene essentiality from multi-layer interaction networks and apply graph neural networks to learn informative representations for prediction.
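Here is a minimal, hypothetical sketch of one GraphSAGE-style mean-aggregation step in plain Python, to make the idea of learning from a gene's network neighborhood more concrete. It is not the EssSubgraph model. It assumes a simple adjacency dictionary and per-gene feature lists, and it omits the learned weight matrices and nonlinearities that a real GNN layer applies. The names `sample_neighbors` and `mean_aggregate` are illustrative only.

```python
import random
from typing import Dict, List


def sample_neighbors(adj: Dict[str, List[str]], node: str, k: int,
                     rng: random.Random) -> List[str]:
    """Return up to k neighbors of `node`, sampled without replacement."""
    neighbors = adj.get(node, [])
    if len(neighbors) <= k:
        return list(neighbors)
    return rng.sample(neighbors, k)


def mean_aggregate(adj: Dict[str, List[str]], features: Dict[str, List[float]],
                   node: str, k: int, rng: random.Random) -> List[float]:
    """Concatenate a node's own features with the mean of its sampled
    neighbors' features (zeros if the node has no neighbors)."""
    own = features[node]
    sampled = sample_neighbors(adj, node, k, rng)
    if sampled:
        mean = [sum(features[n][i] for n in sampled) / len(sampled)
                for i in range(len(own))]
    else:
        mean = [0.0] * len(own)
    return own + mean


if __name__ == "__main__":
    # Toy graph: A is linked to B and C; D is isolated.
    adj = {"A": ["B", "C"], "B": ["A"], "C": ["A"], "D": []}
    features = {"A": [1.0, 0.0], "B": [0.0, 2.0], "C": [2.0, 0.0], "D": [1.0, 1.0]}
    rng = random.Random(0)
    print(mean_aggregate(adj, features, "A", k=5, rng=rng))  # [1.0, 0.0, 1.0, 1.0]
```

The update uses only a node's own features and those of its neighbors. This means the same step can be applied to a gene that was not seen during training, which is what makes this family of methods inductive.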

The overview figure (method_overview.png in the repository) gives a broad overview of the EssSubgraph method.

Installation & Dependencies

The code is written in Python 3 and was mainly tested with Python 3.8 on Linux, but it should run on any OS that supports Python and pip. Training is faster on a GPU.

EssSubgraph has the following dependencies:

NumPy
pandas
PyTorch (torch)
NetworkX
SciPy
seaborn
scikit-learn
PyTorch Geometric (torch-geometric)

Build conda environment

conda create --name py38 -c conda-forge python=3.8
conda activate py38

Dependencies can be installed using the following command:

pip install -r requirements.txt
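After installation, a short check like the following can confirm that the dependencies listed above are importable. This is a convenience sketch and is not shipped with the repository. It assumes the usual import names (`sklearn` for scikit-learn, `torch_geometric` for torch-geometric). It only checks whether each module can be found, not which version is installed.

```python
import importlib.util

# Import names for the packages listed above (assumed standard names):
# scikit-learn is imported as `sklearn`, torch-geometric as `torch_geometric`.
REQUIRED_MODULES = ["numpy", "pandas", "torch", "networkx", "scipy",
                    "seaborn", "sklearn", "torch_geometric"]


def check_dependencies(modules=REQUIRED_MODULES):
    """Map each module name to True if Python can locate it, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name:16s} {'OK' if found else 'MISSING'}")
```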

Reproducibility

Network Preprocessing

EssSubgraph was tested with 7 different protein-protein interaction (PPI) networks, namely:

The network was constructed using the tutorial from Network Evaluation Tools.
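The README does not describe the format of the network file. Suppose `string_net.txt` is a plain-text edge list with two whitespace-separated gene identifiers per line, and any further columns (such as a confidence score) can be ignored. Under that assumption, the file could be loaded into an undirected adjacency map as sketched below. `load_edge_list` is a hypothetical helper and is not part of the repository.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def load_edge_list(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map from lines of 'geneA geneB [...]'.

    Assumes whitespace-separated columns; extra columns are ignored, and
    blank lines, '#' comment lines, and self-loops are skipped.
    """
    adj: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) < 2 or parts[0].startswith("#"):
            continue
        a, b = parts[0], parts[1]
        if a == b:
            continue  # drop self-loops
        adj[a].add(b)  # sets deduplicate repeated or reversed edges
        adj[b].add(a)
    return dict(adj)


# Usage with a file (path taken from the dataset build command):
# with open("./data/string_net.txt") as fh:
#     adj = load_edge_list(fh)
```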

Node Feature Preprocessing

The gene expression data (TCGA RNA-Seq normalized RSEM data) was obtained from Albino Bacolla. Principal component analysis (PCA) features for each gene are then generated with:

python generate_pca.py
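The sketch below gives a rough idea of what a PCA step of this kind produces. It projects a genes x samples expression matrix onto its top principal components using NumPy's SVD. Several details are assumptions: the orientation (genes as rows, samples as columns), the absence of any extra log transform or scaling, and the function name `expression_pca`. The actual `generate_pca.py` may preprocess the data differently. With `n_components=50`, each gene gets a 50-dimensional feature vector, which is consistent with the `pc50` in `cancer_full_expression_pc50.csv`.

```python
import numpy as np


def expression_pca(expr: np.ndarray, n_components: int = 50) -> np.ndarray:
    """Project each gene (row) onto the top principal components.

    expr: genes x samples matrix; any log transform is assumed to be
    applied beforehand. Returns a genes x k matrix of PC scores, where
    k = min(n_components, number of singular values).
    """
    # Center each sample column so genes are the observations of the PCA.
    centered = expr - expr.mean(axis=0, keepdims=True)
    # Thin SVD: centered = U @ diag(S) @ Vt; the PC scores are U * S.
    U, S, _ = np.linalg.svd(centered, full_matrices=False)
    k = min(n_components, S.shape[0])
    return U[:, :k] * S[:k]
```

The resulting matrix could then be written to a CSV with one row per gene, for example with `pandas.DataFrame(...).to_csv(...)`. The exact column layout that `build_dataset_container.py` expects is not documented here.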

Dataset build

python build_dataset_container.py \
    --network ./data/string_net.txt \
    --essential ./data/Essential_genes \
    --nonessential ./data/Non_essential_genes \
    --features ./data/cancer_full_expression_pc50.csv \
    --output esssubgraph_human_pc50_string.pkl

The arguments are described below:

| Parameter name | Description |
|---|---|
| `--network` | Path to the network file (e.g., `/path/to/string_net.txt`). Specifies the gene interaction network to process. |
| `--essential` | Path to the essential genes file (e.g., `./data/Essential_genes`). Lists genes critical for cell survival. |
| `--nonessential` | Path to the non-essential genes file (e.g., `./data/Non_essential_genes`). Lists non-critical genes. |
| `--features` | Path to the gene feature CSV file containing the node features (e.g., the 50 principal-component gene expression features in `cancer_full_expression_pc50.csv`). |
| `--output` | Output pickle file name for the PyTorch Geometric dataset. The network name is appended (e.g., `esssubgraph_human_pc50_string.pkl`). |
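Conceptually, building the dataset involves indexing the network's genes and attaching a label to each one. The sketch below shows one way to do that. It assumes each gene-list file holds one gene symbol per line, and it encodes essential genes as 1, non-essential genes as 0, and all other genes as -1 (unlabeled). The encoding, the decision to treat genes that appear in both lists as unlabeled, and the name `build_labels` are all assumptions for illustration. This is not the behavior of `build_dataset_container.py`.

```python
from typing import Dict, Iterable, List, Tuple


def build_labels(nodes: Iterable[str], essential: Iterable[str],
                 nonessential: Iterable[str]) -> Tuple[Dict[str, int], List[int]]:
    """Index network genes and label them (hypothetical encoding):
    1 = essential, 0 = non-essential, -1 = unlabeled.

    Genes listed in both files are treated as unlabeled to avoid
    contradictory labels (an assumption, not documented behavior).
    """
    ess, non = set(essential), set(nonessential)
    # Sorted order gives a deterministic gene -> row index mapping.
    index = {gene: i for i, gene in enumerate(sorted(set(nodes)))}
    labels: List[int] = []
    for gene in index:  # dicts preserve insertion (sorted) order
        if gene in ess and gene not in non:
            labels.append(1)
        elif gene in non and gene not in ess:
            labels.append(0)
        else:
            labels.append(-1)
    return index, labels
```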

Usage

python EssSubgraph.py --epochs 200 --device 0 --dataset ./data/esssubgraph_human_pc50_string.pkl

The arguments are described below:

| Parameter name | Description |
|---|---|
| `--dataset` | Path to the input `.pkl` dataset file (e.g., the output of `build_dataset_container.py`) |
| `--epochs` | Number of epochs to train the model (default: 200) |
| `--device` | ID of the GPU device to use (default: 0) |
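If you want to train on several prepared datasets in a row, you can wrap the documented command in a small Python driver. This is a hypothetical helper. It uses only the three flags documented above, assumes it is run from the repository root where `EssSubgraph.py` lives, and does not know about any other options the script may accept.

```python
import subprocess
import sys
from typing import List


def build_command(dataset: str, epochs: int = 200, device: int = 0) -> List[str]:
    """Assemble the documented EssSubgraph.py command line for one dataset."""
    return [sys.executable, "EssSubgraph.py",
            "--epochs", str(epochs),
            "--device", str(device),
            "--dataset", dataset]


def run_all(datasets: List[str], epochs: int = 200, device: int = 0) -> None:
    """Train on each dataset in turn; check=True raises on a non-zero exit."""
    for dataset in datasets:
        subprocess.run(build_command(dataset, epochs, device), check=True)
```

For example, `run_all(["./data/esssubgraph_human_pc50_string.pkl"])` runs the single command shown in the Usage section.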

Docker Setup

To help ensure reproducibility, build and run the project with Docker:

# Build the Docker image
docker build -t esssubgraph .

# Run the container, mounting the current directory at /app
docker run -it -v $(pwd):/app esssubgraph

Benchmark Models

To reproduce the performance comparisons with other models, use the scripts in the baseline/ directory.
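The README does not say which metrics the baseline scripts report. For comparing model outputs yourself, the area under the ROC curve is a common choice for binary essential/non-essential prediction. The sketch below computes it via the Mann-Whitney formulation: the probability that a randomly chosen essential gene scores higher than a randomly chosen non-essential one, with ties counted as one half. It is a self-contained illustration with quadratic cost. In practice, `sklearn.metrics.roc_auc_score` from the scikit-learn dependency computes the same quantity more efficiently.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via pairwise comparison (Mann-Whitney U / (n_pos * n_neg)).

    labels: 1 for essential, 0 for non-essential; other values are ignored.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))
```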

License

GNU General Public License v3.0 (see LICENSE).

Contact

If you have any questions, feel free to contact me by email (dal462929@utdallas.edu) or through GitHub issues. Pull requests are highly welcome!

wenmm/EssSubgraph

Folders and files

Latest commit

History

Repository files navigation

EssSubgraph: An inductive representation learning method that integrates graph-structured network data with omics features

Installation & Dependencies

Reproducibility

Network Preprocessing

Node feature Preprocessing

Dataset build

Usage

Docker Setup

Benchmark Models

License

Contact

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Packages