Installing the dependencies
pip install -r requirements.txt
The construction step for the Personalized Bipartite Networks (PBNs) takes three input files: the Protein-Protein Interaction (PPI) network edge file (STRING network v11.5 or v10.5, or the DawnRank network), a binary matrix of dysregulated genes (DEGs), and a binary matrix of mutated genes (MUT). The files for each cancer type are located in the data folder.
We employ three different interaction networks in our evaluations: the STRING network v11.5, the STRING network v10.5 employed in Dinstag and Shamir (2020), and the DawnRank gene interaction network of Hou and Ma (2014). The files are located at data/[version]_network.csv
gene1 gene2 score
g1 g2 0.9
g3 g8 0.6
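As a rough illustration of the edge-file layout shown above, the following sketch parses the three columns with Python's standard csv module. The function name read_edges and the comma delimiter are assumptions made for this example (the repository scripts may read the file differently), and the in-memory sample only mirrors the toy rows above.

```python
import csv
import io


def read_edges(handle, delimiter=","):
    """Return a list of (gene1, gene2, score) tuples from an edge file.

    Assumes a header row with the columns gene1, gene2 and score,
    and a comma delimiter unless another one is passed in.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    return [(row["gene1"], row["gene2"], float(row["score"])) for row in reader]


# Toy data that mirrors the example rows; not taken from a real network file.
sample = io.StringIO("gene1,gene2,score\ng1,g2,0.9\ng3,g8,0.6\n")
edges = read_edges(sample)
print(edges)
```

To read a real file, an open file handle for data/[version]_network.csv can be passed in place of the in-memory sample.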
We employ two different KEGG versions (Kanehisa et al., 2020) for the input set of biological pathways: KEGG Release 101 (denoted as v1) and the KEGG pathways used in Dinstag and Shamir (2020) (denoted as v2). The files are located at data/kegg_pathways_[version].csv
The binary matrix of mutated genes (MUT), with genes as rows and samples as columns, is located at data/[cancer]/MUT.csv
p1 p2 ... pn
g1 0 1 ... 1
g2 1 1 ... 0
gx 0 0 ... 1
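The sketch below shows one way to turn a gene-by-sample 0/1 matrix like the one above into a set of mutated genes per sample. It assumes a comma-delimited file whose first header cell is an (empty) label for the gene column; the helper name mutated_genes_by_sample is hypothetical and not part of the repository.

```python
import csv
import io


def mutated_genes_by_sample(handle, delimiter=","):
    """Map each sample ID to the set of genes flagged with 1 in its column."""
    reader = csv.reader(handle, delimiter=delimiter)
    samples = next(reader)[1:]  # skip the label cell above the gene column
    mutated = {sample_id: set() for sample_id in samples}
    for row in reader:
        if not row:  # tolerate blank lines
            continue
        gene, flags = row[0], row[1:]
        for sample_id, flag in zip(samples, flags):
            if flag.strip() == "1":
                mutated[sample_id].add(gene)
    return mutated


# Toy matrix shaped like the example above (genes as rows, samples as columns).
toy_mut = io.StringIO(",p1,p2,p3\ng1,0,1,1\ng2,1,1,0\ngx,0,0,1\n")
mut_sets = mutated_genes_by_sample(toy_mut)
print(mut_sets)
```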
The binary matrix of dysregulated genes (DEGs), with samples as rows and genes as columns, is located at data/[cancer]/DEGs.csv
g1 g2 ... gy
p1 False True ... False
p2 True True ... False
...
Note: we use the R code from Bashashati et al. (2012) to generate the set of DEGs.
The PersonaDrive framework takes two inputs to prioritize mutated genes in the bipartite networks: the generated .gml PBN files and the KEGG pathway data retrieved from the supplementary material of Dinstag and Shamir (2020). The constructed PBNs are located at graphs/[dataset]/[cancer]_[network]/.
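Unlike the mutation matrix, the DEG matrix has samples as rows and stores True/False values. A minimal sketch of reading it into per-sample gene sets follows; the comma delimiter, the leading label cell in the header, and the helper name degs_by_sample are assumptions for illustration only.

```python
import csv
import io


def degs_by_sample(handle, delimiter=","):
    """Map each sample ID (row) to the set of genes whose cell reads True."""
    reader = csv.reader(handle, delimiter=delimiter)
    genes = next(reader)[1:]  # skip the label cell above the sample column
    degs = {}
    for row in reader:
        if not row:  # tolerate blank lines
            continue
        sample_id, flags = row[0], row[1:]
        degs[sample_id] = {
            gene for gene, flag in zip(genes, flags)
            if flag.strip().lower() == "true"
        }
    return degs


# Toy matrix shaped like the example above (samples as rows, genes as columns).
toy_degs = io.StringIO(",g1,g2,g3\np1,False,True,False\np2,True,True,False\n")
deg_sets = degs_by_sample(toy_degs)
print(deg_sets)
```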
The personalized reference sets are constructed with respect to several relevant reference sets of known cancer genes: Cancer Gene Census (CGC), Network of Cancer Genes (NCG), and CancerMine. Files are located at data/reference_sets/.
For this type of evaluation, we define a novel reference gene set for each available cell line by compiling the target genes of the drugs to which that cell line is found to be sensitive, based on data from the GDSC (Yang et al., 2013) and DepMap databases. Files are located at data/reference_sets/.
For this type of evaluation, we compare the methods using KEGG and Reactome (Fabregat et al., 2018) enrichment analysis: we measure the overlap between the pathways significantly enriched in the genes output by a personalized prioritization method and those enriched in the cell line reference sets constructed from drug sensitivity data.
For more details on the execution parameters, please refer to the Python files.
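The compilation described above amounts to a union of drug target genes per cell line. The sketch below illustrates that idea only; the dictionary layout, the function name, and all cell line, drug, and gene labels are made-up placeholders, and it does not reproduce how sensitivity is actually called from the GDSC or DepMap data.

```python
def cell_line_reference_sets(sensitive_drugs, drug_targets):
    """Union the target genes of every drug a cell line is sensitive to.

    sensitive_drugs: {cell_line: iterable of drug names}  (hypothetical layout)
    drug_targets:    {drug name: iterable of target genes} (hypothetical layout)
    """
    reference = {}
    for cell_line, drugs in sensitive_drugs.items():
        genes = set()
        for drug in drugs:
            # Drugs without a known target contribute nothing.
            genes.update(drug_targets.get(drug, ()))
        reference[cell_line] = genes
    return reference


# Placeholder inputs, for illustration only.
sensitive = {"CL1": ["drugA", "drugB"], "CL2": ["drugB", "drugC"]}
targets = {"drugA": ["g1", "g2"], "drugB": ["g2", "g5"]}
ref_sets = cell_line_reference_sets(sensitive, targets)
print(ref_sets)
```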
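The pathway comparison can be pictured as a set intersection. The exact overlap statistic used in the notebook is not stated here, so the sketch below simply reports the number of shared pathways and the fraction of reference pathways recovered, which is an assumption; the function name and pathway labels are placeholders.

```python
def enriched_pathway_overlap(predicted_pathways, reference_pathways):
    """Return (shared pathway count, fraction of reference pathways recovered)."""
    predicted = set(predicted_pathways)
    reference = set(reference_pathways)
    shared = predicted & reference
    fraction = len(shared) / len(reference) if reference else 0.0
    return len(shared), fraction


# Placeholder pathway identifiers, for illustration only.
from_method = ["path_A", "path_B", "path_C"]
from_reference = ["path_B", "path_C", "path_D", "path_E"]
n_shared, frac_recovered = enriched_pathway_overlap(from_method, from_reference)
print(n_shared, frac_recovered)
```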
- Constructing PBNs:
python constructing_PBNs.py -d TCGA -c COAD -n ST11
- Rank Mutated Genes:
python PersonDrive.py -d TCGA -c COAD -n ST11
- Evaluation
python evaluation.py -d TCGA -c COAD -n ST11
- Enrichment Analysis
jupyter notebook
run KEGG_REAC_enrichment_analysis.ipynb
- The 'constructing_PBNs.py' script will construct the personalized bipartite networks (PBNs).
- The 'PersonDrive.py' script will output the personalized ranking for each sample in the chosen cancer type and dataset.
- The 'evaluation.py' script will compute the mean precision, recall and F1 scores and plot them.
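To make the reported metrics concrete, here is a minimal sketch of mean precision, recall and F1 over samples, scoring the top-N ranked genes of each sample against its reference set. The function names, the top_n cutoff, and the toy inputs are assumptions for illustration; evaluation.py may define and aggregate these scores differently.

```python
def precision_recall_f1(ranked_genes, reference, top_n):
    """Score the top_n ranked genes of one sample against its reference set."""
    top = set(ranked_genes[:top_n])
    hits = len(top & reference)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def mean_scores(rankings, references, top_n):
    """Average precision, recall and F1 over all samples that have a ranking."""
    scores = [
        precision_recall_f1(genes, references.get(sample_id, set()), top_n)
        for sample_id, genes in rankings.items()
    ]
    if not scores:
        return 0.0, 0.0, 0.0
    n = len(scores)
    return tuple(sum(values) / n for values in zip(*scores))


# Toy rankings and reference sets, for illustration only.
rankings = {"p1": ["g1", "g2", "g3"], "p2": ["g4", "g1"]}
references = {"p1": {"g1", "g3"}, "p2": {"g1"}}
mean_p, mean_r, mean_f1 = mean_scores(rankings, references, top_n=2)
print(mean_p, mean_r, mean_f1)
```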
The data underlying this article can be accessed at: https://doi.org/10.5281/zenodo.6520187