Home
Welcome to the BitSeqVB_benchmarking wiki!
These scripts replicate the simulation analysis presented in Section 3.1 (inference accuracy on synthetic data) of the BitSeqVB manuscript [1].
The following software is required:
- Anaconda Python Distribution
- BitSeq (version 0.7.0 or higher)
- Bowtie 2 (version 2.1.0 or higher)
- Cufflinks (version 2.1.1 or higher)
- eXpress (version 1.5.1 or higher)
- R with the following libraries: GenomicFeatures, Rsamtools, casper, parallel
- RSEM (version 1.2.15 or higher)
- Sailfish (version 0.6.3 or higher)
- Samtools (version 0.1.18 or higher)
- Spanki simulator
- Tigar2
- TopHat (version 2.0.9 or higher)
The gcc compiler (release 4.8.2 or higher) should also be available on your machine.
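Before starting, it can help to confirm that the required command-line tools are reachable from the shell. The following is a minimal sketch, not part of the repository scripts. The executable names in `ASSUMED_EXECUTABLES` are assumptions (this page lists the software but not the command names), so adjust them to match your installation.

```python
import shutil

# Assumed executable names: they are not given on this page and may differ
# between versions or installations (for example, TopHat may be "tophat").
ASSUMED_EXECUTABLES = [
    "gcc", "python", "Rscript", "bowtie2", "tophat2", "samtools",
    "cufflinks", "express", "sailfish", "rsem-calculate-expression",
    "estimateVBExpression",
]


def missing_tools(names, which=shutil.which):
    """Return the names from `names` that cannot be found on the PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(ASSUMED_EXECUTABLES)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed executables were found.")
```

This only checks that a command exists; it does not verify the version numbers listed above, and it does not cover the R libraries.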
This analysis is based on the UCSC/hg19 reference annotation (download size approximately 21 GB). After downloading the annotation, follow the instructions in the simulationScripts/README file. The main jobscript is the commented file commands.sh, which consists of the following steps:
- Choose dataset (4 simulation scenarios)
- Generate RPK values
- Simulate fastq files with Spanki
- Align reads with Bowtie
- Align reads with TopHat
- Run BitSeqMCMC
- Run BitSeqVB
- Run Casper
- Run Cufflinks
- Run RSEM
- Run Sailfish
- Run Tigar2
- Run eXpress
- Produce graphs
To keep the computing time reasonable, split the jobscript into parallel jobscripts according to the instructions given in the file commands.sh.
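As a rough illustration of how the work could be divided, the sketch below lists one job per combination of simulation scenario and quantification method. It assumes that the method runs are independent of each other once the reads have been simulated and aligned. That assumption is not stated on this page, and the scenario labels are hypothetical placeholders, so the instructions in commands.sh remain the authoritative guide.

```python
from itertools import product

# Hypothetical labels: the four scenarios are not named on this page.
SCENARIOS = ["scenario1", "scenario2", "scenario3", "scenario4"]

# Quantification methods, taken from the list of steps above.
METHODS = ["BitSeqMCMC", "BitSeqVB", "Casper", "Cufflinks",
           "RSEM", "Sailfish", "Tigar2", "eXpress"]


def enumerate_jobs(scenarios, methods):
    """Return one (scenario, method) pair per candidate parallel job."""
    return list(product(scenarios, methods))


if __name__ == "__main__":
    # Print a tab-separated job table that a scheduler wrapper could consume.
    for scenario, method in enumerate_jobs(SCENARIOS, METHODS):
        print(f"{scenario}\t{method}")
```

With four scenarios and eight methods this yields 32 candidate jobs. The simulation and alignment steps would still have to finish for a scenario before its quantification jobs start.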
This analysis was run under the Linux operating system on the High Performance Computing cluster (CSF) at the University of Manchester. Make sure that at least 2.5 TB of free disk space is available.
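The disk space requirement can be checked before launching the jobs. This is a minimal sketch using the Python standard library; it assumes the figure refers to decimal terabytes (2.5 × 10^12 bytes) and that the output is written under the directory passed as `path`, neither of which is specified on this page.

```python
import shutil

# 2.5 TB, assuming decimal terabytes (10**12 bytes each).
REQUIRED_BYTES = int(2.5 * 10**12)


def has_enough_space(path, required=REQUIRED_BYTES, disk_usage=shutil.disk_usage):
    """Return True if the filesystem holding `path` has at least `required` free bytes."""
    return disk_usage(path).free >= required


if __name__ == "__main__":
    # "." is a placeholder; point this at the directory that will hold the output.
    if has_enough_space("."):
        print("Enough free disk space for the analysis.")
    else:
        print("Less than 2.5 TB free on this filesystem.")
```

On a cluster with per-user quotas, the free space reported by the filesystem may exceed what you are allowed to write, so the quota should be checked separately.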
[1] J. Hensman, P. Papastamoulis, P. Glaus, A. Honkela, M. Rattray (2014). Fast and accurate approximate inference of transcript expression from RNA-seq data. arXiv preprint arXiv:1412.5995.