Pipeline for Calling Mitochondrial Homoplasmies and Heteroplasmies
A bioinformatics pipeline for estimating mitochondrial DNA copy number and heteroplasmy levels from whole genome sequencing data, Battle et. al, NAR 2022
bwa 0.7.17, minimap2 2.28, htslib 1.21, samtools 1.21, bcftools 1.21, bedtools 2.31.1, fastp 0.24.0, samblaster 0.1.26, gatk Mutect2 4.6.0.0, mutserve 2.0.0-rc15, freebayes 1.3.6, VarScan 2.4.6, Clair3 1.1.1, ClairS-TO 0.4.2, deepvariant 1.10.0-beta, deepsomatic 1.9.0, gridss 2.13.2, haplogrep v2.4.0, haplocheck 1.3.3, R 4.3.0, plink2 2.0
# install git
sudo apt-get -y update
sudo apt-get install -y git
# clone and init MitoHPC2
git clone https://github.com/dpuiu/MitoHPC2.git
cd MitoHPC2/scripts
export HP_SDIR=`pwd -P`
. ./init.sh
# install system prerequisites (if admin or checkInstall.sh below fails)
sudo ./install_sysprerequisites.sh
# install prerequisites and check install
./install_prerequisites.sh
./checkInstall.sh
For additional information check MitoHPC README
The pipleine has been updated so that all MitoHPC2/RefSeq/*.{vcf,bed}.gz files are used for annotation. If you have any custom annotation files you would like to use, just copy them to MitoHPC2/RefSeq/ Make sure the VCF/BED files are gzipped and indexed.
Check MitoHPC README first !!!
Copies of the Clair3, ClairS-To, DeepVarian, DeepSomatic SIF files can be found at ftp
Copies of the HPRC samples BAM alignments can be found at ftp
# copy init file to work directory
cp $HP_SDIR/init.sh .
# edit SNV caller if needed
nano ./init.sh
...
export HP_M=mutect2 # mutect2,mutserve,freebayes,varscan
# init
. ./init.sh
# run
$HP_SDIR/run.sh | tee run.all.sh | bash
# check output
ls $HP_ODIR/mutect2.*
In addition, one can run multiple(3) SNV callers and merge the results. Only the SNV called by at least 2 the metods make it into the final/merged set.
# init
cp $HP_SDIR/init3.sh .
cat init3.sh
...
export HP_M1=mutect2
export HP_M2=varscan
export HP_M3=freebayes
export HP_M=merge3
. ./init3.sh
# run
$HP_SDIR/run3.sh | tee run.all.sh | bash
# output
ls $HP_ODIR/merge3.*
# copy init file to work directory
cp $HP_SDIR/init.hifi.sh .
# edit init.lr.sh; set the SNV caller
cat init.hifi.sh
...
export HP_M=deepsomatic # varscan,bcftools,clair3,clairs-to,deepvariant,deepsomatic
export HP_PLATFORM="hifi_revio" # used by clairs-to
export HP_MODELTYPE="PACBIO_TUMOR_ONLY" # used by deepsomatic
# init
. ./init.hifi.sh
# copy init file to work directory
cp $HP_SDIR/init.ont.sh .
# edit init.lr.sh; set the SNV caller
cat init.ont.sh
...
export HP_M=clairs-to # varscan,bcftools,clair3,clairs-to,deepvariant,deepsomatic
export HP_PLATFORM="ont_r10_dorado_sup_5khz" # used by clairs-to ; ont_r10_dorado_sup_4khz, ont_r10_dorado_hac_4khz, ont_r10_dorado_sup_5khz, ont_r10_dorado_sup_5khz_ss, ont_r10_dorado_sup_5khz_ssrs
export HP_MODELTYPE="ONT_TUMOR_ONLY" # used by deepsomatic
# init
. ./init.ont.sh
# run
$HP_SDIR/run.lr.sh | tee run.all.sh | bash
# check output
ls $HP_ODIR/.*
# evaluate results(compare to Illumina mutect2 T=10)
$HP_SDIR/eval.sh Illumina/out/mutect2.10.concat.vcf $HP_M.10.concat.vcf | uniq.pl | column -t > $HP_M.10.eval
- Haplogroups A-Z
wgsim -N 111000 -1 150 -2 150 -d 300 ...
- 500x cvg of Illumina paired end reads (2*150bp) were simulated using the best HiFi10 raeds (avg Q30)
- commands: wgsim -N 22200 -1 150 -2 150 -d 300 -e 0 -r 0 -R 0
- There reads were processed using the MitoHPC2 pipeline ("mutect2" SNV caller)
10.10
- 500x cvg of Illumina paired end reads (2*150bp) were simulated using the best ONT10 raeds (avg Q20)
- commands: wgsim -N 22200 -1 150 -2 150 -d 300 -e 0 -r 0 -R 0
- There reads were processed using the MitoHPC2 pipeline ("mutect2" SNV caller)
10.10
Note: 1st number: short read heteroplasmy thold; 2nd: long read heteroplasmy thold