Skip to content

dpuiu/MitoHPC2

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

MitoHPC2 : Mitochondrial High Performance Caller v2

Pipeline for Calling Mitochondrial Homoplasmies and Heteroplasmies

Citing

A bioinformatics pipeline for estimating mitochondrial DNA copy number and heteroplasmy levels from whole genome sequencing data, Battle et. al, NAR 2022

Prerequisites

bwa 0.7.17, minimap2 2.28, htslib 1.21, samtools 1.21, bcftools 1.21, bedtools 2.31.1, fastp 0.24.0, samblaster 0.1.26, gatk Mutect2 4.6.0.0, mutserve 2.0.0-rc15, freebayes 1.3.6, VarScan 2.4.6, Clair3 1.1.1, ClairS-TO 0.4.2, deepvariant 1.10.0-beta, deepsomatic 1.9.0, gridss 2.13.2, haplogrep v2.4.0, haplocheck 1.3.3, R 4.3.0, plink2 2.0

INSTALL

 # install git
 sudo apt-get -y update                            
 sudo apt-get install -y git                          

 # clone and init MitoHPC2
 git clone https://github.com/dpuiu/MitoHPC2.git
 cd MitoHPC2/scripts
 export HP_SDIR=`pwd -P`
 . ./init.sh

 # install system prerequisites (if admin or checkInstall.sh below fails)
 sudo ./install_sysprerequisites.sh                

 # install prerequisites and check install
 ./install_prerequisites.sh
 ./checkInstall.sh

For additional information check MitoHPC README

CUSTOM ANNOTATION

The pipleine has been updated so that all MitoHPC2/RefSeq/*.{vcf,bed}.gz files are used for annotation. If you have any custom annotation files you would like to use, just copy them to MitoHPC2/RefSeq/ Make sure the VCF/BED files are gzipped and indexed.

RUN

Check MitoHPC README first !!!

Copies of the Clair3, ClairS-To, DeepVarian, DeepSomatic SIF files can be found at ftp

Copies of the HPRC samples BAM alignments can be found at ftp

SINGLE SNV CALLER (Illumina)

# copy init file to work directory
cp $HP_SDIR/init.sh .

# edit SNV caller if needed
nano ./init.sh 
  ...
  export HP_M=mutect2          # mutect2,mutserve,freebayes,varscan

# init
. ./init.sh

# run
$HP_SDIR/run.sh | tee run.all.sh | bash

# check output
ls $HP_ODIR/mutect2.*

MULTIPLE SNV CALLERS (Illumina)

In addition, one can run multiple(3) SNV callers and merge the results. Only the SNV called by at least 2 the metods make it into the final/merged set.

# init     
cp $HP_SDIR/init3.sh .
cat init3.sh
  ...
  export HP_M1=mutect2   
  export HP_M2=varscan
  export HP_M3=freebayes
  export HP_M=merge3

. ./init3.sh

# run
$HP_SDIR/run3.sh | tee run.all.sh | bash       

# output
ls $HP_ODIR/merge3.*

SNV CALLER IINIT PacBio HiFi

# copy init file to work directory
cp $HP_SDIR/init.hifi.sh .

# edit init.lr.sh; set the SNV caller
cat init.hifi.sh
  ...
  export HP_M=deepsomatic                       # varscan,bcftools,clair3,clairs-to,deepvariant,deepsomatic
  export HP_PLATFORM="hifi_revio"		    # used by clairs-to
  export HP_MODELTYPE="PACBIO_TUMOR_ONLY"       # used by deepsomatic

# init
. ./init.hifi.sh

SNV CALLER INIT ONT

# copy init file to work directory
cp $HP_SDIR/init.ont.sh .

# edit init.lr.sh; set the SNV caller
cat init.ont.sh
  ...
  export HP_M=clairs-to                         # varscan,bcftools,clair3,clairs-to,deepvariant,deepsomatic
  export HP_PLATFORM="ont_r10_dorado_sup_5khz"  # used by clairs-to ; ont_r10_dorado_sup_4khz, ont_r10_dorado_hac_4khz, ont_r10_dorado_sup_5khz, ont_r10_dorado_sup_5khz_ss, ont_r10_dorado_sup_5khz_ssrs
  export HP_MODELTYPE="ONT_TUMOR_ONLY"	    # used by deepsomatic

# init
. ./init.ont.sh

SNV CALLER LONG READS

# run
$HP_SDIR/run.lr.sh | tee run.all.sh | bash

# check output
ls $HP_ODIR/.*

# evaluate results(compare to Illumina mutect2 T=10)             
$HP_SDIR/eval.sh Illumina/out/mutect2.10.concat.vcf $HP_M.10.concat.vcf | uniq.pl | column -t > $HP_M.10.eval

Examples

30 Simulated samples

  • Haplogroups A-Z
    wgsim -N 111000 -1 150 -2 150 -d 300 ...

39 HPRC samples

HiFi10

ONT10

39 HPRC bases Simulated samples

HiFi10Sim

  • 500x cvg of Illumina paired end reads (2*150bp) were simulated using the best HiFi10 raeds (avg Q30)
  • commands: wgsim -N 22200 -1 150 -2 150 -d 300 -e 0 -r 0 -R 0
  • There reads were processed using the MitoHPC2 pipeline ("mutect2" SNV caller)
    10.10

ONT10Sim

  • 500x cvg of Illumina paired end reads (2*150bp) were simulated using the best ONT10 raeds (avg Q20)
  • commands: wgsim -N 22200 -1 150 -2 150 -d 300 -e 0 -r 0 -R 0
  • There reads were processed using the MitoHPC2 pipeline ("mutect2" SNV caller)
    10.10

Note: 1st number: short read heteroplasmy thold; 2nd: long read heteroplasmy thold

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published