shortread-polish-nf

Nextflow pipeline for polishing an assembly with short reads and freebayes

Introduction

When a genome is assembled from error-prone long reads (e.g., PacBio CLR), it is often necessary to use short reads (e.g., Illumina) to error-correct ("polish") the assembly. This repository contains a Nextflow pipeline implementing the Vertebrate Genomes Project (VGP) best practices for polishing an assembly with freebayes (see the VGP GitHub repository for more information).

Requirements

Data

  • Short reads
  • An assembly created from error-prone long reads

Nextflow

  • Nextflow, which can be installed with the command curl -s https://get.nextflow.io | bash (this places a nextflow launcher in the current directory)

Other software

This pipeline is set up to use mamba to create an environment containing the programs it needs, but you can also install them yourself or use modules. See the Configuration section below.

Configuration

Nextflow configuration is handled by the file nextflow.config in the directory where you run the nextflow command. The configuration file in this repository is for running the pipeline on the Lewis cluster at Mizzou using SLURM and mamba, but you can adjust it for any batch or cloud system. See the Nextflow documentation for details.

Lewis-specific stuff

Nextflow needs a filesystem that supports file locking to keep track of which jobs are running, but it does not need one to store the data or temporary files the pipeline creates. On Lewis, HTC allows locking but is slow, while HPC does not allow locking but is fast. To get the best of both, run this pipeline from within a project directory on HTC, but set the environment variable $NXF_WORK to an empty directory on HPC.
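A minimal sketch of that setup, assuming bash. The helper use_work_dir is hypothetical (it is not part of this pipeline): it creates the directory if needed, refuses to reuse one that already has files in it, and exports NXF_WORK so that nextflow commands launched later from the same shell use it as their work directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of shortread-polish-nf.
# use_work_dir DIR: create DIR if needed, refuse to reuse a non-empty DIR,
# and export NXF_WORK so nextflow commands run later in this shell use it.
use_work_dir() {
    local dir="$1"
    mkdir -p "$dir" || return 1
    # The work directory should start out empty.
    if [ -n "$(ls -A "$dir")" ]; then
        echo "refusing to use non-empty work directory: $dir" >&2
        return 1
    fi
    export NXF_WORK="$dir"
}
```

To use it, define the function in your shell (or source a file containing it), cd into your HTC project directory, call use_work_dir with an empty path on HPC (both locations are placeholders you choose), and then launch the pipeline as shown in the Running section.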

Running

To download the pipeline and run it on your assembly, just run the command:

nextflow run WarrenLab/shortread-polish-nf \
    --assembly unpolished_assembly.fa \
    --sra SRX1234567

This will download the short reads from SRA, align them to your assembly, and then use freebayes on the alignments to correct the assembly. The polished assembly will be written to consensus/polished.fa, and a report of the number of changes made will be written to consensus/report.txt.
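Before using the polished assembly, a quick sanity check can be useful. The sketch below assumes bash and awk; fasta_stats is a hypothetical helper, not part of the pipeline. It prints the number of sequences and the total sequence length of a FASTA file. Since polishing corrects small errors rather than rebuilding the assembly, you would expect (as an assumption, not a guarantee) the same number of sequences and a similar total length before and after.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of shortread-polish-nf.
# fasta_stats FILE: print "<number of sequences> <total sequence length>".
fasta_stats() {
    awk '
        /^>/ { n++; next }                          # header line: count a new sequence
        { gsub(/[ \t\r]/, ""); len += length($0) }  # sequence line: add its length, ignoring whitespace
        END { print n + 0, len + 0 }                # "+ 0" prints 0 rather than an empty string
    ' "$1"
}
```

For example, run fasta_stats unpolished_assembly.fa and fasta_stats consensus/polished.fa, then read consensus/report.txt for the number of changes made.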

To polish with local FASTQ files rather than reads downloaded from SRA, use the --fastq option in place of --sra.
