Nanoporcini

Nanoporcini is an experimental pipeline for long-read metabarcoding of fungi with Oxford Nanopore Technologies sequencing.

Features

Nanoporcini is written in the Nextflow workflow language and features:

- Container support for Docker or Singularity.
- Configurable quality filtering with cutadapt and chopper.
- Full ITS region extraction with itsxpress.
- Chimera detection with VSEARCH.
- A choice of clustering approaches:
  - VSEARCH
  - Custom implementation of NanoCLUST
- Taxonomic classification using dnabarcoder.

Requirements

- Nextflow v23.10.0 or later
- A container platform, either:
  - Docker
  - Singularity
- A network connection to download dependencies
- A reference database containing ITS sequences
  - UNITE+INSD 2024
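
Before the first run it can help to confirm that the tools listed above are on your PATH. The following is a minimal sketch, not part of the pipeline: the helper name have is made up for illustration, and the check only tests for presence. It does not compare versions, so run nextflow -version separately to confirm you have v23.10.0 or later.

```shell
# Report whether a command is available on PATH (hypothetical helper).
have() {
  command -v "$1" >/dev/null 2>&1
}

# Nextflow itself is required.
if have nextflow; then
  echo "nextflow: found"
else
  echo "nextflow: MISSING (v23.10.0 or later is required)"
fi

# At least one container platform is required.
if have docker || have singularity; then
  echo "container platform: found"
else
  echo "container platform: MISSING (install Docker or Singularity)"
fi
```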

Quick start

Run with minimal required parameters

nextflow run https://github.com/aringeri/nanoporcini \
  --input "path/to/data/sample*.fastq.gz" \
  --outdir "nanoporcini_output/" \
  --primers.fwd "AACTTAAAGGAATTGACGGAAG" \
  --primers.rev_rc "GGTAAGCAGAACTGGCG" \
  --chimera_filtering.ref_db "path/to/unite/db.fasta" \
  --taxonomic_assignment.dnabarcoder.ref_db "path/to/unite/db.fasta" \
  --taxonomic_assignment.dnabarcoder.ref_classifications "path/to/unite/db.classification" \
  --taxonomic_assignment.dnabarcoder.cutoffs "path/to/unite/db/cutoffs.best.json"

This example uses the NS5 forward primer and the LR6 reverse primer (see https://unite.ut.ee/primers.php).

Test run with example data

A small example with two samples of 20 reads each is included (see tests/test_data/example/fq). It takes ~25 minutes to run on a 4-core MacBook Pro with 16 GB of RAM.

First clone the repository and move into the nanoporcini directory:

git clone https://github.com/aringeri/nanoporcini
cd nanoporcini

Download the UNITE reference database and the dnabarcoder cutoffs file for UNITE 2024 with the commands below. The database can also be downloaded through your browser at https://zenodo.org/records/12580255. This may take a while:

mkdir -p data/db/unite2024/
curl https://zenodo.org/api/records/12580255/files-archive -o data/db/unite2024/unite2024.zip
unzip data/db/unite2024/unite2024.zip -d data/db/unite2024

mkdir -p data/db/dnabarcoder/
curl https://raw.githubusercontent.com/vuthuyduong/dnabarcoder/refs/heads/master/data/UNITE_2024_cutoffs/unite2024ITS.unique.cutoffs.best.json \
  -o data/db/dnabarcoder/unite2024ITS.unique.cutoffs.best.json

Run the example:

nextflow run main.nf \
  -params-file "tests/test_data/example/example-params.yml" \
  --outdir "nanoporcini_output/" \
  -c conf/envs/local.config

If you are running on an HPC cluster, replace -c conf/envs/local.config with -c conf/envs/cluster.config. This lets the pipeline use more resources and execute tasks through the Slurm workload manager.

Configuration

Many of the pipeline parameters can be configured to suit your needs. I recommend creating a YAML file to pass parameters to the pipeline. See conf/params.yml for an example configuration file, which can be given to the run command with the -params-file option:

nextflow run https://github.com/aringeri/nanoporcini \
  --input "path/to/data/sample*.fastq.gz" \
  -params-file conf/params.yml \
  --outdir "nanoporcini_output/"
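
As a rough illustration, the parameters from the quick start command could be collected into a YAML file like the one written below. This is a sketch under two assumptions: the file name my-params.yml is made up, and the nested keys are assumed to mirror the dotted command-line names (for example --primers.fwd becoming fwd under primers). Treat conf/params.yml in the repository as the authoritative layout.

```shell
# Write a parameters file (my-params.yml is a hypothetical name).
# Assumption: nested YAML keys correspond to the dotted command-line names.
cat > my-params.yml <<'EOF'
input: "path/to/data/sample*.fastq.gz"
outdir: "nanoporcini_output/"
primers:
  fwd: "AACTTAAAGGAATTGACGGAAG"
  rev_rc: "GGTAAGCAGAACTGGCG"
chimera_filtering:
  ref_db: "path/to/unite/db.fasta"
taxonomic_assignment:
  dnabarcoder:
    ref_db: "path/to/unite/db.fasta"
    ref_classifications: "path/to/unite/db.classification"
    cutoffs: "path/to/unite/db/cutoffs.best.json"
EOF

# Show the file that was written.
cat my-params.yml
```

The file would then be passed with -params-file my-params.yml. Options given on the command line, such as --outdir, take precedence over values in the parameters file.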

Inputs

Samples

The pipeline expects ONT reads that have already been basecalled and demultiplexed. I recommend Dorado for these steps.

input - Required
- type - string
- A set of FASTQ files, one file per sample. File names are used to identify samples. Glob syntax (*) can be used to select multiple files.
- ex) "path/to/input/data/*.fastq.gz"
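
To preview which files a pattern selects before starting a run, something like the sketch below can be used. It works on throwaway placeholder files, and it assumes that a sample name is the file name with .fastq.gz removed. The pipeline's actual naming rule may differ, so treat the printed names as a guide only.

```shell
# Create a throwaway directory with two empty placeholder files.
demo_dir=$(mktemp -d)
touch "$demo_dir/sampleA.fastq.gz" "$demo_dir/sampleB.fastq.gz"

# List the files the glob matches, with a sample name for each.
# Assumption: the sample name is the file name without ".fastq.gz".
sample_names=""
for f in "$demo_dir"/sample*.fastq.gz; do
  name=$(basename "$f" .fastq.gz)
  echo "$f -> $name"
  sample_names="$sample_names $name"
done
```

When the pattern is passed to --input, keep it in quotes, as in the examples in this README, so that Nextflow expands it rather than the shell.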

Primer sequences

primers.fwd
- type - string
- The forward primer sequence in 5' to 3' orientation.
- ex) "AACTTAAAGGAATTGACGGAAG" for NS5 primer (see https://unite.ut.ee/primers.php)
primers.rev_rc
- type - string
- The reverse primer sequence, given as its reverse complement.
- ex) "GGTAAGCAGAACTGGCG" for LR6 primer (see https://unite.ut.ee/primers.php)
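
If you only have a reverse primer written 5' to 3', its reverse complement can be computed with standard command-line tools. The function below is a small sketch (the name revcomp is made up here) that handles upper-case bases and the common IUPAC ambiguity codes. It relies on the rev and tr utilities being installed.

```shell
# Reverse-complement a DNA sequence (upper-case IUPAC codes).
# S, W and N are their own complements, so tr leaves them unchanged.
revcomp() {
  printf '%s\n' "$1" | rev | tr 'ACGTRYKMBVDH' 'TGCAYRMKVBHD'
}

# Applied to the primers.rev_rc example, it recovers the primer 5' to 3'.
revcomp "GGTAAGCAGAACTGGCG"   # prints CGCCAGTTCTGCTTACC

# Applied again, it returns the value to pass as primers.rev_rc.
revcomp "CGCCAGTTCTGCTTACC"   # prints GGTAAGCAGAACTGGCG
```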

Outputs

Specify the output directory with:

outdir
qc_plot_sample_level

Quality Filtering

FULL_ITS:
- minQualityPhred: 20
- minLength: 300
- maxLength: 6000
chimera_filtering:
- ref_db

Clustering

methods
- VSEARCH
  - min cluster size
- NanoCLUST
  - min cluster sizes

Taxonomic Assignments

dnabarcoder
- ref_db
- ref_classifications
- cutoffs

Customising Resource Use (CPU/RAM)

To control the number of threads or the amount of RAM used by individual tasks, see the configuration files in conf/envs. Adjusting them lets the pipeline use the resources available on your system and can shorten its runtime. Pass the chosen file to the run command with the -c option:

nextflow run https://github.com/aringeri/nanoporcini \
  --input "path/to/data/sample*.fastq.gz" \
  -c conf/envs/local.config \
  --outdir "nanoporcini_output/"
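
If none of the files in conf/envs fits your machine, a custom configuration file can be written along the lines of the sketch below. The file name my-resources.config and all values are illustrative assumptions. The process and executor scopes are standard Nextflow configuration, but the files in conf/envs may define per-process settings that take priority over these defaults, so compare against them before relying on this.

```shell
# Write a small Nextflow configuration file (my-resources.config is a hypothetical name).
# The values are illustrative; match them to your machine.
cat > my-resources.config <<'EOF'
// Default resources requested by each task.
process {
    cpus   = 4
    memory = '8 GB'
}

// Upper limits for the local executor across all running tasks.
executor {
    cpus   = 4
    memory = '14 GB'
}
EOF

# Show the file that was written.
cat my-resources.config
```

The file would be passed to the run command with -c my-resources.config.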
