Nanoporcini is an experimental pipeline for long-read metabarcoding of fungi with Oxford Nanopore Technologies sequencing.
Nanoporcini is written in the Nextflow workflow language and features:
- Container support for Docker or Singularity.
- Configurable quality filtering with cutadapt and chopper.
- Full ITS region extraction with itsxpress.
- Chimera detection with VSEARCH.
- A choice of clustering approaches:
- Taxonomic classifications using dnabarcoder.
Running the pipeline requires:
- Nextflow v23.10.0 or later
- A container platform: either Docker or Singularity
- A network connection to download dependencies
- A reference database containing ITS sequences
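The first two requirements can be checked from a terminal before launching a run. The following is a minimal sketch, not part of the pipeline: the `version_ge` helper is a hypothetical name, it relies on `sort -V` from GNU coreutils, and it assumes that the first `x.y.z` string printed by `nextflow -version` is the installed version.

```shell
# Return success when version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required="23.10.0"

if command -v nextflow >/dev/null 2>&1; then
  # Assumption: the first x.y.z string in the output is the installed version.
  installed="$(nextflow -version 2>/dev/null | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1)"
  if version_ge "${installed}" "${required}"; then
    echo "Nextflow ${installed} is recent enough"
  else
    echo "Nextflow ${installed} is older than ${required}"
  fi
else
  echo "nextflow not found on PATH"
fi

# At least one container platform is needed.
if command -v docker >/dev/null 2>&1 || command -v singularity >/dev/null 2>&1; then
  echo "Container platform found"
else
  echo "Neither docker nor singularity found on PATH"
fi
```

The script only reports what it finds; it does not install anything or check the reference database.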
nextflow run https://github.com/aringeri/nanoporcini \
--input "path/to/data/sample*.fastq.gz" \
--outdir "nanoporcini_output/" \
--primers.fwd "AACTTAAAGGAATTGACGGAAG" \
--primers.rev_rc "GGTAAGCAGAACTGGCG" \
--chimera_filtering.ref_db "path/to/unite/db.fasta" \
--taxonomic_assignment.dnabarcoder.ref_db "path/to/unite/db.fasta" \
--taxonomic_assignment.dnabarcoder.ref_classifications "path/to/unite/db.classification" \
--taxonomic_assignment.dnabarcoder.cutoffs "path/to/unite/db/cutoffs.best.json"
This example uses the NS5 forward primer and the LR6 reverse primer (see https://unite.ut.ee/primers.php).
A small example can be run with two samples of 20 reads each (see tests/test_data/example/fq). This takes ~25 minutes to run on a 4-core MacBook Pro with 16 GB of RAM.
First clone the repository and move into the nanoporcini directory:
git clone https://github.com/aringeri/nanoporcini
cd nanoporcini
Download the UNITE reference database using the commands below, or download it through your browser at https://zenodo.org/records/12580255. This may take a while:
mkdir -p data/db/unite2024/
curl https://zenodo.org/api/records/12580255/files-archive -o data/db/unite2024/unite2024.zip
unzip data/db/unite2024/unite2024.zip -d data/db/unite2024
mkdir -p data/db/dnabarcoder/
curl https://raw.githubusercontent.com/vuthuyduong/dnabarcoder/refs/heads/master/data/UNITE_2024_cutoffs/unite2024ITS.unique.cutoffs.best.json \
-o data/db/dnabarcoder/unite2024ITS.unique.cutoffs.best.json
Run the example:
nextflow run main.nf \
-params-file "tests/test_data/example/example-params.yml" \
--outdir "nanoporcini_output/" \
-c conf/envs/local.config
If you are running in an HPC environment, change the -c conf/envs/local.config line to -c conf/envs/cluster.config.
This allows the pipeline to use more resources and to execute tasks through the Slurm workload manager.
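The choice between the two configuration files can also be scripted. The sketch below is only a heuristic: it assumes that finding Slurm's `sbatch` command on the PATH means the cluster configuration is wanted, and `pick_config` is a hypothetical helper name rather than something the pipeline provides.

```shell
# Print the config file to use, based on whether Slurm's sbatch is on PATH.
# Assumption: sbatch being available means we are on a Slurm cluster.
pick_config() {
  if command -v sbatch >/dev/null 2>&1; then
    echo "conf/envs/cluster.config"
  else
    echo "conf/envs/local.config"
  fi
}

config="$(pick_config)"
echo "Using ${config}"

# The selected file can then be passed to the example run:
# nextflow run main.nf \
#   -params-file "tests/test_data/example/example-params.yml" \
#   --outdir "nanoporcini_output/" \
#   -c "${config}"
```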
Many of the pipeline parameters can be configured to suit your needs.
I recommend creating a YAML file to pass parameters to the pipeline.
See conf/params.yml for an example configuration file that can be given to the run command with the -params-file option:
nextflow run https://github.com/aringeri/nanoporcini \
--input "path/to/data/sample*.fastq.gz" \
-params-file conf/params.yml \
--outdir "nanoporcini_output/"
This pipeline expects ONT sequences to be basecalled and demultiplexed already. I recommend Dorado for these steps.
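As a sketch of the -params-file approach described above, the command-line options from the first example can be written to a YAML file. This assumes the usual Nextflow behaviour where a dotted option such as --primers.fwd corresponds to a nested key in the params file. The file name my-params.yml is hypothetical, the paths are the same placeholders used earlier, and the layout of conf/params.yml in the repository may differ.

```shell
# Write a hypothetical params file; the file name and all paths are placeholders.
params_file="my-params.yml"

cat > "${params_file}" <<'EOF'
input: "path/to/data/sample*.fastq.gz"
outdir: "nanoporcini_output/"
primers:
  fwd: "AACTTAAAGGAATTGACGGAAG"
  rev_rc: "GGTAAGCAGAACTGGCG"
chimera_filtering:
  ref_db: "path/to/unite/db.fasta"
taxonomic_assignment:
  dnabarcoder:
    ref_db: "path/to/unite/db.fasta"
    ref_classifications: "path/to/unite/db.classification"
    cutoffs: "path/to/unite/db/cutoffs.best.json"
EOF

echo "Wrote ${params_file}"

# The file would then replace the individual options:
# nextflow run https://github.com/aringeri/nanoporcini -params-file "${params_file}"
```

Keeping parameters in a file like this makes a run easier to repeat than a long list of command-line options.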
- input (required)
  - type: string
  - A set of fastq files, one file per sample. File names will be used to identify samples. Uses glob syntax (*) to select multiple files.
  - ex) "path/to/input/data/*.fastq.gz"
- primers.fwd
  - type: string
  - The forward primer sequence in 5' to 3' orientation.
  - ex) "AACTTAAAGGAATTGACGGAAG" for NS5 primer (see https://unite.ut.ee/primers.php)
- primers.rev_rc
  - type: string
  - The reverse primer sequence (which has been reverse complemented).
  - ex) "GGTAAGCAGAACTGGCG" for LR6 primer (see https://unite.ut.ee/primers.php)
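Because primers.rev_rc expects the reverse primer already reverse complemented, a small helper can do the conversion. This is a minimal sketch: `revcomp` is a hypothetical name, it only handles the bases A, C, G and T (no IUPAC ambiguity codes), and the input below is simply the reverse complement of the rev_rc value shown above, assumed to be the LR6 primer in its 5' to 3' orientation.

```shell
# Reverse complement a DNA sequence made of A, C, G and T.
# `rev` reverses the string and `tr` swaps each base for its complement.
revcomp() {
  printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}

revcomp "CGCCAGTTCTGCTTACC"   # prints GGTAAGCAGAACTGGCG
```

Primers containing ambiguity codes would need a longer translation table than the one used here.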
Specify the output directory with:
outdir
qc_plot_sample_level
- FULL_ITS:
  - minQualityPhred: 20
  - minLength: 300
  - maxLength: 6000
- chimera_filtering:
  - ref_db
- methods
  - VSEARCH
    - min cluster size
  - NanoCLUST
    - min cluster sizes
- VSEARCH
- dnabarcoder
  - ref_db
  - ref_classifications
  - cutoffs
To control the number of threads or the amount of RAM used by individual tasks, see the configuration files in conf/envs.
These let the pipeline use the resources available on your system, which can improve its runtime.
Pass one of these files to the run command with the -c option:
nextflow run https://github.com/aringeri/nanoporcini \
--input "path/to/data/sample*.fastq.gz" \
-c conf/envs/local.config \
--outdir "nanoporcini_output/"
This workflow was developed as part of my Master's degree. My thesis can be read online (https://aringeri.github.io/long-read-ITS-metabarcoding-thesis/). It outlines the details of the pipeline, the reasoning behind design choices, and the validation approaches taken.