Clustering long-read 18S amplicons #21

@pjmramond

Hello there
Thanks a lot already for the work on this package!

I am trying to cluster 34,937,058 sequences of about 1,000 bp each (18S amplicons) contained in a single FASTA file. I'm running the following command on an HPC cluster:

meshclust \
  -d /export/lv6/projects/NIOZ320/Analysis/3.1_Ecological_Analysis/18S_NIOZ320_NIOZ326.fa \
  -o /export/lv6/projects/NIOZ320/Analysis/3.1_Ecological_Analysis/consensus_95/18S_NIOZ320_NIOZ326_cl_0.95.txt \
  -t 0.95 \
  -b 45000 \
  -v 180000

The job has been running for 125 days and was about to finish its 4th run, which I thought would be the last, but a 5th clustering run over the data has started (see screenshot). From its start, this latest run has reported "0 unprocessed sequences", and the number of centers found has been stagnating around 47,900 for quite some time.

I understand that this is a lot of data and that the error rate of Oxford Nanopore reads probably makes the clustering harder. The amplicons have nevertheless been quality filtered, and each represents a consensus of several amplicons (pre-clustered based on Unique Molecular Identifiers). A previous MeshClust run with a similar approach but on 16S data took ~80 days to cluster 33,306,880 amplicons and found 55,715 centers.
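
To put the two runs side by side, here is a back-of-the-envelope throughput comparison using only the figures quoted above. It is a rough sketch: it assumes sequences are processed at a roughly constant rate, which may well not hold for an iterative clustering run, and the 80-day figure is itself approximate. The dictionary keys and the `sequences_per_day` helper are names made up for this illustration.

```python
# Rough throughput comparison from the figures quoted in this post.
# Assumption: a constant processing rate over the whole run, which may
# not reflect how the clustering passes actually behave.

runs = {
    "16S (finished)": {"sequences": 33_306_880, "days": 80, "centers": 55_715},
    "18S (still running)": {"sequences": 34_937_058, "days": 125, "centers": 47_900},
}


def sequences_per_day(sequences: int, days: float) -> float:
    """Average number of input sequences handled per day of wall time."""
    return sequences / days


for name, run in runs.items():
    rate = sequences_per_day(run["sequences"], run["days"])
    per_center = run["sequences"] / run["centers"]
    print(f"{name}: {rate:,.0f} sequences/day, "
          f"{per_center:,.0f} sequences per center")
```

On these numbers the 16S run averaged about 416,000 sequences per day, while the 18S run so far averages about 279,000 per day without having finished.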

My questions are:

  1. Am I doing something wrong here? Can MeshClust handle a computation of this size? ("swarm -d 3" ran faster but clustered only 500K reads.)

  2. Is there a way to stop the run at this stage and get the current output (centers and their composition)? Is there a way to predict how many runs MeshClust will need before it writes its output?

Any help would be highly appreciated!
Best
Pierre

[Three screenshots attached, captured on 2024-01-18 at 16:12:14, 16:30:44 and 16:30:59]
