-
Notifications
You must be signed in to change notification settings - Fork 6
Closed
Description
ISSUE
Hi,
There seems to be a bug in how PEKA is executed in the pipeline: the jobs are mixing up input files, using peak files from one sample and crosslink files from another. For example:
[...]
TMPDIR=$(pwd)/tmp peka \
-i SAMPLE-1_R1_genome.clippy._rollmean10_minHeightAdjust1.0_minPromAdjust1.0_minGeneCount5_Peaks.bed \
-x SAMPLE-2_R1.genome.xl.bed \
-g GRCh38.primary_assembly.genome.fa \
-gi GRCh38.primary_assembly.genome.fa.fai \
-r gencode_filtered_regions_genicOtherfalse.resolved.gtf
[...]
SUGGESTED SOLUTION:
I think is an issue with the way the CLIPPY_GENOME.out.peaks and ch_genome_crosslink_bed channels are passed separately into the PEKA module, without being joined on their metadata. So nextflow seems to combine whichever files are available first, instead of proper sample-matched pairs. I believe that Joining the channels beforehand should fix the issue. Suggested change:
Currently:
/*
* MODULE: Run peka on genome-level crosslinks
*/
PEKA (
CLIPPY_GENOME.out.peaks,
ch_genome_crosslink_bed,
ch_fasta.collect{ it[1] },
ch_fasta_fai.collect{ it[1] },
ch_regions_resolved_gtf.collect{ it[1] }
)
ch_versions = ch_versions.mix(PEKA.out.versions)
Suggested fix:
/*
* MODULE: Run peka on genome-level crosslinks
*/
ch_peka_input = CLIPPY_GENOME.out.peaks
.join(ch_genome_crosslink_bed, by: 0)
PEKA (
ch_peka_input.map { meta, peaks, crosslinks -> [meta, peaks] },
ch_peka_input.map { meta, peaks, crosslinks -> [meta, crosslinks] },
ch_fasta.collect{ it[1] },
ch_fasta_fai.collect{ it[1] },
ch_regions_resolved_gtf.collect{ it[1] }
)
ch_versions = ch_versions.mix(PEKA.out.versions)
Best
Lucia
PS: I’ve submitted a pull request to address this issue: #42
Metadata
Metadata
Assignees
Labels
No labels