
Processing w/ a combined genome for spike normalization #3

@cmuyehara


Hi,

I've been trying to use your pipeline to align samples that contain Drosophila spike-ins. Rather than aligning sequentially, I built a combined mouse and Drosophila genome, with the dmel chromosomes named in the format "dm6_{chrom}". However, I didn't recover any signal along the Dmel genome. The problem seems to be that, when you filter out rRNA and chrM reads, you also pipe the reads through grep '_' -v, which discards every record containing an underscore and therefore every read on a "dm6_" chromosome. This happens here:

zcat ${TMPDIR}/$j.bed.gz | grep "rRNA\|chrM" -v | grep "_" -v | sort-bed - | gzip > ${TMPDIR}/$j.nr.rs.bed.gz
and, identically, in a second place in the script:
zcat ${TMPDIR}/$j.bed.gz | grep "rRNA\|chrM" -v | grep "_" -v | sort-bed - | gzip > ${TMPDIR}/$j.nr.rs.bed.gz
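To see the effect in isolation, here is a minimal sketch that feeds three toy BED records through the same two grep stages. The stages are written in portable form (grep -v -E 'rRNA|chrM' instead of the GNU-style grep "rRNA\|chrM" -v), sort-bed and gzip are left out, and the names toy_bed and original_filter are hypothetical:

```shell
# Hypothetical toy input: one mouse read, one mitochondrial read, and one read
# on a spike-in chromosome using the "dm6_" prefix from the combined genome.
toy_bed() {
  printf 'chr1\t100\t150\tread1\nchrM\t200\t250\tread2\ndm6_chr2L\t300\t350\tread3\n'
}

# Same two filtering stages as the pipeline line above (sort-bed/gzip omitted).
original_filter() {
  grep -v -E 'rRNA|chrM' | grep -v '_'
}

# Prints only the chr1 record: chrM is removed by the first grep, and the
# dm6_chr2L read is removed by grep -v '_' because its chromosome name
# contains an underscore.
toy_bed | original_filter
```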

I edited those lines to remove the grep '_' -v stage while still filtering out the rRNA and chrM reads, and that seems to have fixed the problem. However, I was wondering why that filter was there: in the mm10 annotation I'm using, none of the chromosomes have '_' in their names.
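A sketch of the edit described above, plus a more targeted variant. The function names are hypothetical. The variant assumes (this is a guess, not something the pipeline documents) that the underscore filter was meant to drop unplaced or alternate contigs such as chrUn_* or *_random, so it only tests the chromosome column and exempts the "dm6_" prefix:

```shell
# The edit described above: keep dropping rRNA and chrM reads, but no longer
# drop every record that contains an underscore.
filter_reads_keep_spikein() {
  grep -v -E 'rRNA|chrM'
}

# Targeted variant (assumption: "_" was meant to catch contigs like chrUn_*).
# Only column 1 (the chromosome) is checked, and dm6_-prefixed spike-in
# chromosomes are always kept. Note that dm6_chrM is still removed by the
# first stage, because it matches chrM.
filter_reads_targeted() {
  grep -v -E 'rRNA|chrM' |
    awk -F '\t' '$1 !~ /_/ || $1 ~ /^dm6_/'
}

# Either function would take the place of the two grep stages, e.g.:
# zcat ${TMPDIR}/$j.bed.gz | filter_reads_targeted | sort-bed - | gzip > ${TMPDIR}/$j.nr.rs.bed.gz
```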

I would also recommend documenting this behavior, since a combined genome seems to be a relatively common way of doing spike-in normalization.
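For anyone building a combined genome this way, a quick sanity check is to count filtered reads per genome. A minimal sketch, assuming spike-in chromosomes carry the "dm6_" prefix and the input is BED on stdin (count_reads_by_genome is a hypothetical helper). It only counts reads; how the pipeline turns spike-in counts into a normalization factor is not described in this issue:

```shell
# Hypothetical helper: count BED records on spike-in (dm6_-prefixed)
# chromosomes versus all other chromosomes, printing tab-separated totals.
count_reads_by_genome() {
  awk -F '\t' '
    $1 ~ /^dm6_/ { spike++; next }
                 { primary++ }
    END { printf "primary\t%d\nspikein\t%d\n", primary, spike }
  '
}

# Example usage on a filtered file from the pipeline:
# zcat ${TMPDIR}/$j.nr.rs.bed.gz | count_reads_by_genome
# A spikein count of 0 suggests the dm6_ reads are being filtered out,
# as happened with the grep -v '_' stage.
```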
