Problem with deduplication #512

@crazysummerW

Hello,
I'm having an issue when using sambamba 1.0.0 to deduplicate BAM files from paired-end NGS data.
After deduplication, the resulting BAM file still contains records that are completely identical: the same read ID and the same values in every field. This is causing problems in my structural variation (SV) analysis.

What could be the reason for this, and is there a way to resolve it? I would like the deduplicated BAM file to have unique read IDs.

Reads before deduplication:
```
samtools view sample.sorted.bam chrY|grep E100074100L1C005R0181713731

E100074100L1C005R0181713731 129 chrY 21869316 60 108M34S chr1 181745266 0 CGTCGTGAGCGCATACACAGTGGACACAGGAATTTTGTGTCCCATTCCCACCAGGCTAGCAGTGGAGATGAAGTGAGACTGGGCTTTGGAGAGGTGAGGAGATGGGGCGGCCGAGGGGCCTACGCACCATGCTGCTCGGTCA DDDDDDDCCDDDDDDDDDDDDDDDDDDDDCDDCDDDDCDDCDDCDDDDDDDDDCDDDCCDDCCDCDDDDCDCDDDDCDDDDCCCDDDDCDDCCDDDCDDCDCCCCCCDDCCDB@DCCDCDDCDDCDDDCDCDCDCDDCDCDC NM:i:0MD:Z:108 MC:Z:124M18S AS:i:108 XS:i:51 SA:Z:chr1,181745350,-,40M102S,60,0; RG:Z:DP19786-713309
E100074100L1C005R0181713731 129 chrY 21869316 60 108M34S chr1 181745266 0 CGTCGTGAGCGCATACACAGTGGACACAGGAATTTTGTGTCCCATTCCCACCAGGCTAGCAGTGGAGATGAAGTGAGACTGGGCTTTGGAGAGGTGAGGAGATGGGGCGGCCGAGGGGCCTACGCACCATGCTGCTCGGTCA DDDDDDDCCDDDDDDDDDDDDDDDDDDDDCDDCDDDDCDDCDDCDDDDDDDDDCDDDCCDDCCDCDDDDCDCDDDDCDDDDCCCDDDDCDDCCDDDCDDCDCCCCCCDDCCDB@DCCDCDDCDDCDDDCDCDCDCDDCDCDC NM:i:0MD:Z:108 MC:Z:124M18S AS:i:108 XS:i:51 SA:Z:chr1,181745350,-,40M102S,60,0; RG:Z:DP19786-713309
```

Reads after deduplication:
```
samtools view sample.sorted.dedup.bam chrY|grep E100074100L1C005R0181713731

E100074100L1C005R0181713731 129 chrY 21869316 60 108M34S chr1 181745266 0 CGTCGTGAGCGCATACACAGTGGACACAGGAATTTTGTGTCCCATTCCCACCAGGCTAGCAGTGGAGATGAAGTGAGACTGGGCTTTGGAGAGGTGAGGAGATGGGGCGGCCGAGGGGCCTACGCACCATGCTGCTCGGTCA DDDDDDDCCDDDDDDDDDDDDDDDDDDDDCDDCDDDDCDDCDDCDDDDDDDDDCDDDCCDDCCDCDDDDCDCDDDDCDDDDCCCDDDDCDDCCDDDCDDCDCCCCCCDDCCDB@DCCDCDDCDDCDDDCDCDCDCDDCDCDC NM:i:0MD:Z:108 MC:Z:124M18S AS:i:108 XS:i:51 SA:Z:chr1,181745350,-,40M102S,60,0; RG:Z:DP19786-713309
E100074100L1C005R0181713731 129 chrY 21869316 60 108M34S chr1 181745266 0 CGTCGTGAGCGCATACACAGTGGACACAGGAATTTTGTGTCCCATTCCCACCAGGCTAGCAGTGGAGATGAAGTGAGACTGGGCTTTGGAGAGGTGAGGAGATGGGGCGGCCGAGGGGCCTACGCACCATGCTGCTCGGTCA DDDDDDDCCDDDDDDDDDDDDDDDDDDDDCDDCDDDDCDDCDDCDDDDDDDDDCDDDCCDDCCDCDDDDCDCDDDDCDDDDCCCDDDDCDDCCDDDCDDCDCCCCCCDDCCDB@DCCDCDDCDDCDDDCDCDCDCDDCDCDC NM:i:0MD:Z:108 MC:Z:124M18S AS:i:108 XS:i:51 SA:Z:chr1,181745350,-,40M102S,60,0; RG:Z:DP19786-713309
```
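
Both listings show the same record twice, so the identical lines are already present in the input BAM before sambamba runs. As a rough check of how widespread this is, the sketch below prints every record that occurs more than once, byte for byte. The function name `exact_dup_records` is hypothetical, and the sketch assumes SAM text arrives on stdin and that `grep`, `sort`, and `uniq` are available.

```shell
# Hypothetical helper: print each SAM record that appears more than once,
# comparing whole lines byte for byte. Reads SAM text on stdin.
exact_dup_records() {
    # Drop header lines (they start with '@'), sort so that identical
    # records become adjacent, then keep one copy of each repeated line.
    grep -v '^@' | LC_ALL=C sort | uniq -d
}
```

Assuming samtools is on the PATH, `samtools view sample.sorted.bam chrY | exact_dup_records` would list the repeated records for chrY. If the input already contains them, the duplication was probably introduced upstream of sambamba (for example during alignment or merging), which is only a guess here.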

Looking forward to your reply.
Thanks
