Open
Description
Hi:
While examining the abundance estimation step in compute_abundances_all.py, I noticed that SNP sites are currently filtered so that only positions with observed ALT reads in the sample are retained:
var_reads = pd.merge(df_read_counts, df_AF,
                     left_on=['position', 'ref', 'base', 'chrom'],
                     right_on=['POS', 'REF', 'ALT', 'CHROM'],
                     how='inner')
ref_reads = pd.merge(df_read_counts, df_AF,
                     left_on=['position', 'ref', 'base', 'chrom'],
                     right_on=['POS', 'REF', 'REF', 'CHROM'],
                     how='inner')
merged_ref_var = pd.merge(ref_reads.iloc[:, :5], var_reads.iloc[:, :5],
                          on=['chrom', 'position'], how='inner')
However, every SNP site covered in the sample can provide information, whether it shows only REF reads or also includes ALT reads. In particular, a site with only REF reads in the sample may still carry information about other strains that have the ALT allele at that position, because the final inner merge on chrom and position drops any site that has no ALT row in var_reads.
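To illustrate what I mean, here is a minimal sketch on toy data. It is not the script's actual code: the `count` column name, the toy values, and the simplified merges are my own assumptions. It only shows how a left merge with a zero fill would keep REF-only sites that the inner merge drops.

```python
import pandas as pd

# Toy inputs: the `count` column and all values are hypothetical.
df_read_counts = pd.DataFrame({
    'chrom':    ['chr1', 'chr1', 'chr1'],
    'position': [100, 100, 200],
    'ref':      ['A', 'A', 'C'],
    'base':     ['A', 'G', 'C'],   # position 200 has REF reads only
    'count':    [8, 2, 10],
})
df_AF = pd.DataFrame({
    'CHROM': ['chr1', 'chr1'],
    'POS':   [100, 200],
    'REF':   ['A', 'C'],
    'ALT':   ['G', 'T'],
})

# Reads supporting the REF allele at known SNP sites.
is_ref = df_read_counts['base'] == df_read_counts['ref']
ref_counts = (
    df_read_counts[is_ref]
    .merge(df_AF, left_on=['chrom', 'position', 'ref'],
           right_on=['CHROM', 'POS', 'REF'], how='inner')
    [['chrom', 'position', 'count']]
    .rename(columns={'count': 'ref_count'})
)

# Reads supporting the ALT allele at known SNP sites.
alt_counts = (
    df_read_counts
    .merge(df_AF, left_on=['chrom', 'position', 'ref', 'base'],
           right_on=['CHROM', 'POS', 'REF', 'ALT'], how='inner')
    [['chrom', 'position', 'count']]
    .rename(columns={'count': 'alt_count'})
)

# Inner merge (current behaviour): only sites with both REF and ALT reads.
inner = ref_counts.merge(alt_counts, on=['chrom', 'position'], how='inner')

# Left merge: REF-only sites are kept, with an ALT count of zero.
left = ref_counts.merge(alt_counts, on=['chrom', 'position'], how='left')
left['alt_count'] = left['alt_count'].fillna(0).astype(int)

print(inner)
print(left)
```

With this toy input the inner merge keeps only position 100, while the left merge also keeps position 200 with `alt_count` set to 0. Using `how='outer'` instead would additionally keep sites that have ALT reads but no REF reads.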
Is this filtering intentional, or could it be a potential bug?