Simultaneous Batch Correction and DESeq Comparisons between Cell Subsets across Batches #376

@Willt1128

Hello, I am trying to use PyDESeq2 with scRNA-seq data from 4 separate sequencing batches. I want to compare individual cell types across batches; each cell type has 3 replicates per batch, derived by pseudobulking data from 3 separate organoids per batch. However, I also want to correct for batch effects, and I am finding that I cannot do both batch correction and statistical comparisons between cell types across batches.
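For context, here is a minimal sketch of the kind of pseudobulking I mean: raw counts are summed over all cells that share the same batch, organoid, and cell type. The `pseudobulk` helper, the organoid names, and the "neurons" label are hypothetical and only for illustration; this is not my actual pipeline.

```python
import numpy as np

def pseudobulk(counts, batch, organoid, cell_type):
    """Sum raw counts over all cells sharing (batch, organoid, cell_type).

    counts: (n_cells, n_genes) array of raw integer counts.
    Returns the sorted list of group keys and a (n_groups, n_genes) array.
    """
    keys = list(zip(batch, organoid, cell_type))
    groups = sorted(set(keys))
    index = {g: i for i, g in enumerate(groups)}
    rows = np.array([index[k] for k in keys])
    out = np.zeros((len(groups), counts.shape[1]), dtype=counts.dtype)
    # Unbuffered add so that repeated row indices accumulate correctly.
    np.add.at(out, rows, counts)
    return groups, out

# Toy data: 4 cells, 2 genes (all labels except "D250WT" and
# "astrocytes" are made up for this example).
counts = np.array([[1, 0], [2, 1], [0, 3], [4, 4]])
batch = ["D250WT"] * 4
organoid = ["org1", "org1", "org2", "org2"]
cell_type = ["astrocytes", "astrocytes", "astrocytes", "neurons"]

groups, pb = pseudobulk(counts, batch, organoid, cell_type)
for g, row in zip(groups, pb):
    print(g, row)
```

Each row of `pb` is one pseudobulked sample, so the values stay raw integer counts.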

From other forum posts regarding DESeq2 and PyDESeq2, I understand that batch effects must be accounted for by including batch in the design (e.g., design = "~batch + cell_type"). I also understand that batch correction cannot be done before running PyDESeq2, because PyDESeq2 requires raw counts as input.

One thought I had was to add a separate column to adata.obs called comparison_group, which combines both batch and cell type (e.g., for astrocytes in batch D250WT, "D250WT_astrocytes"). Then I could run PyDESeq2 with either design = "~comparison_group" or design = "~batch + comparison_group". Unsurprisingly, design = "~batch + comparison_group" produces a Singular Matrix error (each comparison_group level already determines its batch), so the DESeq run cannot complete. Using design = "~comparison_group" allows DESeq to run, but I am concerned that this will not have an effect comparable to simply modeling 'batch' as a covariate in the design, given that there are many different cell types within each batch.
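To make the Singular Matrix problem concrete, here is a small sketch that builds toy design matrices for the three designs and compares their rank to their number of columns. It uses a hand-rolled treatment coding (`one_hot` is a hypothetical helper, and "batchB" and "neurons" are made-up labels), so the columns may not match what PyDESeq2 builds internally; the rank argument should not depend on the coding, though.

```python
import numpy as np
from itertools import product

def one_hot(labels, drop_first=True):
    """Indicator columns for a categorical factor (reference level dropped)."""
    levels = sorted(set(labels))
    if drop_first:
        levels = levels[1:]
    return np.array(
        [[1.0 if lab == lev else 0.0 for lev in levels] for lab in labels]
    ).reshape(len(labels), len(levels))

# Toy metadata: 2 batches x 2 cell types x 3 replicates each.
batches = ["D250WT", "batchB"]          # "batchB" is hypothetical
cell_types = ["astrocytes", "neurons"]  # "neurons" is hypothetical
samples = [(b, c) for b, c in product(batches, cell_types) for _ in range(3)]

batch = [b for b, _ in samples]
cell_type = [c for _, c in samples]
comparison_group = [f"{b}_{c}" for b, c in samples]
intercept = np.ones((len(samples), 1))

designs = {
    "~batch + cell_type": np.hstack(
        [intercept, one_hot(batch), one_hot(cell_type)]
    ),
    "~comparison_group": np.hstack([intercept, one_hot(comparison_group)]),
    "~batch + comparison_group": np.hstack(
        [intercept, one_hot(batch), one_hot(comparison_group)]
    ),
}

# (rank, number of columns) per design; rank < columns means singular.
ranks = {
    name: (int(np.linalg.matrix_rank(X)), X.shape[1])
    for name, X in designs.items()
}
for name, (rank, ncol) in ranks.items():
    status = "full rank" if rank == ncol else "rank-deficient"
    print(f"{name}: rank {rank} of {ncol} columns ({status})")
```

In this toy setup only "~batch + comparison_group" is rank-deficient, because the batch indicator is the sum of the comparison_group indicators belonging to that batch.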

I also considered subsetting adata before running DESeq so that it contains only the pseudobulked samples for one cell type, then running DESeq iteratively for each cell type. However, I don't believe this is a good solution, because the effectiveness of its batch correction depends on the batch effect being homogeneous across cell types.

Does anyone know how I can do an effective batch correction while also doing a DESeq (including statistical comparisons) between specific cell types (i.e., subsets of each batch) across different batches?

Please let me know if anything needs clarification. In case it is helpful, I have included an example screenshot of my metadata for 2 batches. Thank you very much.

[Image: example screenshot of metadata for 2 batches]
