Description
Hello, I am attempting to use PyDESeq2 with scRNA-seq data from 4 separate sequencing batches. I want to compare individual cell types across batches. Each cell type has 3 replicates per batch, derived by pseudobulking data from 3 separate organoids per batch. However, I also want to correct for batch effects. I am finding that I cannot do both batch correction and statistical comparisons between different cell types across batches.
From other forum posts about DESeq2 and PyDESeq2, I understand that batch effects must be accounted for by including batch in the design (e.g., design = "~batch + cell_type"). I also understand that batch correction cannot be applied before running PyDESeq2, because PyDESeq2 requires the raw counts as input.
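To make the "~batch + cell_type" design concrete, here is a minimal sketch that builds, by hand and in pure Python, the treatment-coded design matrix such a formula implies (an intercept, one indicator per non-reference batch, one indicator per non-reference cell type). It is not PyDESeq2's internal code, and the batch and cell-type labels are hypothetical stand-ins for the real metadata. It checks the matrix rank exactly: when every cell type appears in every batch, the matrix has full column rank, so the batch and cell-type coefficients can all be estimated separately.

```python
from fractions import Fraction
from itertools import product


def treatment_dummies(values, levels):
    """0/1 indicator columns for every level except levels[0], the reference."""
    return [[1 if v == lvl else 0 for lvl in levels[1:]] for v in values]


def matrix_rank(rows):
    """Exact rank via Gauss-Jordan elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below `rank` with a nonzero entry in this column
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # this column is a combination of earlier columns
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Hypothetical toy metadata: 4 batches x 3 cell types x 3 organoid replicates
batches = ["B1", "B2", "B3", "B4"]
cell_types = ["astrocytes", "neurons", "progenitors"]
samples = [(b, c) for b, c, _rep in product(batches, cell_types, range(3))]

# "~batch + cell_type": intercept + batch indicators + cell-type indicators
batch_cols = treatment_dummies([b for b, _ in samples], batches)
cell_cols = treatment_dummies([c for _, c in samples], cell_types)
X = [[1] + bc + cc for bc, cc in zip(batch_cols, cell_cols)]

n_coef = len(X[0])
print(n_coef, matrix_rank(X))  # prints: 6 6 (full rank, every coefficient estimable)
```

With 36 samples and 6 coefficients, the additive model is identifiable; the cell-type coefficients are then comparisons between cell types with batch held fixed.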
One idea I had was to add a separate column to adata.obs called comparison_group, which combines batch and cell type (e.g., "D250WT_astrocytes" for astrocytes in batch D250WT). I could then run PyDESeq2 with either design = "~comparison_group" or design = "~batch + comparison_group". Unsurprisingly, design = "~batch + comparison_group" raises a Singular Matrix error, so DESeq cannot complete. Design = "~comparison_group" does run, but I am concerned that it will not have the same effect as modeling 'batch' as a covariate in the design, given that each batch contains many different cell types.
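The Singular Matrix error can be reproduced on toy data. The sketch below (pure Python, hypothetical labels, treatment coding with the first-seen level as reference; not PyDESeq2's own code) builds both candidate design matrices. Because every comparison_group value belongs to exactly one batch, each batch indicator equals the sum of that batch's comparison_group indicators, so adding batch on top of comparison_group adds columns without adding rank, and the normal-equations matrix cannot be inverted.

```python
from fractions import Fraction
from itertools import product


def treatment_dummies(values, levels):
    """0/1 indicator columns for every level except levels[0], the reference."""
    return [[1 if v == lvl else 0 for lvl in levels[1:]] for v in values]


def matrix_rank(rows):
    """Exact rank via Gauss-Jordan elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # column is a linear combination of earlier columns
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Hypothetical toy metadata: 4 batches x 3 cell types x 3 organoid replicates
batches = ["B1", "B2", "B3", "B4"]
cell_types = ["astrocytes", "neurons", "progenitors"]
samples = [(b, c) for b, c, _rep in product(batches, cell_types, range(3))]

# Combined label in the style of "D250WT_astrocytes"
groups = [f"{b}_{c}" for b, c in samples]
group_levels = list(dict.fromkeys(groups))  # 12 levels, first-seen order

batch_cols = treatment_dummies([b for b, _ in samples], batches)
group_cols = treatment_dummies(groups, group_levels)

X_group = [[1] + gc for gc in group_cols]                           # "~comparison_group"
X_both = [[1] + bc + gc for bc, gc in zip(batch_cols, group_cols)]  # "~batch + comparison_group"

print(len(X_group[0]), matrix_rank(X_group))  # prints: 12 12
print(len(X_both[0]), matrix_rank(X_both))    # prints: 15 12 (3 redundant batch columns)
```

The "~comparison_group" matrix is full rank because it fits one mean per batch/cell-type combination; the three extra batch columns in the combined design are exactly the redundant ones.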
I also considered subsetting adata before running DESeq so that it contains only the pseudobulked samples of one cell type, then running DESeq iteratively for each cell type. However, I don't think this is a good solution, because the effectiveness of its batch correction depends on the batch effect being homogeneous across cell types.
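The per-cell-type approach can be sketched as a simple loop. This is a minimal illustration on hypothetical labels, and it assumes each subset would be fit with design "~batch" (the post does not state the per-subset design). The actual PyDESeq2 call is deliberately left out; the loop only records how many samples and design columns each separate fit would have.

```python
from itertools import product


def treatment_dummies(values, levels):
    """0/1 indicator columns for every level except levels[0], the reference."""
    return [[1 if v == lvl else 0 for lvl in levels[1:]] for v in values]


# Hypothetical toy metadata: 4 batches x 3 cell types x 3 organoid replicates
batches = ["B1", "B2", "B3", "B4"]
cell_types = ["astrocytes", "neurons", "progenitors"]
samples = list(product(batches, cell_types, range(3)))  # (batch, cell_type, replicate)

per_type = {}
for cell_type in cell_types:
    # Keep only the pseudobulk samples of this cell type
    subset = [s for s in samples if s[1] == cell_type]
    # Assumed per-subset design "~batch": intercept + batch indicators
    X = [[1] + row for row in treatment_dummies([s[0] for s in subset], batches)]
    # A PyDESeq2 fit on this subset's counts would go here (not shown)
    per_type[cell_type] = (len(subset), len(X[0]))

print(per_type)
# prints: {'astrocytes': (12, 4), 'neurons': (12, 4), 'progenitors': (12, 4)}
```

Each fit sees only 12 of the 36 samples and estimates its own intercept and batch coefficients, independently of the other cell types.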
Does anyone know how I can do effective batch correction while also running DESeq (including statistical comparisons) between specific cell types (i.e., subsets of each batch) across different batches?
Please let me know if anything needs clarification. In case it is helpful, I have included an example screenshot of my metadata for 2 batches. Thank you very much.
