-
Notifications
You must be signed in to change notification settings - Fork 81
Add Relative Log Expression (RLE) Plots #348
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
BorisMuzellec
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for this PR @jonathjd !
I just have one suggestion to make: unless I missed something,I think we should avoid re-computing size factors when we already have them.
Happy to merge once this is fixed!
| if normalize: | ||
| print("Plotting normalized RLE plot...") | ||
| geometric_mean = np.exp(np.mean(np.log(count_matrix + 1), axis=0)) | ||
| size_factors = np.median(count_matrix / geometric_mean, axis=1) | ||
| count_matrix = count_matrix / size_factors[:, np.newaxis] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is recomputing median-of-ratios size factors, right? (up to the +1 in the log)
In that case, it might be better to add a size_factors argument to make_rle_plot.
From there, in dds.plot_rle, if normalize=True, we would first check whether size factors (self.obsm["size_factors"]) were already computed. If so, we pass them directly. If not, we call self.fit_size_factors first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @BorisMuzellec ,
For some reason when I use the size factors computed in dds.fit_size_factors() I do not get an RLE plot with the sample medians centered around 0.
But when I compute the sizefactors internally I do get an RLE plot with the sample medians centered around 0
Do you know what could be causing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure why we would expect the sample medians to be zero given that RLE plot subtracts the gene medians. Also, I would be wary of making any conclusion from the test data.
That being said what you wrote is what is implemented in R's plotRLE method (https://rdrr.io/github/davismcc/scater/src/R/plotRLE.R). If it's standard, I'm happy to keep it as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jonathjd I think we may still have some consistency issues though: if we're using log(counts + 1) for size factors, shouldn't we also do the same everywhere (gene medians, and plotting)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BorisMuzellec Happy holidays! Sorry for the delay, I was on vacation. It's probably a good idea to add the psuedocount to the gene_median calculations so we don't divide by a gene with 0 counts across all samples. I can add that now.
I added the psuedocount and ran some tests and the psuedocount alters the gene medians distance from the sample medians and skews the plot significantly.
|
Is there anything specific holding this back? |


Reference Issue or PRs
Issue Reference: #320
What does your PR implement? Be specific.
The Relative Log Expression (RLE) is a useful diagnostic plot to visualize the differences in count distributions between samples. The x-axis is each sample from the count matrix and the y-axis is the log difference between each gene and the median expression of that gene across all samples.
Given:
gene_ijis the expression of genejin sampleimedian_jis the median expression of genejacross all samplesThe RLE for gene
jin sampleiis calculated as:Where:
gene_{ij}is the count of genejin sampleimedian_jis the median count of gene j across all samplesThis issue takes in the raw counts
self.X, anormalizeboolean,design_matrix.indexfor the sample_ids, and asave_pathand produces an RLE plot.The normalize boolean is set to
Falseby default but can be set toTrueto normalize the raw counts before plottingThis example was produced with the synthetic data in the
./datasets/dir