Speedup For EDistance-like distances #880
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #880 +/- ##
==========================================
- Coverage 73.54% 71.88% -1.67%
==========================================
Files 48 48
Lines 5613 5733 +120
==========================================
- Hits 4128 4121 -7
- Misses 1485 1612 +127
Ok, I found out why the test fails. I checked myself, and when I directly run this cell on the main branch I get

    import matplotlib.pyplot as plt
    import numpy as np
    import pertpy as pt
    import scanpy as sc
    from seaborn import clustermap

    adata = pt.dt.distance_example()
    obs_key = "perturbation"  # defines groups to test

Turns out the notebook was giving this invalid

    if f"{self.obsm_key}_{self.cell_wise_metric}_predistances" not in adata.obsp:
        self.precompute_distances(adata, n_jobs=n_jobs, **kwargs)

while it hits in the newer implementation because it doesn't precompute the whole distance matrices in this cell anymore

    distance = pt.tl.Distance(metric="euclidean", obsm_key="X_pca")
    df = distance.pairwise(adata, groupby=obs_key)

So https://github.com/scverse/pertpy-tutorials/blob/main/distances.ipynb would need updating. That's why I am against kwargs; even if they are used, every item in them should be checked to see whether it is being passed or not.
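To illustrate the point about kwargs, here is a minimal sketch of how a keyword argument that is only forwarded inside a guarded branch can hide an invalid argument, and how checking it up front surfaces the problem. The class `CachedDistance` and its methods are hypothetical stand-ins written for this example; they are not pertpy's API.

```python
import inspect


class CachedDistance:
    """Hypothetical stand-in for a distance class that caches precomputed results."""

    def __init__(self):
        self.cache = {}

    def _precompute(self, key, n_jobs=1):
        # Only n_jobs is accepted; any other keyword raises TypeError here.
        self.cache[key] = f"precomputed:{key}:n_jobs={n_jobs}"

    def pairwise_unchecked(self, key, **kwargs):
        # kwargs are forwarded only when the cache is empty, so an invalid
        # keyword goes unnoticed whenever the cached branch is taken.
        if key not in self.cache:
            self._precompute(key, **kwargs)
        return self.cache[key]

    def pairwise_checked(self, key, **kwargs):
        # Validate every keyword up front, regardless of which branch runs.
        allowed = set(inspect.signature(self._precompute).parameters) - {"key"}
        unknown = set(kwargs) - allowed
        if unknown:
            raise TypeError(f"Unexpected keyword arguments: {sorted(unknown)}")
        if key not in self.cache:
            self._precompute(key, **kwargs)
        return self.cache[key]
```

With a populated cache, `pairwise_unchecked("X_pca", bogus=1)` returns silently, while `pairwise_checked("X_pca", bogus=1)` raises `TypeError` no matter whether the result was cached.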
Yes, I agree with you. We can happily change that.
Are you willing to make this change or do you want me to do it? Ideally, we'd include the update commit in this PR. Thank you!
Zethson
left a comment
Thank you very very much for your work!
I really appreciate that you took the time to make the code much more explicit and clearer.
Co-authored-by: Lukas Heumos <lukas.heumos@posteo.net>
Signed-off-by: Lukas Heumos <lukas.heumos@posteo.net>
…pertpy into speedup/edistance
New numba kernels that speed up distances that are based on the mean.
Here are my local benchmark results. pairwise_mean doesn't speed up much with numba, but it's more memory-efficient, so I will keep it that way. Maybe we can discuss why the sklearn mean pairwise is faster in a separate topic. Here is the benchmark script: https://gist.github.com/selmanozleyen/f78e2f87661348615bad03485935fcf0
Update: with fastmath it's faster.
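For readers unfamiliar with the idea, the following is a rough sketch of what a mean-based numba kernel could look like; it is not the code from this PR. The function names `mean_pairwise_sqeuclidean` and `edistance_like` are made up for illustration, the squared Euclidean metric and the inclusion of the zero diagonal in the within-group means are assumptions, and the import falls back to plain Python if numba is not installed. The kernel accumulates the sum directly, so it never materializes the full n × m distance matrix, which is where the memory saving comes from.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:  # fall back to plain Python so the sketch still runs
    prange = range

    def njit(*args, **kwargs):
        def decorator(func):
            return func

        return decorator


@njit(parallel=True, fastmath=True)
def mean_pairwise_sqeuclidean(X, Y):
    """Mean squared Euclidean distance between all rows of X and all rows of Y."""
    n, m, d = X.shape[0], Y.shape[0], X.shape[1]
    total = 0.0
    for i in prange(n):
        row_sum = 0.0
        for j in range(m):
            dist = 0.0
            for k in range(d):
                diff = X[i, k] - Y[j, k]
                dist += diff * diff
            row_sum += dist
        total += row_sum  # numba handles this as a parallel reduction
    return total / (n * m)


def edistance_like(X, Y):
    """Energy-distance-style statistic: 2*mean(d(X,Y)) - mean(d(X,X)) - mean(d(Y,Y)).

    Assumption of this sketch: the within-group means include the zero diagonal.
    """
    return (
        2.0 * mean_pairwise_sqeuclidean(X, Y)
        - mean_pairwise_sqeuclidean(X, X)
        - mean_pairwise_sqeuclidean(Y, Y)
    )
```

Inputs are expected to be 2-D float arrays with the same number of columns. `fastmath=True` lets the compiler reorder floating-point operations, so results can differ from a strict NumPy reference in the last few bits.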