Description
What is your question?
Hi there,
First of all, thank you very much for maintaining this repository. I really appreciate the effort that has gone into making GPU-accelerated single-cell analysis possible.
I am writing to ask for advice regarding an issue I encountered when saving results.
Dataset context
• Dataset size: ~880,000 cells
• The workflow uses GPU-based preprocessing via the rapids_singlecell multi-GPU workflow (a sketch follows this list)
• One of the layers, adata.layers["scaled"], is very large
• According to Dask diagnostics, this layer corresponds to ~94 GB of data
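For context, here is a minimal sketch of how the "scaled" layer is produced. The target_sum and max_value values are placeholders, and the data loading and Dask cluster setup are omitted; my real pipeline follows the rapids_singlecell multi-GPU documentation:

import rapids_singlecell as rsc

# `adata.X` is a Dask array with CuPy-backed chunks, set up following the
# rapids_singlecell multi-GPU workflow (loading and cluster setup omitted).
rsc.pp.normalize_total(adata, target_sum=1e4)
rsc.pp.log1p(adata)
rsc.pp.scale(adata, max_value=10)   # lazy: only builds the Dask task graph
adata.layers["scaled"] = adata.X    # ~94 GB once materialized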
The issue:
When I try to save the AnnData object, both the H5AD and Zarr writers fail. For example, writing to H5AD raises errors like:
File ~/miniconda3/envs/rapids_singlecell/lib/python3.13/site-packages/dask/local.py:191, in start_state_from_dask(dsk, cache, sortkey, keys)
189 if task is None:
190 if dependents[key] and not cache.get(key):
--> 191 raise ValueError(
192 "Missing dependency {} for dependents {}".format(
193 key, dependents[key]
194 )
195 )
196 continue
197 elif isinstance(task, DataNode):
ValueError: Missing dependency ('scale_kernel_center-0940b9c96e20102f885e498c537f15dd', 9, 0) for dependents {('store-map-cc0b083db7973a40ae52a6e4cdb83699', 9, 0)}
Error raised while writing key 'scaled' of <class 'h5py._hl.group.Group'> to /layers
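For completeness, the failing call is just the standard writer (paths are placeholders):

adata.write_h5ad("results.h5ad")   # raises the ValueError above
adata.write_zarr("results.zarr")   # fails as well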
I also tried:
• Writing to Zarr instead of H5AD, which fails as well
• Calling adata.layers['scaled'].compute() before saving, which is not feasible in my case due to memory constraints (the layer alone is ~94 GB)
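Beyond those attempts, one direction I considered is writing just the heavy layer with Dask directly, moving each CuPy chunk back to host memory lazily. This is only a sketch (the path is a placeholder), and I have not verified that it avoids the graph issue above:

import cupy as cp
import dask.array as da

# Sketch: convert each CuPy chunk to NumPy lazily, then let Dask stream the
# result into a Zarr store chunk by chunk, so the full ~94 GB never has to
# fit in memory at once. The path is a placeholder.
scaled_cpu = adata.layers["scaled"].map_blocks(cp.asnumpy)
da.to_zarr(scaled_cpu, "scaled_layer.zarr")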
My question:
What would be the recommended way to store final results in this situation?
Specifically:
• Is it expected that large intermediate layers (e.g. scaled data) should be dropped before saving? (A concrete sketch of this option follows the list.)
• Is there a recommended pattern for persisting results when working with very large GPU/Dask-backed layers?
• Are there best practices for saving AnnData objects at this scale that I may be missing?
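To make the first bullet concrete, this is the fallback I would use if dropping the layer is the expected answer (a sketch; it assumes the remaining matrices serialize cleanly, and the path is a placeholder):

# Fallback sketch: drop the heavy intermediate and persist everything else.
del adata.layers["scaled"]
adata.write_h5ad("results_without_scaled.h5ad")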
Any guidance on how you would handle this in a production-scale workflow would be greatly appreciated.
Thank you again for your time and for maintaining this project.