[QST] Question about how to save the files

**What is your question?**
Hi there, 

First of all, thank you very much for maintaining this repository. I really appreciate the effort that has gone into making GPU-accelerated single-cell analysis possible.

I am writing to ask for advice regarding an issue I encountered when saving results.

Dataset context
- Dataset size: ~880,000 cells
- The workflow uses GPU-based preprocessing (the `rapids_singlecell` multi-GPU workflow)
- One of the layers (e.g. `adata.layers["scaled"]`) is very large
- According to Dask diagnostics, this layer alone corresponds to ~94 GB of data
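
As a rough sanity check on the ~94 GB figure, the size of a dense matrix can be estimated from its shape and element width. This is only a sketch: the gene count below is a hypothetical placeholder (it is not stated above), 4 bytes per value assumes float32, and sizes are in decimal GB.

```python
def dense_layer_gb(n_cells: int, n_genes: int, bytes_per_value: int = 4) -> float:
    """Return the size, in decimal GB, of a dense n_cells x n_genes matrix."""
    return n_cells * n_genes * bytes_per_value / 1e9


n_cells = 880_000          # from the dataset described above
n_genes_assumed = 26_700   # hypothetical gene count, chosen only for illustration

# float32 (4 bytes per value) is an assumption about the scaled layer's dtype
print(f"{dense_layer_gb(n_cells, n_genes_assumed):.1f} GB")  # prints "94.0 GB"
```

A dense layer of that shape would not fit in the memory of a single typical GPU, and often not in host RAM either, which is why materializing it is not an option here.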

The issue:
When I try to save the AnnData object, writing fails for both the H5AD and Zarr formats. For example, when writing to H5AD, I get errors like:
File ~/miniconda3/envs/rapids_singlecell/lib/python3.13/site-packages/dask/local.py:191, in start_state_from_dask(dsk, cache, sortkey, keys)
    189 if task is None:
    190     if dependents[key] and not cache.get(key):
--> 191         raise ValueError(
    192             "Missing dependency {} for dependents {}".format(
    193                 key, dependents[key]
    194             )
    195         )
    196     continue
    197 elif isinstance(task, DataNode):

ValueError: Missing dependency ('scale_kernel_center-0940b9c96e20102f885e498c537f15dd', 9, 0) for dependents {('store-map-cc0b083db7973a40ae52a6e4cdb83699', 9, 0)}
Error raised while writing key 'scaled' of <class 'h5py._hl.group.Group'> to /layers

I also tried writing to Zarr instead of H5AD, which fails as well.

Calling `adata.layers['scaled'].compute()` before saving is not feasible in my case due to memory constraints (the layer alone is ~94 GB).

My question:
What would be the recommended way to store final results in this situation?

Specifically:
- Is it expected that large intermediate layers (e.g. scaled data) should be dropped before saving?
- Is there a recommended pattern for persisting results when working with very large GPU/Dask-backed layers?
- Are there best practices for saving AnnData objects at this scale that I may be missing?

Any guidance on how you would handle this in a production-scale workflow would be greatly appreciated.

Thank you again for your time and for maintaining this project.

