@votti votti commented Jan 27, 2025

Background

I noticed that mudata currently has issues loading legacy .h5mu files generated
by mudata==0.1.2.

Changes

  • Adds test files for .h5mu from mudata==0.1.2
  • Adds a test that generally handles regression testing against files from older versions
  • Adds fix for loading .h5mu from mudata==0.1.2
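A rough sketch of how such a regression check could be organized, assuming the archive layout `tests/data/archives/vxxx/mudata.h5mu` mentioned in the commit messages. The helper names `find_archived_files` and `load_failures` are hypothetical and are not taken from this PR; the reader is passed in as a callable so the sketch stays independent of any particular mudata version.

```python
from pathlib import Path
from typing import Callable, Dict, List


def find_archived_files(archive_root: Path) -> List[Path]:
    """Collect one mudata.h5mu per version directory (v*/mudata.h5mu)."""
    return sorted(Path(archive_root).glob("v*/mudata.h5mu"))


def load_failures(archive_root: Path, reader: Callable[[Path], object]) -> Dict[str, str]:
    """Try to read every archived file; map failing paths to the error raised."""
    failures: Dict[str, str] = {}
    for path in find_archived_files(archive_root):
        try:
            reader(path)
        except Exception as exc:  # report every failing file, not just the first
            failures[str(path)] = repr(exc)
    return failures
```

In an actual test, the reader would presumably be `mudata.read_h5mu`, and the test would assert that `load_failures(...)` returns an empty dict.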

Vito Zanotelli added 2 commits January 27, 2025 10:28
I noticed errors loading h5mu files from
mudata==0.1.2

To generate the test files using this mudata
version, uv can be used from within the data
archive directory:
`uv run create_testfiles.py`

The concept of the data archive for legacy files
is modeled on the `anndata` repository.

Modeled after the anndata test for backwards
compatibility.
Checks that all tests/data/archives/vxxx/mudata.h5mu files
can be loaded.

This works around an issue with loading legacy
.h5mu files where empty obsp/varp are stored as None
instead of a missing or empty dict.

None values are converted to empty dicts during loading.
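As an illustration only, the conversion described above could look roughly like the following. It operates on a plain dict standing in for the attributes read from a file; `normalize_pairwise` is a hypothetical helper and does not reproduce the actual change in mudata.

```python
from typing import Any, Dict, Mapping

# Attributes named in the commit message; legacy files may store them as None.
PAIRWISE_KEYS = ("obsp", "varp")


def normalize_pairwise(attrs: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a copy of attrs in which a None or missing obsp/varp becomes {}."""
    cleaned = dict(attrs)
    for key in PAIRWISE_KEYS:
        if cleaned.get(key) is None:
            cleaned[key] = {}
    return cleaned
```

Treating a missing key and an explicit None the same way means downstream code only ever sees a dict for these attributes.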

Fixes: scverse#91
@votti votti marked this pull request as ready for review January 27, 2025 10:37
Author

votti commented Jan 27, 2025

PR #87 is required to fix the underlying build failure. Happy to rebase once that PR is merged.

Collaborator

gtca commented Feb 10, 2025

Thank you, @votti!

The suggested tests will probably not be run by CI anyway?

I refactored the code a bit more in b87797e to keep it concise while still including the suggested way to handle None.

@gtca gtca closed this Feb 10, 2025
Author

votti commented Feb 10, 2025

The test file would be small and would have run as part of CI.

Whether this kind of regression test makes sense depends on the scope of the project: if it is a priority that old (published?) mudata files remain readable by the latest version, tests like these would be valuable.

Collaborator

gtca commented Feb 11, 2025

Thanks, @votti, that's definitely in scope and will be valuable to have.

> The testfile would be small and would have run as part of CI.

I am currently not sure we want to start including actual HDF5 files, as opposed to generating them, which would not be executed by CI (if I understand it correctly). We would then also need to include MuData files populated with different attributes, beyond the simplest case of two modalities with no annotations.

Let's continue discussing that in #92!
