-
Notifications
You must be signed in to change notification settings - Fork 4
perf: remove dtype + fill val handling per chunk #124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: ld/zarrs_0.23.0
Are you sure you want to change the base?
Conversation
| &self.data_type, | ||
| &self.fill_value, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general not sure what best practice here - put behind a method that returns the reference or is this ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine! I’d also remove the key and shape functions and just make them pub for consistency.
| @pytest.mark.filterwarnings( | ||
| # TODO: Fix handling of string fill values for Zarr v2 bytes data | ||
| "ignore:Array is unsupported by ZarrsCodecPipeline. incompatible fill value .eAAAAAAAAA==. for data type bytes:UserWarning" | ||
| "ignore:Array is unsupported by ZarrsCodecPipeline. incompatible fill value 0 for data type r56:UserWarning" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what’s a “r56”?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be a question probably for @LDeakin - presumably the representation of V7 in rust where 56 is the number of bits maybe and r is equivalent to V?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a human readable repr we can emit instead? (assuming we emit that message and not zarrs)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That error is emitted by zarrs and it is the v3 name for the "raw bits" data type https://zarr-specs.readthedocs.io/en/latest/v3/data-types/index.html#core-data-types
I can have a look into tightening that error so it displays the V2 name when running on a V2 array.
| #[derive(Clone)] | ||
| #[gen_stub_pyclass] | ||
| #[pyclass] | ||
| pub(crate) struct WithSubset { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe rename this to ChunkItem or ChunkSlice now that there’s only one?
Calling
to_native_dtype+__str__came up as one of the only python-CPU-bound things when doing some benchmarking. My use-case is quite contrived (generating thousands ofWithSubsetobjects) but I think it's probably worth investigating getting rid of these calls. Some observations:1. I wonder if all getting theSo it turns out the array actually doesn't have to be created yet when the pipeline is generated. So I ended up doing something similar with the metadata (making it an actualdtypeandfill_valbe wrapped up in just relying on https://docs.rs/zarrs/latest/zarrs/array/struct.Array.html#method.open and then using the values directly (there are probably other benefits of doing this) but I think this is a separate PRStructand then working with that to get fill and dtype).2. Regardless, most of this refactor is around removing
Basicanyway so that chunk handling is independent of the ability. I noticed thatChunkRepresentationrequires ownership over its arguments which means we copy per-chunk. Not sure what would go into making that a reference, but it's no worse than the previous situation where I think we were generating copies repeatedly, but from PyO3 calling python --> Hooray! No longer!The benefit wasn't crazy ~5% but I think going in this direction is good (see point 1)