feat: Add global order min/max per tile to the fragment metadata #5646
base: main
…metdata-global-order-bounds
…to load default profile in ordinary execution
…rlapping single-tile fragment section
…metdata-global-order-bounds
…metdata-global-order-bounds
I would call a change to the storage format a
Fair enough. I imagine I wrote "chore" because there isn't really any value added yet; just the enablement of certain enhancements. Edited the title.
ypatia
left a comment
You'll need to commit the autogenerated tiledb-rest.capnp.{c++,h} files now that we've reverted #5734
   * or if the fragment was written in a format version which does not contain
   * the bounding rectangle global order bounds.
   */
  std::vector<std::vector<uint8_t>> global_order_lower_bound(
I just saw the vector of vectors, which is not good for performance. We should better return the raw values (in an std::pair<size_t, void*>), which both matches other fragment info C++ APIs, and does not require knowing the array schema.
The C API tiledb_fragment_info_get_global_order_lower_bound copies data into the user's out-argument. The inner vector is necessary as a buffer to copy that data into.
Alternatives would include:
- user supplying their own allocated buffer as an out-argument. Very C-like and requires either knowing the schema to size the buffers correctly, or making two calls to get the size first and then the data (as is done here already)
- returning something like std::pair<std::vector<uint8_t>, std::vector<uint64_t>> containing the concatenated data and an offset into it for each dimension.
If performance is paramount then the C API is available. I think the usability is worth the tradeoff.
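For reference, the second alternative could look roughly like the sketch below. This is only an illustration of the concatenated-data-plus-offsets layout; `FlatBound`, `flatten_bound`, and `dim_size` are hypothetical names and are not part of the TileDB API or of this pull request.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical layout: all dimension values concatenated in `first`,
// and the starting byte offset of each dimension's value in `second`.
using FlatBound = std::pair<std::vector<uint8_t>, std::vector<uint64_t>>;

// Converts the vector-of-vectors form (one byte buffer per dimension)
// into a single buffer plus per-dimension offsets.
FlatBound flatten_bound(const std::vector<std::vector<uint8_t>>& per_dim) {
  std::vector<uint8_t> data;
  std::vector<uint64_t> offsets;
  offsets.reserve(per_dim.size());
  for (const auto& dim : per_dim) {
    offsets.push_back(data.size());  // where this dimension's bytes begin
    data.insert(data.end(), dim.begin(), dim.end());
  }
  return {std::move(data), std::move(offsets)};
}

// Size in bytes of dimension `d`: distance to the next offset,
// or to the end of the buffer for the last dimension.
uint64_t dim_size(const FlatBound& flat, std::size_t d) {
  const auto& data = flat.first;
  const auto& offsets = flat.second;
  assert(d < offsets.size());
  const uint64_t end =
      (d + 1 < offsets.size()) ? offsets[d + 1] : data.size();
  return end - offsets[d];
}
```

This trades one allocation per dimension for a single buffer, at the cost of the caller doing offset arithmetic to read any one dimension.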
Resolves CORE-321.
Today's fragment metadata contains the minimum bounding rectangle of each tile. This is very useful for determining whether tiles can satisfy spatial queries, but much less useful for determining how different tiles from different fragments may or may not interleave (knowledge of which can be used to implement several possible optimizations). This is because the bounding rectangle can under-estimate the true lower bound of the global order value in a tile, potentially by a lot.
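To make the under-estimate concrete, here is a small sketch. It assumes a 2D tile within a single space tile and a row-major cell order, so global order reduces to lexicographic comparison of coordinates; `Coord`, `mbr_lower_corner`, and `global_order_min` are hypothetical names used only for illustration, not TileDB types or functions.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical 2D coordinate; std::array compares lexicographically,
// which stands in for row-major global order in this simplified setting.
using Coord = std::array<int64_t, 2>;

// Lower corner of the minimum bounding rectangle: the per-dimension minimum.
// This point need not be an actual cell in the tile.
Coord mbr_lower_corner(const std::vector<Coord>& tile) {
  assert(!tile.empty());
  Coord lo = tile.front();
  for (const Coord& c : tile) {
    lo[0] = std::min(lo[0], c[0]);
    lo[1] = std::min(lo[1], c[1]);
  }
  return lo;
}

// The first cell of the tile in (simplified) global order.
Coord global_order_min(const std::vector<Coord>& tile) {
  assert(!tile.empty());
  return *std::min_element(tile.begin(), tile.end());
}
```

For a tile holding the cells (1, 5) and (3, 2), the bounding rectangle's lower corner is (1, 2) while the first cell in this order is (1, 5). A cell such as (1, 3) from another fragment sorts after the corner but before every cell in the tile, so the corner alone would wrongly suggest the tiles might interleave.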
We would like for our queries to start to leverage the interleaving of tiles from different fragments in the ways described in that linked document. First we must extend the fragment metadata to include the tile global order bounds. This pull request implements that:
TYPE: C_API | CPP_API | FORMAT
DESC: Add per-tile global order bounds to fragment metadata