make_decoder accepts borrowed DataType instead of owned #9270
Conversation
arrow-json/src/reader/mod.rs
Outdated
}
let (data_type, nullable) = if self.is_field {
    let field = &self.schema.fields[0];
    (Cow::Borrowed(field.data_type()), field.is_nullable())
NOTE: This Cow::Borrowed is necessary to preserve pointer stability of field.data_type() that would otherwise have to be cloned.
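To illustrate the point about Cow::Borrowed, here is a minimal sketch using toy stand-ins (ToyType, ToyField, and pick_type are hypothetical names, not arrow-json items). It only shows that a borrowed Cow still points at the field's own data type, while an owned Cow holds a separate value at a different address.

```rust
use std::borrow::Cow;

// Toy stand-in for arrow's DataType; hypothetical, for illustration only.
#[derive(Clone, Debug, PartialEq)]
enum ToyType {
    Int64,
    Utf8,
}

// Toy stand-in for a schema field; hypothetical.
struct ToyField {
    data_type: ToyType,
    nullable: bool,
}

/// Returns the type either borrowed from the field or freshly built,
/// loosely mirroring the two branches of the reviewed code.
fn pick_type(field: &ToyField, is_field: bool) -> (Cow<'_, ToyType>, bool) {
    if is_field {
        // No clone: the Cow refers to the value stored in `field`.
        (Cow::Borrowed(&field.data_type), field.nullable)
    } else {
        // A new value that lives inside the Cow itself.
        (Cow::Owned(ToyType::Utf8), false)
    }
}

/// True when the Cow produced for `is_field` has the same address as the
/// data type stored in `field`.
fn points_at_field(field: &ToyField, is_field: bool) -> bool {
    let (ty, _nullable) = pick_type(field, is_field);
    std::ptr::eq(&*ty, &field.data_type)
}
```

With a field such as ToyField { data_type: ToyType::Int64, nullable: true }, points_at_field(&field, true) holds and points_at_field(&field, false) does not.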
Thanks @scovich
(this PR needs to have some conflicts resolved)
@alamb should be good now
// If this struct nullable, need to permit nullability in child array
// StructArrayDecoder::decode verifies that if the child is not nullable
// it doesn't contain any nulls not masked by its parent
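The rule described in this code comment can be modelled with plain boolean validity masks. The following is a minimal sketch, assuming true means "valid" and false means "null"; the function name is hypothetical and this is not how StructArrayDecoder::decode is actually written.

```rust
/// Toy validity masks: `true` means the slot is valid (non-null).
/// Returns true when both masks have the same length and every null in the
/// child sits at a position where the parent is null as well.
fn child_nulls_masked_by_parent(parent_valid: &[bool], child_valid: &[bool]) -> bool {
    parent_valid.len() == child_valid.len()
        && parent_valid
            .iter()
            .zip(child_valid)
            // A slot is acceptable if the child is valid, or the parent is null.
            .all(|(&parent, &child)| child || !parent)
}
```

Under this model, a child null at a slot where the parent is valid is the case a non-nullable child must reject.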
#9271 merged too quickly... this comment was supposed to remain inside the .map call. So I'm restoring this code back to follow the original upstream approach, instead of opening a separate PR just for that.
thank you
@alamb -- Is it normal for MIRI to take O(hours)? It's been stuck here for a very long time:
It does take a while. I haven't tracked its overall time recently -- maybe some new tests are overly long
Which issue does this PR close?
Rationale for this change
Today's JSON decoder helper, make_decoder, takes an owned data type whose components are cloned at every level during the recursive decoder initialization process. This breaks pointer stability of the resulting DataType instances that a custom JSON decoder factory would see, vs. those of the schema it and the reader builder were initialized with.
The lack of pointer stability prevents users from creating "path based" decoder factories that are able to customize decoder behavior based not only on type, but also on the field's path in the schema. See the PathBasedDecoderFactory in arrow-json/tests/custom_decoder_tests.rs of #9259 for an example.
What changes are included in this PR?
By passing &DataType instead, we change code like this:
to this:
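As a rough illustration of the owned-versus-borrowed difference (a toy sketch, not the PR's actual diff), the code below recurses over a hypothetical ToyType in both styles and records the address each level sees. ToyType, ToyField, visit_owned, visit_borrowed, and innermost_is_stable are all invented for this example. A factory that keys behavior on the address of a schema node would only find a match in the borrowed style.

```rust
use std::sync::Arc;

// Toy stand-ins for arrow's DataType and Field; hypothetical, for illustration only.
#[derive(Clone, Debug)]
enum ToyType {
    Int64,
    List(Arc<ToyField>),
}

#[derive(Clone, Debug)]
struct ToyField {
    data_type: ToyType,
}

/// Owned style: each level clones the child type before recursing, so the
/// addresses recorded here belong to temporaries, not to the schema.
fn visit_owned(ty: ToyType, seen: &mut Vec<*const ToyType>) {
    seen.push(&ty as *const ToyType);
    if let ToyType::List(child) = &ty {
        visit_owned(child.data_type.clone(), seen);
    }
}

/// Borrowed style: each level passes a reference into the schema itself.
fn visit_borrowed(ty: &ToyType, seen: &mut Vec<*const ToyType>) {
    seen.push(ty as *const ToyType);
    if let ToyType::List(child) = ty {
        visit_borrowed(&child.data_type, seen);
    }
}

/// Builds List<Int64> and reports, for (owned, borrowed), whether the
/// innermost type seen is the very same object stored in the schema.
fn innermost_is_stable() -> (bool, bool) {
    let schema_ty = ToyType::List(Arc::new(ToyField {
        data_type: ToyType::Int64,
    }));
    // Address of the Int64 node as stored inside the schema.
    let inner = match &schema_ty {
        ToyType::List(child) => &child.data_type as *const ToyType,
        ToyType::Int64 => unreachable!(),
    };

    let mut owned = Vec::new();
    visit_owned(schema_ty.clone(), &mut owned);
    let mut borrowed = Vec::new();
    visit_borrowed(&schema_ty, &mut borrowed);

    // Raw pointers are only compared here, never dereferenced.
    (owned[1] == inner, borrowed[1] == inner)
}
```

In this toy model the owned traversal sees a clone of the inner type at a different address, while the borrowed traversal sees the schema's own node.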
Result: Every call to make_decoder receives a reference to the actual original data type from the builder's input schema. The final decoder Self is unchanged -- it already received a clone and continues to do so.
NOTE: There is one additional clone of the top-level DataType::Struct we create for normal (!is_field) builders. But that's a cheap arc clone of a Fields member.
Are these changes tested?
Yes, existing unit tests validate the change.
Are there any user-facing changes?
No. All functions and data types involved are private -- the array decoders are marked pub but are defined in a private mod with no public re-export that would make them available outside the crate.