Skip to content

Conversation

@scovich
Copy link
Contributor

@scovich scovich commented Jan 26, 2026

Which issue does this PR close?

Rationale for this change

Today's json decoder helper, make_decoder, takes an owned data type whose components are cloned at every level during the recursive decoder initialization process. This breaks pointer stability of the resulting DataType instances that a custom JSON decoder factory would see, vs. those of the schema it and the reader builder were initialized with.

The lack of pointer stability prevents users from creating "path based" decoder factories, that are able to customize decoder behavior based not only on type, but also on the field's path in the schema. See the PathBasedDecoderFactory in arrow-json/tests/custom_decoder_tests.rs of #9259, for an example.

What changes are included in this PR?

By passing &DataType instead, we change code like this:

let decoder = make_decoder(field.data_type().clone(), ...);
Ok(Self {
    data_type,
    decoder,
      ...
})

to this:

let child_decoder = make_decoder(field.data_type(), ...);
Ok(Self {
    data_type: data_type.clone(),
    decoder,
      ...
})

Result: Every call to make_decoder receives a reference to the actual original data type from the builder's input schema. The final decoder Self is unchanged -- it already received a clone and continues to do so.

NOTE: There is one additional clone of the top-level DataType::Struct we create for normal (!is_field) builders. But that's a cheap arc clone of a Fields member.

Are these changes tested?

Yes, existing unit tests validate the change.

Are there any user-facing changes?

No. All functions and data types involved are private -- the array decoders are marked pub but are defined in a private mod with no public re-export that would make them available outside the crate.

@github-actions github-actions bot added the arrow Changes to the arrow crate label Jan 26, 2026
}
let (data_type, nullable) = if self.is_field {
let field = &self.schema.fields[0];
(Cow::Borrowed(field.data_type()), field.is_nullable())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOTE: This Cow::Borrowed is necessary to preserve pointer stability of field.data_type() that would otherwise have to be cloned.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @scovich

@alamb
Copy link
Contributor

alamb commented Jan 26, 2026

(this PR needs to have some conflicts resolved)

@scovich
Copy link
Contributor Author

scovich commented Jan 26, 2026

@alamb should be good now

Comment on lines -90 to -92
// If this struct nullable, need to permit nullability in child array
// StructArrayDecoder::decode verifies that if the child is not nullable
// it doesn't contain any nulls not masked by its parent
Copy link
Contributor Author

@scovich scovich Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#9271 merged too quickly... this comment was supposed to remain inside the .map call. So I'm restoring this code back to follow the original upstream approach, instead of opening a separate PR just for that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you

@scovich
Copy link
Contributor Author

scovich commented Jan 27, 2026

@alamb -- Is it normal for MIRI to take O(hours)? It's been stuck here for a very long time:

  ...
test rank::tests::test_bytes ... ok
test rank::tests::test_get_boolean_rank_index ... ok
test rank::tests::test_nullable_booleans ... ok
test rank::tests::test_primitive ... ok

@alamb
Copy link
Contributor

alamb commented Jan 27, 2026

@alamb -- Is it normal for MIRI to take O(hours)? It's been stuck here for a very long time:

It does take a while. I haven't tracked its overall time recently -- maybe some new tests are overly long

@alamb alamb merged commit fe126d0 into apache:main Jan 27, 2026
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arrow Changes to the arrow crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants