Skip to content

Conversation

@Jefffrey
Copy link
Contributor

@Jefffrey Jefffrey commented Jan 20, 2026

Which issue does this PR close?

Rationale for this change

Support RunEndEncoded scalar values, similar to how we support for Dictionary.

What changes are included in this PR?

  • Add new ScalarValue::RunEndEncoded enum variant
  • Fix ScalarValue::new_default to support Decimal32 and Decimal64
  • Support RunEndEncoded type in proto for both ScalarValue message and ArrowType message

Are these changes tested?

Added tests.

Are there any user-facing changes?

New variant for ScalarValue

Protobuf changes to support RunEndEncoded type

@github-actions github-actions bot added sql SQL Planner common Related to common crate proto Related to proto crate labels Jan 20, 2026
@Jefffrey Jefffrey changed the title Ree scalarvalue Add ScalarValue::RunEndEncoded variant Jan 20, 2026
Comment on lines +432 to +433
/// (run-ends field, value field, value)
RunEndEncoded(FieldRef, FieldRef, Box<ScalarValue>),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mimicking the arrow type where it stores fields:

I tried initially only storing the index DataType and the ScalarValue value, but figured it would be better to try be as accurate as possible 🤔

Comment on lines +1604 to +1605
| DataType::Decimal32(_, _)
| DataType::Decimal64(_, _)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Little fix since we were missing these


// Unsupported types for now
_ => {
DataType::ListView(_) | DataType::LargeListView(_) => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just getting rid of the catch-all to be more rigorous

_ => unreachable!("Invalid dictionary keys type: {}", key_type),
}
}
DataType::RunEndEncoded(run_ends_field, value_field) => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're building the runarray efficiently here, as unlike dictionary above which would require keeping a hashmap of values to build an efficient dictionary array, run arrays are simpler in that we just need to track when a new run starts.

Most of the verbosity here is related to destructuring input ScalarValues and ensuring we have consistent types from them.

let run_ends = PrimitiveArray::<R>::from_iter_values(run_ends);
let values = ScalarValue::iter_to_array(value_scalars)?;

// Using ArrayDataBuilder so we can maintain the fields
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the only way to construct runarrays with fields we want, since try_new creates the fields for us:

https://github.com/apache/arrow-rs/blob/ddb6c42194fa45516e1bd4a27cdacf10fda56b5a/arrow-array/src/array/run_array.rs#L99-L105

);
let err = scalar.eq_array(&run_array, 1).unwrap_err();
let expected = "Internal error: could not cast array of type Float32 to arrow_array::array::primitive_array::PrimitiveArray<arrow_array::types::Float64Type>";
assert!(err.to_string().starts_with(expected));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needed to use starts_with since backtrace feature can affect the error message, so direct equality can succeed for cargo test but fail in CI

return Err(Error::General(
"Proto serialization error: The RunEndEncoded data type is not yet supported".to_owned()
))
DataType::Decimal32(precision, scale) => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only change here is for RunEndEncoded; for some reason other formatting changes were applied for the other arms here

@Jefffrey Jefffrey marked this pull request as ready for review January 24, 2026 08:38
@alamb alamb added the api change Changes the API exposed to users of the crate label Jan 27, 2026
@alamb
Copy link
Contributor

alamb commented Jan 27, 2026

FYI @brancz

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I won't say I reviewed every line of this PR carefully but I did read them all and they look structurally good to me -- thank you for pushing this along @Jefffrey

}
(Dictionary(_, _), _) => None,
(RunEndEncoded(rf1, vf1, v1), RunEndEncoded(rf2, vf2, v2)) => {
// Don't compare if the run ends fields don't match (it is effectively a different datatype)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is exactly what we want. The run arrays could be logically identical, but their index types might differ. I don't think we'd want the scalar not to equal in that case. I realize that's not what we have for dictionaries either, but is that really the intention of scalars? My understanding has always been that the integer width of codes should be irrelevant from a logical equality perspective.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if we want that logic as this level; for example if we fix PartialOrd here to compare REE/Dicts based on inner values only, then we'd probably have to do the same for PartialEq right? But then we run into an issue with Hash not being consistent unless we also fix Hash 🤔

I think it might be better to leave these as is, and if we want proper comparison it would make more sense to do at a high level (e.g. via type coercion)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api change Changes the API exposed to users of the crate common Related to common crate proto Related to proto crate sql SQL Planner

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Missing ScalarValue variant for RunEndEncoded

3 participants