Add `ScalarValue::RunEndEncoded` variant #19895

Jefffrey · 2026-01-20T08:28:40Z

Which issue does this PR close?

Closes Missing ScalarValue variant for RunEndEncoded #18563

Rationale for this change

Support RunEndEncoded scalar values, similar to how we support Dictionary.

What changes are included in this PR?

Add new ScalarValue::RunEndEncoded enum variant
Fix ScalarValue::new_default to support Decimal32 and Decimal64
Support RunEndEncoded type in proto for both ScalarValue message and ArrowType message

Are these changes tested?

Added tests.

Are there any user-facing changes?

New variant for ScalarValue

Protobuf changes to support RunEndEncoded type

Jefffrey · 2026-01-24T08:08:37Z

datafusion/common/src/scalar/mod.rs

+    /// (run-ends field, value field, value)
+    RunEndEncoded(FieldRef, FieldRef, Box<ScalarValue>),


Mimicking the arrow type where it stores fields:

https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html#variant.RunEndEncoded

I initially tried storing only the index DataType and the ScalarValue value, but figured it would be better to try to be as accurate as possible 🤔
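To illustrate the trade-off, here is a minimal std-only sketch. `SimpleType`, `Field`, `SimpleScalar`, and `ree_fields` are hypothetical stand-ins, not the arrow or DataFusion types. It shows that keeping both fields preserves the names and nullability that would be lost if only the run-ends type were stored.

```rust
use std::sync::Arc;

// Hypothetical, simplified stand-ins for arrow's DataType / Field.
#[derive(Debug, Clone, PartialEq)]
enum SimpleType {
    Int16,
    Utf8,
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: SimpleType,
    nullable: bool,
}

type FieldRef = Arc<Field>;

// Simplified scalar enum mirroring the shape of the new variant.
#[derive(Debug, Clone, PartialEq)]
enum SimpleScalar {
    Utf8(Option<String>),
    /// (run-ends field, value field, value)
    RunEndEncoded(FieldRef, FieldRef, Box<SimpleScalar>),
}

/// Recover both fields from a run-end-encoded scalar, if it is one.
fn ree_fields(scalar: &SimpleScalar) -> Option<(FieldRef, FieldRef)> {
    match scalar {
        SimpleScalar::RunEndEncoded(run_ends, values, _) => {
            Some((Arc::clone(run_ends), Arc::clone(values)))
        }
        SimpleScalar::Utf8(_) => None,
    }
}
```

Because the fields travel with the scalar, the full type (including field names and nullability) can be rebuilt from the scalar alone in this sketch.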

Jefffrey · 2026-01-24T08:09:05Z

datafusion/common/src/scalar/mod.rs

+            | DataType::Decimal32(_, _)
+            | DataType::Decimal64(_, _)


Little fix since we were missing these

Jefffrey · 2026-01-24T08:09:24Z

datafusion/common/src/scalar/mod.rs


-            // Unsupported types for now
-            _ => {
+            DataType::ListView(_) | DataType::LargeListView(_) => {


Just getting rid of the catch-all to be more rigorous
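As a rough illustration of why dropping the catch-all helps, here is a small sketch with a hypothetical `SimpleType` enum and `default_value` function (not the real `DataType` or `ScalarValue::new_default`). With no `_` arm, adding a new enum variant becomes a compile error until the match handles it explicitly.

```rust
// Hypothetical, simplified type enum for illustration only.
#[derive(Debug, Clone, Copy)]
enum SimpleType {
    Int32,
    Utf8,
    ListView,
    LargeListView,
}

/// Return a textual default value, or an error for unsupported types.
fn default_value(t: SimpleType) -> Result<String, String> {
    match t {
        SimpleType::Int32 => Ok("0".to_string()),
        SimpleType::Utf8 => Ok(String::new()),
        // Unsupported types are named explicitly instead of hiding behind `_`,
        // so a newly added variant cannot silently fall into this arm.
        SimpleType::ListView | SimpleType::LargeListView => {
            Err(format!("default value for {t:?} is not supported"))
        }
    }
}
```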

Jefffrey · 2026-01-24T08:11:35Z

datafusion/common/src/scalar/mod.rs

                    _ => unreachable!("Invalid dictionary keys type: {}", key_type),
                }
            }
+            DataType::RunEndEncoded(run_ends_field, value_field) => {


We're building the run array efficiently here. Unlike the dictionary case above, which would require keeping a hashmap of values to build an efficient dictionary array, run arrays are simpler: we just need to track when a new run starts.

Most of the verbosity here is related to destructuring input ScalarValues and ensuring we have consistent types from them.
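The core idea of tracking when a new run starts can be sketched with plain slices. This is a simplified illustration, not the code in this PR: `run_end_encode` is a hypothetical helper, the run ends are hard-coded to `i32`, and the real implementation works on ScalarValues and Arrow arrays.

```rust
/// Run-end encode a sequence of values.
/// Returns `(run_ends, values)`, where `run_ends[i]` is the exclusive end
/// index of run `i` in the logical (decoded) sequence.
fn run_end_encode<T: PartialEq + Clone>(input: &[T]) -> (Vec<i32>, Vec<T>) {
    let mut run_ends: Vec<i32> = Vec::new();
    let mut values: Vec<T> = Vec::new();

    for (i, item) in input.iter().enumerate() {
        // A new run starts whenever the item differs from the current run's value.
        if values.last() != Some(item) {
            if i > 0 {
                // Close the previous run at the current index.
                run_ends.push(i as i32);
            }
            values.push(item.clone());
        }
    }
    // Close the final run, if there was any input at all.
    if !input.is_empty() {
        run_ends.push(input.len() as i32);
    }
    (run_ends, values)
}
```

No lookup structure is needed: only the last emitted value is compared against, which is what makes this cheaper than deduplicating values for a dictionary.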

Jefffrey · 2026-01-24T08:12:53Z

datafusion/common/src/scalar/mod.rs

+                    let run_ends = PrimitiveArray::<R>::from_iter_values(run_ends);
+                    let values = ScalarValue::iter_to_array(value_scalars)?;
+
+                    // Using ArrayDataBuilder so we can maintain the fields


I think this is the only way to construct run arrays with the fields we want, since try_new creates the fields for us:

https://github.com/apache/arrow-rs/blob/ddb6c42194fa45516e1bd4a27cdacf10fda56b5a/arrow-array/src/array/run_array.rs#L99-L105

Jefffrey · 2026-01-24T08:16:01Z

datafusion/common/src/scalar/mod.rs

+        );
+        let err = scalar.eq_array(&run_array, 1).unwrap_err();
+        let expected = "Internal error: could not cast array of type Float32 to arrow_array::array::primitive_array::PrimitiveArray<arrow_array::types::Float64Type>";
+        assert!(err.to_string().starts_with(expected));


Needed to use starts_with since the backtrace feature can affect the error message, so direct equality can succeed for cargo test but fail in CI
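A small sketch of the failure mode, using a hypothetical `InternalError` type rather than DataFusion's real error type: when a backtrace is appended to the rendered message, an exact comparison breaks, while a prefix check holds either way.

```rust
use std::fmt;

/// Hypothetical error whose rendered message may carry a backtrace suffix.
#[derive(Debug)]
struct InternalError {
    message: String,
    backtrace: Option<String>,
}

impl fmt::Display for InternalError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "Internal error: {}", self.message)?;
        // Only present when backtraces are enabled (an assumption of this sketch).
        if let Some(bt) = &self.backtrace {
            write!(f, "\n\nbacktrace: {bt}")?;
        }
        Ok(())
    }
}

/// Prefix check that is stable whether or not a backtrace was appended.
fn matches_expected(err: &InternalError, expected_prefix: &str) -> bool {
    err.to_string().starts_with(expected_prefix)
}
```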

Jefffrey · 2026-01-24T08:17:06Z

datafusion/proto-common/src/to_proto/mod.rs

-                return Err(Error::General(
-                    "Proto serialization error: The RunEndEncoded data type is not yet supported".to_owned()
-                ))
+            DataType::Decimal32(precision, scale) => {


Only change here is for RunEndEncoded; for some reason other formatting changes were applied to the other arms here

alamb · 2026-01-27T22:45:38Z

FYI @brancz

alamb

I won't say I reviewed every line of this PR carefully but I did read them all and they look structurally good to me -- thank you for pushing this along @Jefffrey

brancz · 2026-01-28T07:33:39Z

datafusion/common/src/scalar/mod.rs

            }
            (Dictionary(_, _), _) => None,
+            (RunEndEncoded(rf1, vf1, v1), RunEndEncoded(rf2, vf2, v2)) => {
+                // Don't compare if the run ends fields don't match (it is effectively a different datatype)


I'm not sure this is exactly what we want. The run arrays could be logically identical, but their index types might differ. I don't think we'd want the scalars to compare as unequal in that case. I realize that's not what we have for dictionaries either, but is that really the intention of scalars? My understanding has always been that the integer width of codes should be irrelevant from a logical equality perspective.

Jefffrey · 2026-01-28

I'm not sure if we want that logic at this level; for example, if we fix PartialOrd here to compare REE/Dicts based on inner values only, then we'd probably have to do the same for PartialEq, right? But then we run into an issue with Hash not being consistent unless we also fix Hash 🤔

I think it might be better to leave these as is, and if we want proper comparison it would make more sense to do it at a higher level (e.g. via type coercion)
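To make the Hash concern concrete, here is a hedged sketch with a hypothetical `ReeScalar` type (not ScalarValue). If equality ignores the run-ends type, then Hash has to ignore it too, otherwise two equal values could hash differently and break hash-based containers.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum RunEndType {
    Int16,
    Int32,
}

/// Hypothetical, simplified run-end-encoded scalar.
#[allow(dead_code)]
#[derive(Debug, Clone)]
struct ReeScalar {
    run_end_type: RunEndType,
    value: i64,
}

// "Logical" equality: the integer width of the run ends is ignored.
impl PartialEq for ReeScalar {
    fn eq(&self, other: &Self) -> bool {
        self.value == other.value
    }
}
impl Eq for ReeScalar {}

// Hash must agree with Eq: equal scalars must produce equal hashes,
// so `run_end_type` has to be left out here as well.
impl Hash for ReeScalar {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.value.hash(state);
    }
}

/// Helper to compute a hash with the standard library's default hasher.
fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}
```

In this sketch all three impls change together, which is the coupling described above: changing only PartialOrd or PartialEq would leave the traits inconsistent with each other.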

Jefffrey added 2 commits January 20, 2026 10:15

Add ScalarValue::RunEndEncoded variant

8c7740c

Store fields in ScalarValue::RunEndEncoded instead of just run-ends…

33f08f0

… type

github-actions bot added the sql, common, and proto labels Jan 20, 2026

Jefffrey changed the title ~~Ree scalarvalue~~ Add ScalarValue::RunEndEncoded variant Jan 20, 2026

Jefffrey added 3 commits January 24, 2026 16:20

Merge branch 'main' into ree-scalarvalue

a13b948

fix error asserts to account for backtrace

af2b46e

fix

f69ee86

Jefffrey commented Jan 24, 2026

fix proto field name & comment

1c0b400

Jefffrey marked this pull request as ready for review January 24, 2026 08:38

Merge branch 'main' into ree-scalarvalue

1293f58

alamb added the api change label Jan 27, 2026

alamb approved these changes Jan 27, 2026

brancz reviewed Jan 28, 2026
