33 commits
- c2c8ba0 feat: add experimental native columnar to row conversion (andygrove, Jan 19, 2026)
- 49a5b20 cargo fmt (andygrove, Jan 19, 2026)
- e558073 cargo clippy (andygrove, Jan 19, 2026)
- a44066f docs (andygrove, Jan 19, 2026)
- fd58cba update benchmark [skip ci] (andygrove, Jan 19, 2026)
- bac9164 fix: use correct element sizes in native columnar to row for array/map (andygrove, Jan 19, 2026)
- 3ca5553 test: add fuzz test with nested types to native C2R suite (andygrove, Jan 19, 2026)
- 7f2e64d test: add deeply nested type tests to native C2R suite (andygrove, Jan 19, 2026)
- 7afc4ba test: add fuzz test with generateNestedSchema for native C2R (andygrove, Jan 20, 2026)
- adc13a6 format (andygrove, Jan 20, 2026)
- 56df742 fix: handle LargeList and improve error handling in native C2R (andygrove, Jan 20, 2026)
- 461c625 fix (andygrove, Jan 20, 2026)
- 8b8741c fix: add Dictionary-encoded array support to native C2R (andygrove, Jan 20, 2026)
- b8ed2e7 format (andygrove, Jan 20, 2026)
- 330dbb2 clippy [skip ci] (andygrove, Jan 20, 2026)
- 8231a75 test: add benchmark comparing JVM and native columnar to row conversion (andygrove, Jan 20, 2026)
- f2cc61c perf: optimize native C2R by eliminating Vec allocations for strings (andygrove, Jan 20, 2026)
- 3ebcaca perf: add fixed-width fast path for native C2R (andygrove, Jan 20, 2026)
- ed72c29 test: add fixed-width-only benchmark and refactor C2R benchmark (andygrove, Jan 20, 2026)
- 17d83d5 perf: optimize complex types in native C2R by eliminating intermediat… (andygrove, Jan 20, 2026)
- 5f26a81 perf: add bulk copy optimization for primitive arrays in native C2R (andygrove, Jan 20, 2026)
- e5b2c61 perf: add pre-downcast optimization for native C2R general path (andygrove, Jan 20, 2026)
- 7743138 fix: correct array element bulk copy for Date32, Timestamp, Boolean (andygrove, Jan 20, 2026)
- 9c66ef6 perf: Velox-style optimization for array/map C2R (40-52% faster) (andygrove, Jan 20, 2026)
- 64c5212 perf: inline type dispatch for struct fields in native C2R (andygrove, Jan 20, 2026)
- 04c49fb perf: pre-downcast struct fields for native C2R (andygrove, Jan 20, 2026)
- 47d4c50 perf: optimize general path for mixed fixed/variable-length columns (andygrove, Jan 20, 2026)
- 081b3ed revert (andygrove, Jan 20, 2026)
- f696595 upmerge (andygrove, Jan 20, 2026)
- 92e1abb revert doc format change (andygrove, Jan 20, 2026)
- e735434 fix: address clippy warnings and remove dead code in native C2R (andygrove, Jan 20, 2026)
- ab074bd Remove #[inline] hint from bulk_copy_range (andygrove, Jan 20, 2026)
- 377214a fix (andygrove, Jan 20, 2026)
1 change: 1 addition & 0 deletions .github/workflows/pr_build_linux.yml
@@ -116,6 +116,7 @@ jobs:
value: |
org.apache.comet.exec.CometShuffleSuite
org.apache.comet.exec.CometShuffle4_0Suite
org.apache.comet.exec.CometNativeColumnarToRowSuite
org.apache.comet.exec.CometNativeShuffleSuite
org.apache.comet.exec.CometShuffleEncryptionSuite
org.apache.comet.exec.CometShuffleManagerSuite
1 change: 1 addition & 0 deletions .github/workflows/pr_build_macos.yml
@@ -79,6 +79,7 @@ jobs:
value: |
org.apache.comet.exec.CometShuffleSuite
org.apache.comet.exec.CometShuffle4_0Suite
org.apache.comet.exec.CometNativeColumnarToRowSuite
org.apache.comet.exec.CometNativeShuffleSuite
org.apache.comet.exec.CometShuffleEncryptionSuite
org.apache.comet.exec.CometShuffleManagerSuite
1 change: 1 addition & 0 deletions .gitignore
@@ -1,3 +1,4 @@
CLAUDE.md
target
.idea
*.iml
11 changes: 11 additions & 0 deletions common/src/main/scala/org/apache/comet/CometConf.scala
@@ -286,6 +286,17 @@ object CometConf extends ShimCometConf {
val COMET_EXEC_LOCAL_TABLE_SCAN_ENABLED: ConfigEntry[Boolean] =
createExecEnabledConfig("localTableScan", defaultValue = false)

val COMET_NATIVE_COLUMNAR_TO_ROW_ENABLED: ConfigEntry[Boolean] =
conf(s"$COMET_EXEC_CONFIG_PREFIX.columnarToRow.native.enabled")
.category(CATEGORY_EXEC)
.doc(
"Whether to enable native columnar to row conversion. When enabled, Comet will use " +
"native Rust code to convert Arrow columnar data to Spark UnsafeRow format instead " +
"of the JVM implementation. This can improve performance for queries that need to " +
"convert between columnar and row formats. This is an experimental feature.")
.booleanConf
.createWithDefault(false)

val COMET_EXEC_SORT_MERGE_JOIN_WITH_JOIN_FILTER_ENABLED: ConfigEntry[Boolean] =
conf("spark.comet.exec.sortMergeJoinWithJoinFilter.enabled")
.category(CATEGORY_ENABLE_EXEC)
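The new flag sits alongside Comet's existing enablement switches. A minimal `spark-defaults.conf` fragment for trying the experimental path might look like the following; note that the first three settings are Comet's usual prerequisites and are stated here as assumptions, not taken from this diff:

```
spark.plugins                                    org.apache.spark.CometPlugin
spark.comet.enabled                              true
spark.comet.exec.enabled                         true
spark.comet.exec.columnarToRow.native.enabled    true
```

Because the feature defaults to `false`, existing deployments keep the JVM columnar-to-row path unless they opt in.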
20 changes: 20 additions & 0 deletions common/src/main/scala/org/apache/comet/vector/NativeUtil.scala
@@ -78,6 +78,26 @@ class NativeUtil {
(arrays, schemas)
}

/**
* Exports a ColumnarBatch to Arrow FFI and returns the memory addresses.
*
* This is a convenience method that allocates Arrow structs, exports the batch, and returns
* just the memory addresses (without exposing the Arrow types).
*
* @param batch
* the columnar batch to export
* @return
* a tuple of (array addresses, schema addresses, number of rows)
*/
def exportBatchToAddresses(batch: ColumnarBatch): (Array[Long], Array[Long], Int) = {
val numCols = batch.numCols()
val (arrays, schemas) = allocateArrowStructs(numCols)
val arrayAddrs = arrays.map(_.memoryAddress())
val schemaAddrs = schemas.map(_.memoryAddress())
val numRows = exportBatch(arrayAddrs, schemaAddrs, batch)
(arrayAddrs, schemaAddrs, numRows)
}

/**
* Exports a Comet `ColumnarBatch` into a list of memory addresses that can be consumed by the
* native execution.
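The addresses exported above let native code read the Arrow arrays and write out Spark's `UnsafeRow` format. An `UnsafeRow`'s fixed-width region is a null-tracking bitset rounded up to whole 8-byte words, followed by one 8-byte slot per field; variable-length values are appended after that region with their offset and length packed into the slot. As background (this is the general `UnsafeRow` sizing rule, not Comet's actual code), the fixed-width region size can be computed like this:

```python
def unsafe_row_fixed_size(num_fields: int) -> int:
    """Bytes in the fixed-width region of a Spark UnsafeRow.

    Layout: a null bitset rounded up to whole 8-byte words,
    then one 8-byte slot per field.
    """
    bitset_bytes = ((num_fields + 63) // 64) * 8
    return bitset_bytes + 8 * num_fields

# 3 fields: one 8-byte bitset word + 3 * 8-byte slots = 32 bytes
print(unsafe_row_fixed_size(3))
```

This fixed, predictable layout is what makes the "fixed-width fast path" commits above possible: when every column is fixed-width, each output row has the same size and field slots can be filled by bulk copies.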
1 change: 1 addition & 0 deletions docs/source/user-guide/latest/configs.md
@@ -66,6 +66,7 @@ Comet provides the following configuration settings.
| `spark.comet.dppFallback.enabled` | Whether to fall back to Spark for queries that use DPP. | true |
| `spark.comet.enabled` | Whether to enable Comet extension for Spark. When this is turned on, Spark will use Comet to read Parquet data source. Note that to enable native vectorized execution, both this config and `spark.comet.exec.enabled` need to be enabled. It can be overridden by the environment variable `ENABLE_COMET`. | true |
| `spark.comet.exceptionOnDatetimeRebase` | Whether to throw exception when seeing dates/timestamps from the legacy hybrid (Julian + Gregorian) calendar. Since Spark 3, dates/timestamps were written according to the Proleptic Gregorian calendar. When this is true, Comet will throw exceptions when seeing these dates/timestamps that were written by Spark version before 3.0. If this is false, these dates/timestamps will be read as if they were written to the Proleptic Gregorian calendar and will not be rebased. | false |
| `spark.comet.exec.columnarToRow.native.enabled` | Whether to enable native columnar to row conversion. When enabled, Comet will use native Rust code to convert Arrow columnar data to Spark UnsafeRow format instead of the JVM implementation. This can improve performance for queries that need to convert between columnar and row formats. This is an experimental feature. | false |
| `spark.comet.exec.enabled` | Whether to enable Comet native vectorized execution for Spark. This controls whether Spark should convert operators into their Comet counterparts and execute them in native space. Note: each operator is associated with a separate config in the format of `spark.comet.exec.<operator_name>.enabled` at the moment, and both the config and this need to be turned on, in order for the operator to be executed in native. | true |
| `spark.comet.exec.replaceSortMergeJoin` | Experimental feature to force Spark to replace SortMergeJoin with ShuffledHashJoin for improved performance. This feature is not stable yet. For more information, refer to the [Comet Tuning Guide](https://datafusion.apache.org/comet/user-guide/tuning.html). | false |
| `spark.comet.exec.strictFloatingPoint` | When enabled, fall back to Spark for floating-point operations that may differ from Spark, such as when comparing or sorting -0.0 and 0.0. For more information, refer to the [Comet Compatibility Guide](https://datafusion.apache.org/comet/user-guide/compatibility.html). | false |