Skip to content

Conversation

@poshul
Copy link
Contributor

@poshul poshul commented Dec 4, 2025

Summary by CodeRabbit

  • Chores
    • Improved relocatability for Arrow and Parquet packages by normalizing build paths and replacing hardcoded build-path occurrences with a relocatable form.
    • Applied consistent handling across Windows and Unix-like platforms.
    • Enhanced diagnostic output with per-file status messages and overall completion reporting.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 4, 2025

Walkthrough

Extends relocatability fixes in libraries.cmake/arrow.cmake to normalize build paths and replace hardcoded absolute _IMPORT_PREFIX occurrences in generated ArrowTargets*.cmake and ParquetTargets*.cmake files, with platform-specific handling and verbose logging.

Changes

Cohort / File(s) Summary
Arrow CMake relocatability logic
libraries.cmake/arrow.cmake
Adds normalization of project/build paths and computes ARROW_RELOCATABLE_REPLACEMENT and ARROW_ORIGINAL_PATTERN. Applies pattern-based replacements to ArrowTargets*.cmake and ParquetTargets*.cmake (IMPORTED_LOCATION and related properties), handling Windows/MSVC and Linux/macOS branches separately, with verbose status/warning messages per file.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Review path normalization (TO_CMAKE_PATH) and cross-platform slash/backslash handling.
  • Verify correctness and safety of regex/pattern matching and replacement across ArrowTargets*/ParquetTargets* variants.
  • Check file I/O and edge-case handling when pattern not found (warning + 800-char preview).
  • Confirm logging/verbosity behavior and that no unintended properties are altered.

Possibly related PRs

  • More arrow bullshit #165 — Similar changes to libraries.cmake/arrow.cmake implementing relocatable replacements for ArrowTargets and ParquetTargets files; likely a closely related implementation or prior iteration.

Poem

🐇 I hopped through CMake paths today,
Replaced the hardcode gone astray,
Normalized each twist and slash,
So Arrow’s files can freely dash —
Hooray for builds that find their way! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Title check ❓ Inconclusive The title 'try again' is vague and generic, providing no meaningful information about the changeset's actual modifications to CMake files or the relocatability fix. Replace with a descriptive title that summarizes the main change, such as 'Fix Arrow and Parquet CMake target relocatability' or 'Extend CMake relocatability fix to multiple target files'.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch more_arrow_bullshit

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
libraries.cmake/arrow.cmake (4)

92-107: Relocatability approach looks good but relies on an exact _IMPORT_PREFIX line match

The strategy of replacing Arrow’s hardcoded _IMPORT_PREFIX with:

get_filename_component(_IMPORT_PREFIX "${CMAKE_CURRENT_LIST_DIR}/../../.." ABSOLUTE)

is solid and should make Arrow/Parquet configs relocatable on both Windows and Linux/macOS.

However, the ARROW_ORIGINAL_PATTERN you search/replace is the full set(_IMPORT_PREFIX "...") line:

  • Windows: set(_IMPORT_PREFIX "${PROJECT_BINARY_DIR_NORMALIZED}")
  • Linux/macOS: set(_IMPORT_PREFIX "${PROJECT_BINARY_DIR}")

This makes the fix quite sensitive to small upstream formatting changes in Arrow’s generated *Targets.cmake (extra spaces, different quoting, added options, trailing slash, etc.). Any such change will cause string(FIND) to fail and the replacement to silently do nothing (beyond the warning and preview).

If you want this to be more robust across Arrow versions, consider matching only the path portion (or using a string(REGEX REPLACE ...) that looks for set(_IMPORT_PREFIX ".*") and replaces the whole command), while still producing the same get_filename_component call.

Also applies to: 221-234


99-104: Consider normalizing PROJECT_BINARY_DIR consistently across platforms

On Windows you normalize:

file(TO_CMAKE_PATH "${PROJECT_BINARY_DIR}" PROJECT_BINARY_DIR_NORMALIZED)
set(ARROW_ORIGINAL_PATTERN "set(_IMPORT_PREFIX \"${PROJECT_BINARY_DIR_NORMALIZED}\")")

but on Linux/macOS you use the raw PROJECT_BINARY_DIR in ARROW_ORIGINAL_PATTERN.

Since CMake usually renders paths with forward slashes on all platforms, this asymmetry is probably harmless, but for long‑term maintainability it might be cleaner to use the same normalization step (and pattern construction) on all platforms so the matching behavior is identical.

Also applies to: 228-230


110-127: Verify that the narrower globbing still catches all relevant Arrow/Parquet targets

You now only patch:

file(GLOB ARROW_TARGET_FILES   "${ARROW_CMAKE_DIR}/ArrowTargets.cmake")
file(GLOB PARQUET_TARGET_FILES "${PARQUET_CMAKE_DIR}/ParquetTargets.cmake")

instead of ArrowTargets*.cmake / ParquetTargets*.cmake (per the summary). That’s nice for predictability, but it assumes that the only _IMPORT_PREFIX assignments that matter live in those exact files and not in configuration-specific ones (e.g. ArrowTargets-debug.cmake).

Given Arrow’s current layout that’s likely true, but it would be good to double‑check the generated CMake package to ensure:

  • ArrowTargets.cmake / ParquetTargets.cmake are the authoritative places _IMPORT_PREFIX is set.
  • There aren’t additional set(_IMPORT_PREFIX ...) occurrences in other *Targets*.cmake files that also need the same treatment.

If those assumptions hold, this change is fine and helps avoid over‑touching unrelated files.

Also applies to: 130-141, 236-253, 256-267


115-123: CMake version dependency for string(SUBSTRING ...) with fixed length 800

The debug preview:

string(SUBSTRING "${TARGET_CONTENT}" 0 800 TARGET_PREVIEW)

is helpful. On CMake ≥3.2 this is safe even when the file is shorter than 800 characters (the substring is just truncated). On CMake 3.1 and below, though, an over‑long length is an error.

If the project’s minimum required CMake version is already ≥3.2, this is fine as‑is. If you still support very old CMake versions, you may want to guard this or compute min(800, actual_length) first.

Also applies to: 239-249

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b41c2c5 and 6035da8.

📒 Files selected for processing (1)
  • libraries.cmake/arrow.cmake (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: build (ubuntu-latest)
  • GitHub Check: build (ubuntu-24.04-arm)
  • GitHub Check: build (windows-latest)
  • GitHub Check: build (windows-latest)
  • GitHub Check: build (ubuntu-24.04-arm)
  • GitHub Check: build (ubuntu-latest)
🔇 Additional comments (1)
libraries.cmake/arrow.cmake (1)

92-144: Nice diagnostic logging and symmetry between Windows and Linux/macOS

The added status/warning messages and the consistent application of the relocatability fix across both MSVC and non‑MSVC branches make this much easier to debug if Arrow’s generated configs ever change. The final:

message(STATUS "Fixed Arrow/Parquet CMake configs for relocatability")

is also helpful as a clear indicator that the patch step ran.

No functional issues here from my side.

Also applies to: 221-270

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
libraries.cmake/arrow.cmake (3)

109-122: Consider adding validation for GLOB results.

If the Arrow build or installation fails partially, file(GLOB ...) may return an empty list, and the loop will silently do nothing. Consider adding a check to warn or fail if no matching files are found.

 file(GLOB ARROW_TARGET_FILES "${ARROW_CMAKE_DIR}/ArrowTargets*.cmake")
+list(LENGTH ARROW_TARGET_FILES ARROW_TARGET_FILES_COUNT)
+if(ARROW_TARGET_FILES_COUNT EQUAL 0)
+  message(WARNING "No ArrowTargets*.cmake files found in ${ARROW_CMAKE_DIR}")
+endif()
 foreach(TARGET_FILE ${ARROW_TARGET_FILES})

218-220: Inconsistency: Missing path normalization on Linux/macOS.

The Windows block uses file(TO_CMAKE_PATH ...) at line 99 for path normalization, but this block uses ${PROJECT_BINARY_DIR} directly. While Linux/macOS paths typically already use forward slashes, adding normalization would ensure consistency and robustness.

+# Normalize path separators for replacement (for consistency)
+file(TO_CMAKE_PATH "${PROJECT_BINARY_DIR}" PROJECT_BINARY_DIR_NORMALIZED)
+
 # Fix 1: Replace hardcoded _IMPORT_PREFIX in ArrowTargets.cmake
 set(ARROW_RELOCATABLE_REPLACEMENT "get_filename_component(_IMPORT_PREFIX \"\${CMAKE_CURRENT_LIST_DIR}/../../..\" ABSOLUTE)")
-set(ARROW_ORIGINAL_PATTERN "set(_IMPORT_PREFIX \"${PROJECT_BINARY_DIR}\")")
+set(ARROW_ORIGINAL_PATTERN "set(_IMPORT_PREFIX \"${PROJECT_BINARY_DIR_NORMALIZED}\")")

Then update line 235 to use PROJECT_BINARY_DIR_NORMALIZED as well.


92-134: Consider extracting relocatability fix into a helper macro.

The Windows and Linux/macOS blocks contain nearly identical relocatability fix logic (pattern definition, GLOB, foreach, string replacement). This duplication could be reduced by extracting a helper macro, improving maintainability.

Example approach:

# Define helper macro before the platform-specific blocks
macro(FIX_CMAKE_RELOCATABILITY CMAKE_DIR FILE_PATTERN BUILD_DIR)
  file(TO_CMAKE_PATH "${BUILD_DIR}" _NORMALIZED_DIR)
  set(_REPLACEMENT "get_filename_component(_IMPORT_PREFIX \"\${CMAKE_CURRENT_LIST_DIR}/../../..\" ABSOLUTE)")
  set(_ORIGINAL "set(_IMPORT_PREFIX \"${_NORMALIZED_DIR}\")")
  
  file(GLOB _TARGET_FILES "${CMAKE_DIR}/${FILE_PATTERN}")
  foreach(_TARGET_FILE ${_TARGET_FILES})
    file(READ "${_TARGET_FILE}" _CONTENT)
    string(REPLACE "${_ORIGINAL}" "${_REPLACEMENT}" _CONTENT "${_CONTENT}")
    string(REPLACE "\"${_NORMALIZED_DIR}/" "\"\${_IMPORT_PREFIX}/" _CONTENT "${_CONTENT}")
    file(WRITE "${_TARGET_FILE}" "${_CONTENT}")
    message(STATUS "  Fixed: ${_TARGET_FILE}")
  endforeach()
endmacro()

Then call it for each directory:

FIX_CMAKE_RELOCATABILITY("${ARROW_CMAKE_DIR}" "ArrowTargets*.cmake" "${PROJECT_BINARY_DIR}")
FIX_CMAKE_RELOCATABILITY("${PARQUET_CMAKE_DIR}" "ParquetTargets*.cmake" "${PROJECT_BINARY_DIR}")

Also applies to: 212-252

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6035da8 and 55fdcfb.

📒 Files selected for processing (1)
  • libraries.cmake/arrow.cmake (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: build (windows-latest)
  • GitHub Check: build (ubuntu-24.04-arm)
  • GitHub Check: build (ubuntu-latest)
  • GitHub Check: build (macos-latest)
  • GitHub Check: build (windows-latest)
  • GitHub Check: build (ubuntu-24.04-arm)
  • GitHub Check: build (ubuntu-latest)
  • GitHub Check: build (macos-latest)
🔇 Additional comments (3)
libraries.cmake/arrow.cmake (3)

98-99: Good: Path normalization for Windows.

Using file(TO_CMAKE_PATH ...) ensures consistent forward slashes in the replacement pattern, which is important for matching Arrow's generated CMake files.


124-132: LGTM - Consistent relocation pattern for Parquet.

The Parquet files are processed using the same replacement strategy, which is correct since they share the same _IMPORT_PREFIX mechanism.


101-103: Approve: Well-structured relocation strategy.

The approach of replacing the hardcoded set(_IMPORT_PREFIX "...") with get_filename_component(_IMPORT_PREFIX "${CMAKE_CURRENT_LIST_DIR}/../../.." ABSOLUTE) is a standard and effective way to make CMake configs relocatable. Combined with replacing absolute paths in IMPORTED_LOCATION properties, this should correctly handle package relocation.

@poshul poshul merged commit f2780aa into master Dec 5, 2025
11 checks passed
@poshul
Copy link
Contributor Author

poshul commented Dec 5, 2025

Arrow's build system helpfully produces cmake files to tell any dependent projects where the generated libraries are, but it unhelpfully uses absolute paths, which makes the whole build fold non-relocatable. This is fine in the linux builds since it doesn't get relocated, but for windows it means the build system looks for the arrow libraries in a path that doesn't exist.

@jpfeuffer
Copy link
Contributor

Please report this to arrow devs, as this workaround feels very unstable.

There are definitely ways to make this relocatable if they do it correctly!

https://stackoverflow.com/questions/45730997/create-relocatable-package-with-proper-autogenerated-config-cmake

@jpfeuffer
Copy link
Contributor

Maybe even patch the install/export parts here instead of post-replacing paths if you can find the error quickly.

@coderabbitai coderabbitai bot mentioned this pull request Dec 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants