@IrisMeasure commented Dec 30, 2025

Feature or improvement description
This PR rewrites the subroutines LAPACK_DPPTRF and LAPACK_SPPTRF in NWTC_LAPACK.f90 so that they use the full-storage Cholesky factorization (xPOTRF) instead of the packed-storage one (xPPTRF). The subroutine signatures are unchanged, so existing callers are unaffected: internally, each routine converts the packed input to full storage, factors it with xPOTRF, and converts the resulting factor back to packed storage.
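The new routine bodies are not shown in this description, so the following is only a minimal sketch of the packed/full conversion step, assuming LAPACK's standard column-major packed layout (for UPLO = 'U', AP(i + (j-1)*j/2) = A(i,j) for i <= j; for UPLO = 'L', AP(i + (j-1)*(2N-j)/2) = A(i,j) for i >= j). The module and procedure names (`packed_storage_sketch`, `packed_to_full`, `full_to_packed`) are hypothetical and are not part of NWTC_LAPACK.f90. A wrapper would call `packed_to_full`, then `DPOTRF(UPLO, N, A, N, INFO)`, then `full_to_packed`.

```fortran
! Hypothetical sketch: converting between LAPACK packed storage and full storage.
! Not the actual NWTC_LAPACK implementation.
module packed_storage_sketch
  implicit none
  private
  public :: dp, packed_to_full, full_to_packed
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Expand the packed triangle AP into the same triangle of the full N-by-N
  ! array A. The opposite triangle is zeroed (xPOTRF never references it).
  subroutine packed_to_full(uplo, n, ap, a)
    character, intent(in) :: uplo
    integer, intent(in) :: n
    real(dp), intent(in) :: ap(n*(n+1)/2)
    real(dp), intent(out) :: a(n, n)
    integer :: i, j, k
    a = 0.0_dp
    k = 0
    if (uplo == 'U' .or. uplo == 'u') then
      ! Upper: column j stores A(1:j, j) contiguously
      do j = 1, n
        do i = 1, j
          k = k + 1
          a(i, j) = ap(k)
        end do
      end do
    else
      ! Lower: column j stores A(j:n, j) contiguously
      do j = 1, n
        do i = j, n
          k = k + 1
          a(i, j) = ap(k)
        end do
      end do
    end if
  end subroutine packed_to_full

  ! Copy the UPLO triangle of the full array A (e.g. the Cholesky factor
  ! returned by xPOTRF) back into packed storage AP.
  subroutine full_to_packed(uplo, n, a, ap)
    character, intent(in) :: uplo
    integer, intent(in) :: n
    real(dp), intent(in) :: a(n, n)
    real(dp), intent(out) :: ap(n*(n+1)/2)
    integer :: i, j, k
    k = 0
    if (uplo == 'U' .or. uplo == 'u') then
      do j = 1, n
        do i = 1, j
          k = k + 1
          ap(k) = a(i, j)
        end do
      end do
    else
      do j = 1, n
        do i = j, n
          k = k + 1
          ap(k) = a(i, j)
        end do
      end do
    end if
  end subroutine full_to_packed
end module packed_storage_sketch
```

Because xPOTRF reads and overwrites only the UPLO triangle of A, copying back that one triangle is enough to reproduce the packed output that xPPTRF would have returned. The extra memory is the N-by-N work array A.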

This change makes TurbSim substantially faster on macOS (about 3.6x to 10x in total run time in the tests below), with minimal additional memory overhead from the temporary full-storage copy of the matrix.

Related issue, if one exists
#3120

Impacted areas of the software
TurbSim

Test results, if applicable
(1) macOS
I compiled TurbSim using GCC 15.2.0 with the following build flags:

- BUILD_UNIT_TESTING=OFF
- DOUBLE_PRECISION=OFF
- VARIABLE_TRACKING=OFF

I used both versions of TurbSim to generate two .bts files: (i) a 120-second file on a 43 x 43 grid, and (ii) a 600-second file on a 23 x 23 grid. The performance results on macOS 26.2 (M4 Pro) are shown below. Coh2h() is the routine that calls LAPACK_xPPTRF; all times are in seconds.

(i)

| Version | Total time (s) | Coh2h() (s) |
|---|---|---|
| Original (SPPTRF) | 113.4 | 105.6 |
| Modified (SPOTRF) | 11.0 | 3.5 |

(ii)

| Version | Total time (s) | Coh2h() (s) |
|---|---|---|
| Original (SPPTRF) | 15.5 | 12.4 |
| Modified (SPOTRF) | 4.3 | 1.1 |

Furthermore, the .bts files produced by the two versions differ only in the header: the character count at offset 0x42 ($n_{character}$) and the associated characters $Character_i$ (typically the version string and generation time). The data sections that follow are identical.

(2) Windows
I compiled TurbSim using IFORT (from Intel oneAPI 2024.2.1) and IFX (from Intel oneAPI 2025.0.1) at optimization level O2. The performance results on Windows 11 24H2 (AMD 9950X) are shown below (times in seconds):

(i)

| Version | Total time (s) | Coh2h() (s) |
|---|---|---|
| IFORT + Original (SPPTRF) | 45.6 | 38.1 |
| IFORT + Modified (SPOTRF) | 40.9 | 34.1 |
| IFX + Original (SPPTRF) | 44.0 | 36.5 |
| IFX + Modified (SPOTRF) | 44.0 | 36.4 |
| Release 4.12 | 50.2 | N/A |

(ii)

| Version | Total time (s) | Coh2h() (s) |
|---|---|---|
| IFORT + Original (SPPTRF) | 11.6 | 8.2 |
| IFORT + Modified (SPOTRF) | 9.5 | 6.2 |
| IFX + Original (SPPTRF) | 12.5 | 9.3 |
| IFX + Modified (SPOTRF) | 11.7 | 8.9 |
| Release 4.12 | 13.5 | N/A |

On Windows, switching to SPOTRF does not slow TurbSim down: IFORT builds are about 10% to 18% faster in total time, while IFX builds are unchanged to about 6% faster.
Note that on Windows the .bts files generated by the two versions of TurbSim (built with the same compiler) differ slightly. This difference is negligible for engineering purposes.

```fortran
!=======================================================================
!> Compute the Cholesky factorization of a real symmetric positive definite matrix A stored in packed format.
!! use LAPACK_PPTRF (nwtc_lapack::lapack_pptrf) instead of this specific function.
SUBROUTINE LAPACK_DPPTRF (UPLO, N, AP, ErrStat, ErrMsg)
```
Contributor

The routines in NWTC_LAPACK.f90 are named for the LAPACK routine they call. Since this change results in calling a different LAPACK routine, it seems like this should just be a new subroutine called `LAPACK_DPOTRF`. That said, the conversion from packed to full matrix storage makes this a little tricky, since the function inputs differ from those of the LAPACK routine. Thoughts @andrew-platt, @deslaughter?

Collaborator

I agree. We should add an interface for that. I haven't looked at the details, though.

Collaborator

I agree that a new LAPACK routine should be added and used directly in TurbSim.

@andrew-platt added this to the v5.0.0 milestone on Dec 31, 2025