Skip to content

Conversation

@keceli
Copy link
Contributor

@keceli keceli commented Sep 25, 2025

  • Root cause: Array constructor allocated memory but didn't initialize it
  • For primitive types like int_t, this caused garbage values in memory
  • LAPACK workspace queries read these garbage values as workspace sizes
  • Result: unreasonable workspace sizes (e.g., 127 trillion) causing std::bad_alloc

Fix:

  • Initialize new memory to zero using std::fill() when no existing data to copy
  • Ensures all Array objects are properly initialized
  • Prevents similar memory corruption issues throughout codebase

Resolves: std::bad_alloc errors in LAPACK eigenvalue calculations

- Root cause: Array<T> constructor allocated memory but didn't initialize it
- For primitive types like int_t, this caused garbage values in memory
- LAPACK workspace queries read these garbage values as workspace sizes
- Result: unreasonable workspace sizes (e.g., 127 trillion) causing std::bad_alloc

Fix:
- Initialize new memory to zero using std::fill() when no existing data to copy
- Ensures all Array<T> objects are properly initialized
- Prevents similar memory corruption issues throughout codebase

Resolves: std::bad_alloc errors in LAPACK eigenvalue calculations
@keceli
Copy link
Contributor Author

keceli commented Sep 25, 2025

@avcopan did you have a chance to test this one?

@avcopan
Copy link
Collaborator

avcopan commented Sep 26, 2025

I did. I didn't time it, but seemed like it runs much slower, so I'm hesitant to merge.

@keceli
Copy link
Contributor Author

keceli commented Sep 26, 2025

I don't know why would these changes slow down the code. Did you check with CMAKE_BUILD_TYPE=Release maybe the difference is due to debug flags.

@avcopan
Copy link
Collaborator

avcopan commented Sep 26, 2025

@keceli I can't do an apples to apples comparison because I can't compile the code without this, but comparing Yuri's static executable to my compiled executable there is a significant difference...

Yuri's executable:

$ time mess mess.inp
master_equation: WARNING: reference energy has not been initialized: using the default

real    0m1.823s
user    0m7.723s
sys     0m0.227s

My executable, build with CMAKE_BUILD_TYPE=Release after cherry-picking your commit:

$ time mess mess.inp
master_equation: WARNING: reference energy has not been initialized: using the default

real    0m11.897s
user    1m31.348s
sys     0m0.429s

@avcopan
Copy link
Collaborator

avcopan commented Sep 26, 2025

Here are the files if you want to try it out yourself. You can just rename the Pixi configs to pixi.toml, put them in a directory and do pixi shell to create the environment. It is pulling from my channels, which I only use for testing.

(Note: I had to add the .txt file extension to upload them.)

pixi.toml_yg.txt
pixi.toml_mk.txt
mess.inp.txt

@avcopan
Copy link
Collaborator

avcopan commented Sep 26, 2025

The slowdown may well be caused by something other than this change, but I am wary to merge it without proper testing and without Yuri's approval.

@keceli
Copy link
Contributor Author

keceli commented Sep 27, 2025

Thanks @avcopan. I reproduced the slowdown. It is due to GSL CBLAS wrapper (libgslcblas) which is not optimized. I observe the same slowdown when it links to libgslcblas vs libcblas. I am trying to find a way to enforce it in CMake.

TODO: Check with Yuri about the reason of removing them before
- Removed test_integer_interface executable from CMakeLists.txt
- Deleted test_mkl_ilp64.cpp source file
- Cleaned up build configuration
- Remove malformed LDFLAGS and LIBS from GSL_CONFIGURE_OPTIONS
- Use CMAKE_COMMAND -E env to properly set environment variables
- Fix LDFLAGS to point to correct library directory
- This should resolve the CI build failure
- Properly quote environment variables in CMAKE_COMMAND -E env
- This should resolve the syntax warning and command parsing issues
- Fixes CI build failure with GSL configure step
- Use GSL's internal CBLAS instead of trying to link external BLAS
- Remove complex environment variable setup that was causing build failures
- This should make the CI build more reliable and avoid linking issues
- GSL will use its internal CBLAS implementation which is sufficient for basic operations
@keceli
Copy link
Contributor Author

keceli commented Sep 29, 2025

@avcopan could you try again? I am now getting the same timing with the pixi installed version and this one.

…ummary

- Set CMAKE_BUILD_TYPE to Release by default if not specified
- Add comprehensive build configuration summary at configure time
- Display build type, C++ standard, and all project options
- Makes it easier to verify build settings during configuration
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants