Thrust Overflow Fix #699

otbrown · 2025-10-24T15:49:19Z

gpu_thrust.cuh: modified initial thrust counting iterator declarations to use long long to avoid overflow at >30 qubits. Fixes #698.
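The ">30 qubits" threshold follows from index arithmetic: a 31-qubit statevector has 2^31 amplitudes, one more than the largest 32-bit signed value. A minimal sketch of that arithmetic, assuming a 32-bit `int` (written as `std::int32_t` so the check is exact); `numAmps`, `rangeFits` and `demoAmpCounts` are hypothetical names, not QuEST functions:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Number of statevector amplitudes in a register of numQubits qubits (2^numQubits).
constexpr long long numAmps(int numQubits) {
    return 1LL << numQubits;
}

// True when an index of type IndexT can hold every value in [0, numAmps],
// including the one-past-the-end value that an iterator range needs.
template <typename IndexT>
constexpr bool rangeFits(int numQubits) {
    return numAmps(numQubits) <= static_cast<long long>(std::numeric_limits<IndexT>::max());
}

static_assert(rangeFits<std::int32_t>(30), "2^30 fits in a 32-bit signed index");
static_assert(!rangeFits<std::int32_t>(31), "2^31 exceeds INT32_MAX = 2^31 - 1");
static_assert(rangeFits<long long>(31), "long long has room to spare");

// Runtime restatement of the same boundary.
inline void demoAmpCounts() {
    assert(numAmps(31) == 2147483648LL);
    assert(numAmps(31) - 1 == std::numeric_limits<std::int32_t>::max());
}
```

An iterator that counts in `int` therefore runs out of range exactly when the register reaches 31 qubits, which is consistent with the failure reported in #698.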

otbrown · 2025-10-24T15:52:06Z

@JPRichings Fix branch for your testing pleasure!

otbrown · 2025-10-24T16:52:47Z

Single AMD GPU on ARCHER2 ✅:

otbz19@ln02:/work/z19/z19/otbz19/QuEST/QuEST> cat slurm-11307502.out

QuEST execution environment:
  precision:       2
  multithreaded:   1
  distributed:     1
  GPU-accelerated: 1
  GPU-sharing ok:  0
  cuQuantum:       0
  num nodes:       1

Testing configuration:
  test all deployments:  0
  num qubits in qureg:   6
  max num qubit perms:   25
  max num superop targs: 4
  num mixed-deploy reps: 10

Tested Qureg deployments:
  GPU + OMP + MPI

Randomness seeded to: 2726962016
===============================================================================
All tests passed (51879 assertions in 269 test cases)

and okay one failure on 4 GPUs, but I believe that's entirely unrelated to this change...

QuEST execution environment:
  precision:       2
  multithreaded:   1
  distributed:     1
  GPU-accelerated: 1
  GPU-sharing ok:  0
  cuQuantum:       0
  num nodes:       4

Testing configuration:
  test all deployments:  0
  num qubits in qureg:   6
  max num qubit perms:   25
  max num superop targs: 4
  num mixed-deploy reps: 10

Tested Qureg deployments:
  GPU + OMP + MPI

Randomness seeded to: 4273463917

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
tests is a Catch2 v3.8.0 host application.
Run with -? for options

-------------------------------------------------------------------------------
rightapplyCompMatr
  validation
  targeted amps fit in node
-------------------------------------------------------------------------------
/work/z19/z19/otbz19/QuEST/QuEST/tests/unit/operations.cpp:1043
...............................................................................

/work/z19/z19/otbz19/QuEST/QuEST/tests/unit/operations.cpp:1083: FAILED:
  REQUIRE_THROWS_WITH( apiFunc(), ContainsSubstring("cannot simultaneously store") && ContainsSubstring("remote amplitudes") )
with expansion:
  "rightapplyCompMatr: Expected a density matrix Qureg but received a
  statevector." ( contains: "cannot simultaneously store" and contains: "remote
  amplitudes" )
with messages:
  minNumCtrls := 0
  numNewTargs := 5
  numQubits - minNumCtrls := 6
  ctrls := {  }
  targs := { 0, 1, 2, 3, 4 }

===============================================================================
test cases:   269 |   268 passed | 1 failed
assertions: 51315 | 51314 passed | 1 failed

I'll try a 31/32 qubit QFT too.

otbrown · 2025-10-24T17:23:09Z

31-qubit QFT works and 32-qubit doesn't work on our AMD GPUs, as expected. I also checked and everything works fine with 32 qubits on 4 GPUs 👍

TysonRayJones · 2025-10-27T01:27:12Z

Eep good catch! Thrust literal mischief had already bitten me!

To be extremely defensive, one could replace each 0LL literal in the patch with a reference to e.g. QINDEX_ZERO defined somewhere like bitwise.hpp*, as is already done for the 1 literal:

QuEST/quest/src/core/bitwise.hpp

Line 31 in 00ddd93

#define QINDEX_ONE 1ULL

This would protect against future silent Thrust type issues if qindex was ever changed. And perhaps to evidence why that's a good idea, the QINDEX_ONE macro above is actually wrong since it treats qindex as unsigned, aha! You could correct that to 1LL in this patch too if you fancied :^)

Probably also worth then replacing that #define macro(s) with something explicitly typed to more securely avoid these literal-type issues, e.g.

// 0 remains agnostic to qindex type now
constexpr qindex QINDEX_ZERO = 0;

*It feels a little ill-fitting to define QINDEX_ZERO in bitwise.hpp rather than types.h or precision.h but the latter two are user-facing ¯\_(ツ)_/¯
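To illustrate why a typed constant is more robust than a literal, here is a small sketch of how a factory that deduces its counting type from its argument (as `thrust::make_counting_iterator` does) reacts to `0`, `0LL` and a typed `QINDEX_ZERO`. `Counter`, `makeCounter`, `advanced` and `demoCounterTypes` are hypothetical stand-ins so the example compiles without CUDA, and `qindex = long long` is an assumption based on this thread:

```cpp
#include <cassert>
#include <type_traits>

// Assumption: qindex is a signed 64-bit integer, as this thread implies.
using qindex = long long;

// Typed constant: follows qindex automatically if the alias ever changes.
constexpr qindex QINDEX_ZERO = 0;

// Hypothetical stand-in for a counting iterator.
template <typename T>
struct Counter {
    T value;
    using value_type = T;
};

// Like thrust::make_counting_iterator, T is deduced from the argument.
template <typename T>
constexpr Counter<T> makeCounter(T start) {
    return Counter<T>{start};
}

// Step a counter forward by n, using the counter's own type for the arithmetic.
template <typename T>
constexpr Counter<T> advanced(Counter<T> c, T n) {
    return Counter<T>{c.value + n};
}

// A bare 0 is an int literal, so the counter silently counts in int.
static_assert(std::is_same<decltype(makeCounter(0))::value_type, int>::value,
              "0 deduces int");
// 0LL counts in long long, but hard-codes that choice at every call site.
static_assert(std::is_same<decltype(makeCounter(0LL))::value_type, long long>::value,
              "0LL deduces long long");
// The typed constant counts in whatever qindex is.
static_assert(std::is_same<decltype(makeCounter(QINDEX_ZERO))::value_type, qindex>::value,
              "QINDEX_ZERO deduces qindex");

inline void demoCounterTypes() {
    // Reaching index 2^31 is safe because the arithmetic happens in qindex.
    assert(advanced(makeCounter(QINDEX_ZERO), qindex(1) << 31).value == 2147483648LL);
}
```

The same deduction rule is why a `#define` with a literal suffix can drift out of step with `qindex`, whereas a `constexpr qindex` constant cannot.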

Let me know if you agree but don't have a sec, in which case I can add the changes to this PR!

TysonRayJones · 2025-10-27T01:34:47Z

PS: I'll patch that "rightapplyCompMatr" error. It's because of this line...

QuEST/tests/unit/operations.cpp

Lines 1043 to 1047 in 00ddd93

    
SECTION( "targeted amps fit in node" ) {

    // simplest to trigger validation using a statevector
    qureg = getCachedStatevecs().begin()->second;

which always uses a statevector to test the "targeted amps fit in node" validation, even though the rightapply*() functions accept only density matrices, not statevectors. Because the "was given a density matrix" validation runs before the "targeted amps fit in node" validation, the intended error was pre-empted by the earlier, unintended one.

The operation validation tests previously always used a statevector to test the "targeted amps fit in node" validation, even for the rightapply*() functions, which accept only density matrices. Because the "was given a density matrix" validation runs before the "targeted amps fit in node" validation, the intended error was pre-empted by the earlier, unintended one. Now, we are careful to pass a density matrix Qureg when testing the "targeted amps fit in node" validation of any function which 'right-applies' (and is therefore only compatible with density matrices).
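The ordering effect described here can be sketched in a few lines. This is a hypothetical model, not QuEST's validation code: `QuregSketch`, `firstValidationError` and `demoValidationOrder` are invented for illustration, and the second message is a placeholder built only from the two substrings the failing test looks for.

```cpp
#include <cassert>
#include <string>

// Hypothetical, minimal stand-in for a Qureg.
struct QuregSketch {
    bool isDensityMatrix;
};

// Returns the message of the first check that fails, or "" if all pass.
// The checks run in a fixed order, so an earlier failure hides a later one.
std::string firstValidationError(const QuregSketch& qureg, int numTargs, int maxTargsPerNode) {
    if (!qureg.isDensityMatrix)
        return "Expected a density matrix Qureg but received a statevector.";
    if (numTargs > maxTargsPerNode)
        return "cannot simultaneously store ... remote amplitudes";  // placeholder text
    return "";
}

inline void demoValidationOrder() {
    QuregSketch statevec{false};
    QuregSketch densmatr{true};

    // With a statevector, the density-matrix check fires first (the old test's failure).
    assert(firstValidationError(statevec, 5, 4).find("density matrix") != std::string::npos);

    // With a density matrix, the intended check is the one that fires.
    assert(firstValidationError(densmatr, 5, 4).find("cannot simultaneously store") != std::string::npos);
}
```

Passing an input that satisfies every earlier check is what lets the test reach the validation it actually means to exercise.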

TysonRayJones · 2025-10-27T01:56:48Z

My changes are passing (caution: testing only the affected functions) with nvcc v12.8 🙏 Happy to squash and co-commit as you fancy!

otbrown · 2025-10-27T10:12:25Z

Hi Tyson,

Thanks for this! I like the principle of basing this on the qindex type, and was pondering similar over the weekend. Only note is that bitwise.hpp was probably using 1ULL intentionally, as bitwise shifts of some signed integer values are undefined behaviour before C++20.

Having looked at bitwise.hpp I don't see anything that should be an issue, but we should keep it in mind if we get any weird bugs 😆
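For context on the shift concern: before C++20, left-shifting a negative signed value is undefined behaviour and right-shifting one is implementation-defined, whereas unsigned shifts are well defined for any shift amount below the type's width. One way to keep a signed `qindex` while shifting safely is to shift in unsigned arithmetic and convert afterwards. A sketch under the assumption that `qindex` is `long long`; `pow2Sketch`, `bitAt` and `demoShifts` are hypothetical helpers, not the functions in bitwise.hpp:

```cpp
#include <cassert>

// Assumption: qindex is a signed 64-bit integer, as this thread implies.
using qindex = long long;

// 2^exponent as a qindex. The shift happens on an unsigned operand, then the
// result is converted. Valid for 0 <= exponent < 63, so the result is non-negative.
constexpr qindex pow2Sketch(int exponent) {
    return static_cast<qindex>(1ULL << exponent);
}

// Bit `bitIndex` of `number` (0 or 1), again shifting an unsigned operand.
constexpr int bitAt(qindex number, int bitIndex) {
    return static_cast<int>((static_cast<unsigned long long>(number) >> bitIndex) & 1ULL);
}

static_assert(pow2Sketch(0) == 1, "2^0");
static_assert(pow2Sketch(31) == 2147483648LL, "2^31 does not fit in a 32-bit int");

inline void demoShifts() {
    assert(pow2Sketch(62) == 4611686018427387904LL);
    assert(bitAt(5, 0) == 1);  // 5 = 0b101
    assert(bitAt(5, 1) == 0);
    assert(bitAt(5, 2) == 1);
}
```

This keeps the defensive intent of an unsigned `1ULL` while still handing signed `qindex` values to code that deduces types from them.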

I'll retest this branch on our GPU systems and then merge, all being well!

otbrown · 2025-10-27T16:47:04Z

Okay so there is still a rightapplyCompMatr test failure, but it is still unrelated to the GPU implementation -- it can be reproduced with any distributed build. I've spun that out into a new issue (#700) as all the other tests pass on the AMD GPUs, and importantly I can run the 31/32 qubit jobs no problem. As long as @JPRichings's tests on the CUDA platforms pass I think we can merge this safely 😄

JPRichings · 2025-10-28T15:03:43Z

Updated branch passed tests on grace-hopper:

QuEST execution environment:
  precision:       2
  multithreaded:   1
  distributed:     0
  GPU-accelerated: 1
  GPU-sharing ok:  0
  cuQuantum:       0
  num nodes:       1

Testing configuration:
  test all deployments:  0
  num qubits in qureg:   6
  max num qubit perms:   25
  max num superop targs: 4
  num mixed-deploy reps: 10

Tested Qureg deployments:
  GPU + OMP

Randomness seeded to: 3270576438

All tests passed (51882 assertions in 269 test cases)

otbrown · 2025-10-28T15:10:32Z

Excellent, thanks all. I'll merge.

gpu_thrust.cuh: modified initial thrust counting iterator declarations to use long long to avoid overflow at >30 qubits. Fixes #698.

1539f70

otbrown requested a review from JPRichings October 24, 2025 15:49

TysonRayJones added 2 commits October 26, 2025 21:45

changed literals to defensive type

0d7488d

otbrown mentioned this pull request Oct 27, 2025

rightapplyCompMatr distributed input validation #700

Closed

otbrown merged commit ae0ed4d into devel Oct 28, 2025
130 checks passed
