@otbrown
Contributor

@otbrown otbrown commented Oct 24, 2025

gpu_thrust.cuh: modified initial thrust counting iterator declarations to use long long to avoid overflow at >30 qubits. Fixes #698.
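For context, a minimal sketch of why the literal's type matters. It does not use Thrust itself: `make_counter` is a hypothetical stand-in for a factory that, like `thrust::make_counting_iterator`, deduces its index type from its argument. `qindex` is assumed to be `long long`, and `int` is assumed to be 32 bits wide.

```cpp
#include <cassert>
#include <limits>
#include <type_traits>

// Assumption: qindex is a signed 64-bit integer (long long).
using qindex = long long;

// Hypothetical stand-in for an iterator factory that deduces its
// index type from the starting value it is given.
template <typename Index>
struct Counter { Index value; };

template <typename Index>
Counter<Index> make_counter(Index start) { return Counter<Index>{start}; }

// The literal alone decides the index type.
static_assert(std::is_same<decltype(make_counter(0).value), int>::value,
              "0 deduces int");
static_assert(std::is_same<decltype(make_counter(0LL).value), long long>::value,
              "0LL deduces long long");

// Number of amplitudes in an n-qubit statevector.
constexpr qindex num_amps(int numQubits) { return qindex{1} << numQubits; }

// True if an index of type Index can represent the one-past-the-end position n.
template <typename Index>
constexpr bool can_count_to(qindex n) {
    return n <= static_cast<qindex>(std::numeric_limits<Index>::max());
}
```

With these definitions, `can_count_to<int>(num_amps(30))` holds but `can_count_to<int>(num_amps(31))` does not, which matches the ">30 qubits" threshold in the description.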

@otbrown otbrown requested a review from JPRichings October 24, 2025 15:49
@otbrown
Contributor Author

otbrown commented Oct 24, 2025

@JPRichings Fix branch for your testing pleasure!

@otbrown
Contributor Author

otbrown commented Oct 24, 2025

Single AMD GPU on ARCHER2 ✅:

otbz19@ln02:/work/z19/z19/otbz19/QuEST/QuEST> cat slurm-11307502.out

QuEST execution environment:
  precision:       2
  multithreaded:   1
  distributed:     1
  GPU-accelerated: 1
  GPU-sharing ok:  0
  cuQuantum:       0
  num nodes:       1

Testing configuration:
  test all deployments:  0
  num qubits in qureg:   6
  max num qubit perms:   25
  max num superop targs: 4
  num mixed-deploy reps: 10

Tested Qureg deployments:
  GPU + OMP + MPI

Randomness seeded to: 2726962016
===============================================================================
All tests passed (51879 assertions in 269 test cases)

and okay one failure on 4 GPUs, but I believe that's entirely unrelated to this change...

QuEST execution environment:
  precision:       2
  multithreaded:   1
  distributed:     1
  GPU-accelerated: 1
  GPU-sharing ok:  0
  cuQuantum:       0
  num nodes:       4

Testing configuration:
  test all deployments:  0
  num qubits in qureg:   6
  max num qubit perms:   25
  max num superop targs: 4
  num mixed-deploy reps: 10

Tested Qureg deployments:
  GPU + OMP + MPI

Randomness seeded to: 4273463917

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
tests is a Catch2 v3.8.0 host application.
Run with -? for options

-------------------------------------------------------------------------------
rightapplyCompMatr
  validation
  targeted amps fit in node
-------------------------------------------------------------------------------
/work/z19/z19/otbz19/QuEST/QuEST/tests/unit/operations.cpp:1043
...............................................................................

/work/z19/z19/otbz19/QuEST/QuEST/tests/unit/operations.cpp:1083: FAILED:
  REQUIRE_THROWS_WITH( apiFunc(), ContainsSubstring("cannot simultaneously store") && ContainsSubstring("remote amplitudes") )
with expansion:
  "rightapplyCompMatr: Expected a density matrix Qureg but received a
  statevector." ( contains: "cannot simultaneously store" and contains: "remote
  amplitudes" )
with messages:
  minNumCtrls := 0
  numNewTargs := 5
  numQubits - minNumCtrls := 6
  ctrls := {  }
  targs := { 0, 1, 2, 3, 4 }

===============================================================================
test cases:   269 |   268 passed | 1 failed
assertions: 51315 | 51314 passed | 1 failed

I'll try a 31/32 qubit QFT too.

@otbrown
Contributor Author

otbrown commented Oct 24, 2025

31-qubit QFT works and 32-qubit doesn't work on our AMD GPUs, as expected. I also checked and everything works fine with 32 qubits on 4 GPUs 👍
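The thread does not spell out why 32 qubits is expected to fail on a single GPU; a plausible reading is device memory. As a rough sketch, assuming 16 bytes per amplitude (a double-precision complex number, which is how `precision: 2` is read here) and ignoring any extra buffers:

```cpp
#include <cassert>

// Assumption: each amplitude is a double-precision complex number (16 bytes).
constexpr long long BYTES_PER_AMP = 16;
constexpr long long GiB = 1LL << 30;

// Statevector bytes held by each GPU when the 2^numQubits amplitudes
// are split evenly across numGpus devices.
constexpr long long bytes_per_gpu(int numQubits, int numGpus) {
    return ((1LL << numQubits) * BYTES_PER_AMP) / numGpus;
}

static_assert(bytes_per_gpu(31, 1) == 32 * GiB, "31 qubits on one GPU");
static_assert(bytes_per_gpu(32, 1) == 64 * GiB, "32 qubits on one GPU");
static_assert(bytes_per_gpu(32, 4) == 16 * GiB, "32 qubits over four GPUs");
```

On this estimate, 32 qubits needs 64 GiB for the amplitudes alone on one GPU, versus 16 GiB per GPU when spread over four.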

@TysonRayJones
Member

Eep good catch! Thrust literal mischief had already bitten me!

To be extremely defensive, one could replace each 0LL literal in the patch with a reference to e.g. QINDEX_ZERO defined somewhere like bitwise.hpp*, as is already done (albeit incorrectly) for the 1 literal:

#define QINDEX_ONE 1ULL

This would protect against future silent Thrust type issues if qindex was ever changed. And perhaps to evidence why that's a good idea, the QINDEX_ONE macro above is actually wrong since it treats qindex as unsigned, aha! You could correct that to 1LL in this patch too if you fancied :^)

Probably also worth then replacing that #define macro(s) with something explicitly typed to more securely avoid these literal-type issues, e.g.

// 0 remains agnostic to qindex type now
constexpr qindex QINDEX_ZERO = 0;

*It feels a little ill-fitting to define QINDEX_ZERO in bitwise.hpp rather than types.h or precision.h but the latter two are user-facing ¯\_(ツ)_/¯
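To illustrate the difference, a small sketch (not the code in the patch). `qindex` is aliased to `long long` here as an assumption, `LITERAL_ONE` is a hypothetical macro standing in for a hard-coded literal, and `deduces_qindex` is a hypothetical helper that reports the type a template would deduce.

```cpp
#include <cassert>
#include <type_traits>

// Assumption: the current definition of qindex.
using qindex = long long;

// A literal-based macro hard-codes its own type, independent of qindex.
#define LITERAL_ONE 1ULL  // unsigned long long, whatever qindex is

// A typed constant follows qindex automatically if the alias ever changes.
constexpr qindex QINDEX_ZERO = 0;

// Reports whether a template deducing from this argument would see qindex.
template <typename T>
constexpr bool deduces_qindex(T) { return std::is_same<T, qindex>::value; }

static_assert(!deduces_qindex(LITERAL_ONE), "1ULL is unsigned, qindex is signed");
static_assert(!deduces_qindex(0),           "a bare 0 is an int");
static_assert(deduces_qindex(QINDEX_ZERO),  "typed constant matches qindex");
```

Only the typed constant stays correct without edits if `qindex` is later redefined.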

Let me know if you agree but don't have a sec, in which case I can add the changes to this PR!

@TysonRayJones
Member

PS: I'll patch that "rightapplyCompMatr" error. It's because of this line...

SECTION( "targeted amps fit in node" ) {
// simplest to trigger validation using a statevector
qureg = getCachedStatevecs().begin()->second;

which always uses a statevector to test the "targeted amps fit in node" validation, even though the rightapply*() functions accept only density matrices. Because the "was given a density matrix" validation runs before the "targeted amps fit in node" validation, the unintended density-matrix error fired first and masked the error the test was meant to trigger.

The operation validation tests previously always used a statevector to test the "targeted amps fit in node" validation, even though the rightapply*() functions accept only density matrices. Because the "was given a density matrix" validation happens before the "targeted amps fit in node" validation, the intended error was pre-empted by the earlier, unintended one.

Now, we are careful to pass a density matrix Qureg to the validation of "targeted amps fit in node" when triggered by a function which 'right-applies' (and is ergo only compatible with density matrices).
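The masking effect is easy to reproduce in isolation. The sketch below is hypothetical: `FakeQureg`, `validate_right_apply`, `error_for`, and the `maxTargsPerNode` parameter are illustrative names rather than QuEST's API. The first message is copied from the failing test output; the second is invented and merely contains the two substrings that the test looks for.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical minimal register description (not QuEST's Qureg).
struct FakeQureg {
    bool isDensityMatrix;
    int  numTargetedQubits;
};

// Checks run in order, so the first failing check decides the message.
inline void validate_right_apply(const FakeQureg& q, int maxTargsPerNode) {
    if (!q.isDensityMatrix)
        throw std::invalid_argument(
            "Expected a density matrix Qureg but received a statevector.");
    if (q.numTargetedQubits > maxTargsPerNode)
        throw std::invalid_argument(
            "cannot simultaneously store the targeted and remote amplitudes");
}

// Returns the validation error message, or an empty string on success.
inline std::string error_for(const FakeQureg& q, int maxTargsPerNode) {
    try {
        validate_right_apply(q, maxTargsPerNode);
    } catch (const std::invalid_argument& e) {
        return e.what();
    }
    return "";
}
```

Passing a statevector with too many targets yields the density-matrix message, while the same targets on a density matrix reach the intended "remote amplitudes" message.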
@TysonRayJones
Member

TysonRayJones commented Oct 27, 2025

My changes are passing (caution: testing only the affected functions) with nvcc v12.8 🙏 Happy to squash and co-commit as you fancy!

@otbrown
Contributor Author

otbrown commented Oct 27, 2025

Hi Tyson,

Thanks for this! I like the principle of basing this on the qindex type, and was pondering similar over the weekend. Only note is that bitwise.hpp was probably using 1ULL intentionally, as some bitwise shifts on signed integers are undefined (left-shifting a negative value) or implementation-defined (right-shifting a negative value) until C++20.

Having looked at bitwise.hpp I don't see anything that should be an issue, but we should keep it in mind if we get any weird bugs 😆
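One way to keep both properties, sketched under the assumption that `qindex` is `long long`: do the shift in an unsigned type, where the result is well defined for any shift count below the bit width, and convert to the signed index type afterwards. `pow2` and `get_bit` are hypothetical helper names, not necessarily how bitwise.hpp is written.

```cpp
#include <cassert>

// Assumption: qindex is a signed 64-bit integer.
using qindex = long long;

// 2^n computed with an unsigned shift, then converted to the signed index type.
// Intended for 0 <= n <= 62, so the result is a positive, representable qindex.
constexpr qindex pow2(int n) {
    return static_cast<qindex>(1ULL << n);
}

// Bit `pos` of a non-negative index, again shifting in the unsigned domain.
constexpr int get_bit(qindex index, int pos) {
    return static_cast<int>(
        (static_cast<unsigned long long>(index) >> pos) & 1ULL);
}

static_assert(pow2(31) == 2147483648LL, "2^31 does not fit in a 32-bit int");
static_assert(get_bit(pow2(31), 31) == 1, "bit 31 is set");
```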

I'll retest this branch on our GPU systems and then merge, all being well!

@otbrown
Contributor Author

otbrown commented Oct 27, 2025

Okay so there is still a rightapplyCompMatr test failure, but it is still unrelated to the GPU implementation -- it can be reproduced with any distributed build. I've spun that out into a new issue (#700) as all the other tests pass on the AMD GPUs, and importantly I can run the 31/32 qubit jobs no problem. As long as @JPRichings' tests on the CUDA platforms pass I think we can merge this safely 😄

@JPRichings
Contributor

Updated branch passed tests on grace-hopper:

QuEST execution environment:
  precision:       2
  multithreaded:   1
  distributed:     0
  GPU-accelerated: 1
  GPU-sharing ok:  0
  cuQuantum:       0
  num nodes:       1

Testing configuration:
  test all deployments:  0
  num qubits in qureg:   6
  max num qubit perms:   25
  max num superop targs: 4
  num mixed-deploy reps: 10

Tested Qureg deployments:
  GPU + OMP

Randomness seeded to: 3270576438

All tests passed (51882 assertions in 269 test cases)

@otbrown
Contributor Author

otbrown commented Oct 28, 2025

Excellent, thanks all. I'll merge.

@otbrown otbrown merged commit ae0ed4d into devel Oct 28, 2025
130 checks passed