Adds C++ stdpar versions #15
Open
This PR adds several versions of the code using C++ standard parallel algorithms.
In the `c` directory there are four new versions, one for each combination of {Serial, MPI} × {`std::for_each`, `std::for_each_n`}. Both `for_each` and `for_each_n` are idiomatic C++. Having both versions shows how the two differ in the way the code is written and in performance; in some cases we have observed small performance differences between the two algorithms. These versions are based on their respective C baseline versions.

In the `cpp` directory you will find four additional versions covering the same combinations. These differ from the above in that they use C++23 `mdspan` in place of raw pointers (and in place of YAKL Arrays). Compared to the versions in `c`, the only differences should be in the function prototypes (which take `mdspan`s rather than raw pointers) and in how those variables are accessed, which no longer requires calculating offsets. Currently the nvc++ compiler provides `mdspan` in the experimental namespace, but this will likely change in the future.

One other change to note is the use of the `idx2d` and `idx3d` `constexpr` functions. These allow simple extraction of the 2D and 3D loop indices from the 1D execution space. Once `std::views::cartesian_product` (C++23) becomes ubiquitously available, those functions will no longer be necessary.