@jefflarkin
Collaborator

This PR adds several versions of the code using C++ standard parallel algorithms.

In the c directory there are four new versions, one for every combination of Serial or MPI with std::for_each or std::for_each_n. Each is based on its respective C baseline version. Both for_each and for_each_n are idiomatic C++, and having both versions shows how they differ in how the code is written and in performance. In some cases we have observed small performance differences between these two algorithms.
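To make the difference in how the two algorithms are written concrete, here is a minimal sketch rather than code taken from this PR. The function names `scale_for_each` and `scale_for_each_n` are hypothetical, and the calls omit the execution policy so the snippet builds without a parallel backend; a parallel version would pass a policy such as `std::execution::par_unseq` as the first argument.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical example: multiply every element by `factor` using std::for_each.
// The loop runs over an explicit index range described by [begin, end).
inline void scale_for_each(std::vector<double>& data, double factor) {
    std::vector<std::size_t> idx(data.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});  // 0, 1, 2, ...
    double* p = data.data();
    std::for_each(idx.begin(), idx.end(),
                  [=](std::size_t i) { p[i] *= factor; });
}

// Same operation with std::for_each_n: the range is described by a
// starting iterator plus an element count instead of an end iterator.
inline void scale_for_each_n(std::vector<double>& data, double factor) {
    std::vector<std::size_t> idx(data.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    double* p = data.data();
    std::for_each_n(idx.begin(), idx.size(),
                    [=](std::size_t i) { p[i] *= factor; });
}
```

The loop bodies are identical; only the way the iteration range is expressed changes.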

In the cpp directory you will find four more versions covering the same combinations. They differ from the versions above in that they use C++23 mdspan in place of raw pointers (and in place of YAKL Arrays). Compared to the versions above, the only differences you should see are in the function prototypes (which take mdspans rather than raw pointers) and in the accesses to those variables, which no longer require calculating offsets. Currently the nvc++ compiler provides mdspan in the experimental namespace, but this will likely change in the future.
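The change in prototypes and accesses can be sketched as follows. To keep the snippet buildable on compilers without an mdspan header, it uses a hypothetical stand-in type, `View2D`, that is not std::mdspan and is not part of this PR; it only mimics the access pattern. The functions `sum_raw` and `sum_view` and the row-major layout are likewise assumptions made for illustration.

```cpp
#include <cstddef>

// Hypothetical stand-in for a 2D mdspan (NOT std::mdspan): a non-owning,
// row-major view that hides the offset calculation behind operator().
struct View2D {
    double* ptr;
    std::size_t nz;  // number of rows
    std::size_t nx;  // number of columns
    double& operator()(std::size_t k, std::size_t i) const {
        return ptr[k * nx + i];
    }
};

// Raw-pointer style: the extents travel as separate arguments and the
// body computes the linear offset by hand.
inline double sum_raw(const double* a, std::size_t nz, std::size_t nx) {
    double s = 0.0;
    for (std::size_t k = 0; k < nz; ++k)
        for (std::size_t i = 0; i < nx; ++i)
            s += a[k * nx + i];
    return s;
}

// View style: the prototype takes the view, and each access names the
// 2D indices directly.
inline double sum_view(View2D a) {
    double s = 0.0;
    for (std::size_t k = 0; k < a.nz; ++k)
        for (std::size_t i = 0; i < a.nx; ++i)
            s += a(k, i);
    return s;
}
```

With a real C++23 mdspan the access would be written with the multidimensional subscript operator, for example `a[k, i]`, but the shape of the change is the same.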

One other change to note is the use of the idx2d and idx3d constexpr functions, which extract the 2D and 3D loop indices from the 1D execution space. Once cartesian_product becomes widely available, those functions will no longer be necessary.
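As an illustration of the idea, index-recovery helpers of this kind could look like the sketch below. The signatures, the return type, and the assumption that the last index varies fastest are all hypothetical; the definitions in this PR may differ.

```cpp
#include <array>
#include <cstddef>

// Hypothetical signature: recover {k, i} from a flat index over an
// nz-by-nx space, assuming i (the last index) varies fastest.
constexpr std::array<std::size_t, 2> idx2d(std::size_t idx, std::size_t nx) {
    return {idx / nx, idx % nx};
}

// Hypothetical signature: recover {k, j, i} from a flat index over an
// nz-by-ny-by-nx space, again with i varying fastest.
constexpr std::array<std::size_t, 3> idx3d(std::size_t idx,
                                           std::size_t ny,
                                           std::size_t nx) {
    return {idx / (nx * ny), (idx / nx) % ny, idx % nx};
}
```

Inside a for_each body over a flat range of nz * nx indices, a call such as `idx2d(idx, nx)` would then give the pair of loop indices for that iteration.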

@jefflarkin
Collaborator Author

Hi @mrnorman, any thoughts on merging this?
