-
Notifications
You must be signed in to change notification settings - Fork 22
GEMMTestSuite: use rocrand for input data generation #417
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Conversation
alextmagro
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work! Do you have any rough estimates of how much faster the entire cpp test suite is?
| template <typename T> | ||
| void generate_data_uniformly(T* data, const size_t size, std::mt19937* gen) { | ||
| #ifdef __HIP_PLATFORM_AMD__ | ||
| // TODO: Introduce a parallel RNG library (Random123, PCG, rocRAND) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can probably remove this entire #ifdef guarded section, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so, unless we want to keep it around for future reference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's remove, we can always revert if we need it again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The whole method seems unused on ROCm and can be guarded
| #else | ||
| T *data = t->columnwise_cpu_dptr<T>(); | ||
| generate_data_uniformly(data, size, &(t->gen())); | ||
| #endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we able to use rocRAND for fillCase_special as well? Also, I think there were a few tests that for some reason generate their own data... I might be wrong about that, or it may have been updated.
Thanks! I left some performance numbers here: https://github.com/ROCm/frameworks-internal/issues/14746#issuecomment-3761360289 |
|
Update copyright date of modified files |
| } | ||
|
|
||
| #ifdef __HIP_PLATFORM_AMD__ | ||
| #include <rocrand/rocrand.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though it does not cause errors, better move #include to the top of the file, out of test namespace
| TRANSFORMER_ENGINE_TYPE_SWITCH_ALL(t->dtype(), T, | ||
| { | ||
| #ifdef __HIP_PLATFORM_AMD__ | ||
| fillUniformDevice(t); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any test that tests this generation? I think using GPU generation here does not produce correct result because of using t->from_cpu() below in this method
| template <typename T> | ||
| void generate_data_uniformly(T* data, const size_t size, std::mt19937* gen) { | ||
| #ifdef __HIP_PLATFORM_AMD__ | ||
| // TODO: Introduce a parallel RNG library (Random123, PCG, rocRAND) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The whole method seems unused on ROCm and can be guarded
| rocrand_generate_uniform(gen, tmp, N); | ||
|
|
||
| // map to [-2.0, 1.0] (like generate_data_uniformly) and cast into tensor dtype | ||
| TRANSFORMER_ENGINE_TYPE_SWITCH_ALL(t->dtype(), T, { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
T should either be template parameter and no TRANSFORMER_ENGINE_TYPE_SWITCH_ALL here, or the method calling should be moved out of TRANSFORMER_ENGINE_TYPE_SWITCH_ALL in fillUniform
Description
Use
rocrandrandom number generation (instead of CPU/OpenMP).Partly addresses https://github.com/ROCm/frameworks-internal/issues/14746.
Notes:
Type of change
Changes
Please list the changes introduced in this PR:
-Use
rocrandrandom number generation (instead of CPU/OpenMP).Checklist: