[CK profiler] Perform verification on GPU when using GPU reference #3482
+1,319
−40
Proposed changes
After the GPU reference kernels for grouped convolutions were added, the biggest bottleneck in tests became verification.
When the reference is computed on GPU, there is no reason to move both reference and real kernel output back to host and compare using a CPU loop. Instead, a simple kernel for that comparison is introduced and used when GPU reference is enabled.
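As a rough illustration of what such a comparison kernel might compute, the sketch below is a host-side C++ stand-in, not the code from this PR. All names (`within_tolerance`, `compare_worker`, `count_mismatches`) are hypothetical. The tolerance formula `|out - ref| <= atol + rtol * |ref|` and the choice to treat non-finite values as mismatches are assumptions. The workers run sequentially here and only mimic the grid-stride indexing a device kernel would typically use. The point is that only a single error counter has to leave the device.

```cpp
#include <atomic>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical per-element check (assumed formula, not taken from the PR).
inline bool within_tolerance(float out, float ref, float rtol, float atol)
{
    // Assumption: NaN/Inf in either value is always reported as a mismatch.
    if(!std::isfinite(out) || !std::isfinite(ref))
        return false;
    return std::fabs(out - ref) <= atol + rtol * std::fabs(ref);
}

// Stand-in for one GPU thread: visits elements tid, tid + stride, ...
// and increments a shared counter for every mismatch it sees.
inline void compare_worker(const float* out,
                           const float* ref,
                           std::size_t n,
                           std::size_t tid,
                           std::size_t stride,
                           float rtol,
                           float atol,
                           std::atomic<std::size_t>& error_count)
{
    for(std::size_t i = tid; i < n; i += stride)
    {
        if(!within_tolerance(out[i], ref[i], rtol, atol))
            error_count.fetch_add(1, std::memory_order_relaxed);
    }
}

// Stand-in for the kernel launch: runs the workers one after another
// and returns the total number of mismatching elements.
inline std::size_t count_mismatches(const std::vector<float>& out,
                                    const std::vector<float>& ref,
                                    float rtol,
                                    float atol,
                                    std::size_t num_workers = 4)
{
    assert(out.size() == ref.size());
    assert(num_workers > 0);
    std::atomic<std::size_t> error_count{0};
    for(std::size_t tid = 0; tid < num_workers; ++tid)
        compare_worker(out.data(), ref.data(), out.size(), tid, num_workers,
                       rtol, atol, error_count);
    return error_count.load();
}
```

A result of zero from `count_mismatches` corresponds to a passing test. Any non-zero count only says that something differs, not where or by how much.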
If the GPU verification fails, both reference and real output are copied back to host and CPU verification is run to get the same error statistics as before. This is to keep the implementation as simple as possible while optimizing for the happy path of succeeding tests that is the norm in CI.
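The two-stage flow described above can be sketched in plain C++ as follows. This is a simplified model under stated assumptions, not the PR's implementation: `fast_check` stands in for the GPU comparison, `detailed_check` stands in for the existing CPU verification, and the names `ErrorStats`, `VerifyResult`, `close_enough`, and `verify` are all hypothetical. The specific statistics collected are likewise only an example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical statistics; the real CPU verification may report others.
struct ErrorStats
{
    std::size_t mismatches      = 0;
    std::size_t first_bad_index = 0;
    float max_abs_err           = 0.0f;
};

struct VerifyResult
{
    bool passed;
    bool ran_detailed; // true only when the slow path was taken
    ErrorStats stats;
};

// Assumed tolerance rule: absolute difference bounded by a fixed threshold.
inline bool close_enough(float out, float ref, float atol)
{
    return std::fabs(out - ref) <= atol;
}

// Fast path: a pass/fail answer only (stand-in for the GPU check).
inline bool fast_check(const std::vector<float>& out,
                       const std::vector<float>& ref,
                       float atol)
{
    assert(out.size() == ref.size());
    for(std::size_t i = 0; i < out.size(); ++i)
        if(!close_enough(out[i], ref[i], atol))
            return false;
    return true;
}

// Slow path: full pass that gathers error statistics (stand-in for the
// CPU verification that runs after both buffers are copied to host).
inline ErrorStats detailed_check(const std::vector<float>& out,
                                 const std::vector<float>& ref,
                                 float atol)
{
    assert(out.size() == ref.size());
    ErrorStats stats;
    for(std::size_t i = 0; i < out.size(); ++i)
    {
        const float err = std::fabs(out[i] - ref[i]);
        if(!close_enough(out[i], ref[i], atol))
        {
            if(stats.mismatches == 0)
                stats.first_bad_index = i;
            ++stats.mismatches;
        }
        stats.max_abs_err = std::max(stats.max_abs_err, err);
    }
    return stats;
}

// Happy path returns early; only a failure pays for the detailed pass.
inline VerifyResult verify(const std::vector<float>& out,
                           const std::vector<float>& ref,
                           float atol)
{
    if(fast_check(out, ref, atol))
        return {true, false, {}};
    return {false, true, detailed_check(out, ref, atol)};
}
```

In this model a passing test never touches `detailed_check`, which mirrors the intent of keeping the device-to-host copies and the CPU loop off the common path.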
Unit tests are added for the verification kernel; they run in ~0.6 s total.
Currently, this affects three tests: test_grouped_convnd_fwd, test_grouped_convnd_bwd_weight, and test_grouped_convnd_bwd_data_xdl. The following table shows some test runtimes on MI300X with 48 CPU cores:

The test times vary quite a bit from run to run, but are consistently much lower (fwd, bwd_data_xdl) or slightly lower (bwd_weight) than previously.
Checklist
Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.
clang-format on all changed files