[CK profiler] Perform verification on GPU when using GPU reference #3482

johannes-graner · 2025-12-23T09:45:15Z

Proposed changes

After the GPU reference kernels for grouped convolutions were added, the biggest bottleneck in tests became verification.
When the reference is computed on GPU, there is no reason to move both reference and real kernel output back to host and compare using a CPU loop. Instead, a simple kernel for that comparison is introduced and used when GPU reference is enabled.

If the GPU verification fails, both reference and real output are copied back to host and CPU verification is run to get the same error statistics as before. This is to keep the implementation as simple as possible while optimizing for the happy path of succeeding tests that is the norm in CI.

Unit tests are added for the verification kernel, they run in ~0.6 s total.

Currently, this affects three tests: test_grouped_convnd_fwd, test_grouped_convnd_bwd_weight, and test_grouped_convnd_bwd_data_xdl. The following table shows some test runtimes on MI300X with 48 CPU cores:

test	Before (s)	After (s)	Speedup
fwd	479	82	5.8x
bwd_data_xdl	267	21	12.7x
bwd_weight	54	45	1.2x

The test times vary quite a bit from run to run, but are consistently much lower (fwd, bwd_data_xdl) or slightly lower (bwd_weight) than previously.

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

I have added tests relevant to the introduced functionality, and the unit tests are passing locally
I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run.
I have added inline documentation which enables the maintainers with understanding the motivation
I have removed the stale documentation which is no longer relevant after this pull request
(If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
I have run clang-format on all changed files
Any dependent changes have been merged

johannes-graner added 2 commits December 23, 2025 02:45

Simple verification kernel for ckProfiler

39ec430

Verification kernel unit tests

10d68b5

johannes-graner requested review from ThomasNing, afagaj, andriy-ca, aosewski, asleepzzz, bartekxk, carlushuang, cgmillette, coderfeli, geyyer, illsilin, poyenc, qianfengz, shumway, tenpercent and vidyasagar-amd as code owners December 23, 2025 09:45

Explicit synchronization

9caa9d6

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[CK profiler] Perform verification on GPU when using GPU reference #3482

[CK profiler] Perform verification on GPU when using GPU reference #3482

johannes-graner commented Dec 23, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

[CK profiler] Perform verification on GPU when using GPU reference #3482

Are you sure you want to change the base?

[CK profiler] Perform verification on GPU when using GPU reference #3482

Conversation

johannes-graner commented Dec 23, 2025

Proposed changes

Checklist

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants