Skip to content

Conversation

@johannes-graner
Copy link
Contributor

Proposed changes

After the GPU reference kernels for grouped convolutions were added, the biggest bottleneck in tests became verification.
When the reference is computed on GPU, there is no reason to move both reference and real kernel output back to host and compare using a CPU loop. Instead, a simple kernel for that comparison is introduced and used when GPU reference is enabled.

If the GPU verification fails, both reference and real output are copied back to host and CPU verification is run to get the same error statistics as before. This is to keep the implementation as simple as possible while optimizing for the happy path of succeeding tests that is the norm in CI.

Unit tests are added for the verification kernel, they run in ~0.6 s total.

Currently, this affects three tests: test_grouped_convnd_fwd, test_grouped_convnd_bwd_weight, and test_grouped_convnd_bwd_data_xdl. The following table shows some test runtimes on MI300X with 48 CPU cores:

test Before (s) After (s) Speedup
fwd 479 82 5.8x
bwd_data_xdl 267 21 12.7x
bwd_weight 54 45 1.2x

The test times vary quite a bit from run to run, but are consistently much lower (fwd, bwd_data_xdl) or slightly lower (bwd_weight) than previously.

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

  • I have added tests relevant to the introduced functionality, and the unit tests are passing locally
  • I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run.
  • I have added inline documentation which enables the maintainers with understanding the motivation
  • I have removed the stale documentation which is no longer relevant after this pull request
  • (If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
  • I have run clang-format on all changed files
  • Any dependent changes have been merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants