[CK_BUILDER] validation #3471

Snektron · 2025-12-19T16:38:47Z

Proposed changes

This pull request builds on #3267 by providing the "validation" infrastructure, the means to compare a set of Outputs.

Design

The design of the validation infrastructure is relatively straightforward:

- Each SIGNATURE should come with a validate() implementation, which should be implemented in the same way as the other functions/types from testing.hpp.
- validate() returns a ValidationReport, which is a structure that keeps all relevant information about comparing the tensors from two Outputs. Note that, crucially, validate() should not do any reporting by itself. Rather, glue logic should be implemented by the user to turn the ValidationReport into a relevant error message.
- You can see this glue code for CK-Builder itself in testing_utils.hpp, see MatchesReference(). This functionality is relatively barebones right now; it will be expanded upon in a different PR to keep the scope of this one down.
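The split between collecting results and reporting them could look roughly like the sketch below. It is a host-only illustration, not the code from this PR: the `Case` fields, `check()`, `get_errors()` and `describe()` are hypothetical names, and the exact floating-point comparison is a simplifying assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for a validation report: it only records
// what was compared and what was found, and never prints anything itself.
struct ValidationReport
{
    struct Case
    {
        std::string tensor_name;    // which tensor of the Outputs was compared
        std::size_t wrong_elements; // number of mismatching elements
        std::size_t total_elements; // number of elements compared
    };

    std::vector<Case> cases;

    // Compare two buffers element by element and record the outcome.
    // Exact equality is an assumption made to keep the sketch short.
    void check(std::string name,
               const std::vector<float>& actual,
               const std::vector<float>& expected)
    {
        assert(actual.size() == expected.size());
        std::size_t wrong = 0;
        for(std::size_t i = 0; i < actual.size(); ++i)
        {
            if(actual[i] != expected[i])
                ++wrong;
        }
        cases.push_back({std::move(name), wrong, actual.size()});
    }

    // Only the cases that failed.
    std::vector<Case> get_errors() const
    {
        std::vector<Case> errors;
        for(const auto& c : cases)
        {
            if(c.wrong_elements != 0)
                errors.push_back(c);
        }
        return errors;
    }
};

// Glue logic owned by the caller: turns the report into a readable message.
std::string describe(const ValidationReport& report)
{
    std::ostringstream out;
    for(const auto& c : report.get_errors())
    {
        out << c.tensor_name << ": " << c.wrong_elements << "/" << c.total_elements
            << " elements differ\n";
    }
    return out.str();
}
```

Because the report is plain data, a test framework, a benchmark harness, or a command-line tool can each format it differently without touching the comparison code.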

Implementation

The comparison is done on the GPU (using an atomic for now), to keep tests relatively quick. Some notable items from this PR:

- To help compare the tensors and with writing tests, I've written a generic function tensor_foreach, which invokes a callback on every element of a tensor.
- For that it was useful for TensorDescriptor to have a rank that is known at compile time, so I've changed the implementation of TensorDescriptor accordingly. I felt this was a better approach than keeping it dynamic, for multiple reasons:
  - This is C++ and we should use static typing where possible and useful. This way, we don't have to implement runtime assertions about the tensor rank.
  - We already know the rank of tensors statically, as it can be derived from the SIGNATURE.
  - It simplifies the implementation of tensor_foreach and other comparison code.
- There are a lot of new tests for validating the validation implementation, validating validation validation tests (Only 3 recursive levels though...). For a few of those functions, I felt like it would be useful to expose them to the user.
- Doc comments everywhere.
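To illustrate why a compile-time rank helps, here is a minimal host-side sketch of a foreach over tensor coordinates. It is an assumption-laden illustration rather than the PR's code: the `Extent` alias, the `TensorDescriptor` members and this `tensor_foreach` signature are hypothetical, and the version in the PR runs on the GPU.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical extent type: the rank is part of the type.
template <std::size_t Rank>
using Extent = std::array<std::size_t, Rank>;

// Hypothetical descriptor with a static rank. Strides are in elements.
template <std::size_t Rank>
struct TensorDescriptor
{
    Extent<Rank> lengths;
    Extent<Rank> strides;

    // Offset of a coordinate in the backing buffer.
    std::size_t offset(const Extent<Rank>& index) const
    {
        std::size_t result = 0;
        for(std::size_t d = 0; d < Rank; ++d)
        {
            assert(index[d] < lengths[d]);
            result += index[d] * strides[d];
        }
        return result;
    }
};

// Calls f(index) once for every coordinate, last dimension varying fastest.
// No runtime rank check is needed: a mismatching index type fails to compile.
template <std::size_t Rank, typename F>
void tensor_foreach(const Extent<Rank>& lengths, F&& f)
{
    std::size_t total = 1;
    for(std::size_t len : lengths)
        total *= len;

    for(std::size_t flat = 0; flat < total; ++flat)
    {
        // Decompose the flat counter into a multi-dimensional index.
        Extent<Rank> index{};
        std::size_t rest = flat;
        for(std::size_t d = Rank; d-- > 0;)
        {
            index[d] = rest % lengths[d];
            rest /= lengths[d];
        }
        f(index);
    }
}
```

A callback that takes an `Extent<3>` cannot be passed for a rank-2 tensor, so the rank mismatch is caught by the compiler instead of by an assertion in the test.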

This is a better version of the existing hip_check_error implementations. It has the following improvements:
- Throws exceptions. This way the underlying unit testing implementation can cleanly catch & report errors, instead of a test just exiting in a hostile way.
- Uses std::source_location for source locations, so that we don't have to use a macro for the root "checking" function.
- Allows us to define and catch different types of errors without additional required control flow.

This implementation performs the comparison as if each tensor were a flat piece of memory, which means that the data that only exists because of strides is compared as well. While this is primarily done because it's much simpler, checking that data is also useful, because it lets us verify that there were no out-of-bounds writes. In the future, the implementation of compare_tensors may need to change such that the "actual" and "expected" tensors can have a different stride/layout.
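The difference between a flat comparison and a comparison of logical elements only can be shown with a small host-side sketch. The two helper functions below are hypothetical and are not the PR's compare_tensors; they assume a two-dimensional tensor whose rows are padded to `row_stride` elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Flat comparison: every element of the backing buffer, padding included.
inline std::size_t count_flat_mismatches(const std::vector<int>& actual,
                                         const std::vector<int>& expected)
{
    assert(actual.size() == expected.size());
    std::size_t mismatches = 0;
    for(std::size_t i = 0; i < actual.size(); ++i)
    {
        if(actual[i] != expected[i])
            ++mismatches;
    }
    return mismatches;
}

// Logical comparison of a rows x cols tensor stored with the given row
// stride: elements in the padding between rows are skipped.
inline std::size_t count_logical_mismatches(const std::vector<int>& actual,
                                            const std::vector<int>& expected,
                                            std::size_t rows,
                                            std::size_t cols,
                                            std::size_t row_stride)
{
    assert(cols <= row_stride);
    assert(actual.size() >= rows * row_stride);
    assert(expected.size() >= rows * row_stride);
    std::size_t mismatches = 0;
    for(std::size_t r = 0; r < rows; ++r)
    {
        for(std::size_t c = 0; c < cols; ++c)
        {
            const std::size_t i = r * row_stride + c;
            if(actual[i] != expected[i])
                ++mismatches;
        }
    }
    return mismatches;
}
```

For a 2 x 3 tensor with a row stride of 4, a stray write into the padding element of the first row is reported by the flat comparison and missed by the logical one, which is the out-of-bounds check mentioned above. The price is that both tensors must share the same layout.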

CKB testing should include user-readable error messages if a tensor fails to validate. For this purpose, ckt::validate now returns a "ValidationReport", a set of "cases" that hold information about how and why a particular check failed. The testing infrastructure (see testing_utils.hpp) can then turn this metadata into readable error messages. This mechanism is to be expanded.

This nets us a few advantages:
- More static validation of the rank (we don't need to explicitly check).
- Easier metaprogramming in validation kernels.

There are also some disadvantages though:
- More metaprogramming in general.
- Harder to use everywhere else.

This frees up the Extent name for use with the TensorDescriptor. Since it will be used to create extents like ckt::Extent{1, 2, 3}, it makes sense to use this shorter name for that. The FilterExtent typename will mostly be used in contexts where it can be inferred, so it's better to use a longer name there.

Snektron added 23 commits December 19, 2025 17:24

ck-builder: host-based validation implementation

2206482

This is the simplest possible implementation for validation.

ck-builder: initial googletest validation matcher support

fb69edf

This implements very barebones Googletest integration by implementing a custom Matcher which calls ckt::validate().

ck-builder: add missing u8 ck type conv

b394ebf

The specialization of DataTypeToCK was missing for DataType::U8. This commit also changes the test such that if there is a variant missing in the future, we will get a warning/compile error.

ck-builder: validation testing

8fa3add

This adds initial unit tests for CKB testing's validation utilities. This mainly adds tests for the current version of ValidationReport, as well as some utilities for performing the tests (which are also tested themselves).

ck-builder: fix copyrights

302bf04

ck-builder: fix inline diff test namespace

37b62bb

ck-builder: nd-comparison kernel

7999b2a

This new version actually compares the tensors, rather than comparing just the backing buffers. This should make it easier to gather coordinates, as well as allow us to compare non-packed kernels.

ck-builder: abstract tensor_foreach

c25cd73

Putting this code into a separate header allows us to reuse it for data generation and more validation stuff.

ck-builder: reimplement fill_tensor_buffer with tensor_foreach

69dc4b8

Now that we have this sort of abstract implementation, we can re-use this implementation.

ck-builder: validation unit tensor constructor helper

284ed4d

This cleans up the test a bit.

ck-builder: move tensor layout computation into tensor_buffer.hpp

6635c2c

This functionality is useful outside of unit_validation.cpp, so move it to a more common location.

ck-builder: move TensorDescriptor and related types to tensor_descrip…

0356822

…tor.hpp At this point the tensor descriptor stuff has become larger than the actual tensor buffer stuff, so let's move it into a separate file.

ck-builder: tensor descriptor tests

3c77a35

ck-builder: tensor_foreach test

e914694

ck-builder: move fill_tensor(_buffer) to ckb::test

2cbf343

ck-builder: more validation tests

a11f2e5

ck-builder: MatchesReference tests

76f79f6

ck-builder: validation.hpp docs

dce7b8e

Snektron requested a review from illsilin as a code owner December 19, 2025 16:38

Snektron added the organization: streamhpc label Dec 19, 2025

Snektron requested review from aosewski, carlushuang, geyyer, poyenc and qianfengz as code owners December 19, 2025 16:38

Snektron requested review from ThomasNing, afagaj, andriy-ca, asleepzzz, bartekxk, cgmillette, coderfeli, shumway, tenpercent and vidyasagar-amd as code owners December 19, 2025 16:38
