Introduce quick-tune lists for Attention #2203
base: develop
Conversation
I expect a huge speed-up for quick tuning, as the attention list was hardcoded and arbitrary. Can you show the perf of develop vs. this branch for the tier1 attention (and g+g) list?
return std::nullopt;
// Parse "vN:" - if not present, assume version 1
int version = 1;
if (rest.consume_front("v")) {
why is this needed?
I wasn't really sure if v1 had the version string or not, so I played it safe. I suppose this is not that important since I doubt v1 is being used anywhere.
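(For reference, a minimal sketch of the defaulting behavior discussed here, assuming an llvm::StringRef-based parser; the function name and exact error handling below are illustrative, not the code in this PR.)

#include "llvm/ADT/StringRef.h"
#include <optional>

// Sketch only: parse an optional "vN:" prefix from a perf-config string.
// If the prefix is absent, assume version 1 (the "play it safe" fallback
// discussed above).
static std::optional<int> parseVersionPrefix(llvm::StringRef &rest) {
  int version = 1; // no "vN:" prefix -> treat as v1
  if (rest.consume_front("v")) {
    // consumeInteger() returns true when it fails to parse a number.
    if (rest.consumeInteger(/*Radix=*/10, version))
      return std::nullopt;
    if (!rest.consume_front(":"))
      return std::nullopt;
  }
  return version;
}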
raise ValueError(f"Unknown operation: {op}")


def parse_perfconfig(perfconfig):
nit: could we use python bindings to parse this?
};
// END_ATTENTION_GemmGemm_f32_gfx942_DEFS

// BEGIN_ATTENTION_GemmGemm_f16_gfx942_DEFS
I wasn't aware we had separate bf16 and f16 perfConfigs; I think they should be the same?
For the other operations we haven't tuned for bf16 so far, and currently we use the f16 list anyway:
// We use "f16" for bf16 and f16 generically
But now that we have a bf16 list, we should change this behavior so that it only falls back to f16 if no bf16 list is present.
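(To illustrate the fallback proposed in the comment above, a minimal sketch; the helper name and signature are hypothetical, not the actual PopulateParams interface.)

#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/StringRef.h"

// Sketch: pick the quick-tune list for a datatype, falling back from bf16
// to the f16 list only when no dedicated bf16 list exists.
llvm::ArrayRef<llvm::StringRef>
pickQuickTuneList(llvm::StringRef dataType,
                  llvm::ArrayRef<llvm::StringRef> f16List,
                  llvm::ArrayRef<llvm::StringRef> bf16List) {
  if (dataType == "bf16" && !bf16List.empty())
    return bf16List; // use the dedicated bf16 list when we have one
  if (dataType == "bf16" || dataType == "f16")
    return f16List;  // previous behavior: bf16 reuses the f16 list
  return {};         // other datatypes handled elsewhere in this sketch
}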
But my point is that I don't think we should make a distinction between f16 and bf16. I don't see a reason why the perfConfig should be different for those two. The reason one type gets a different perfConfig is that mfma/wmma behaves differently, or that it loads from memory/LDS with different efficiency, etc. That should be the same for f16/bf16, right?
So the same conv/attention/gemm configs that happen to have bf16 entries in the list could be run with f16 as well, or the other way around.
Oh, I see what you mean. That makes sense.
In that case, I can just treat bf16 and f16 as the same during generation and it will consolidate everything into one list. I can also skip tuning the same test vector for bf16 if it has already been tuned for f16, and vice versa.
If you have already tuned bf16/f16 for the same problem config, do you see very different TFLOPS?
However, if at some point we enable a split-K quick-tune list, we would probably want to keep separate lists for bf16/f16 (for some archs), because some archs have atomic instructions for f16 but not for bf16.
// END_ATTENTION_GemmGemm_f16_gfx1000_DEFS

// BEGIN_ATTENTION_GemmGemm_f32_gfx942_DEFS
const StringRef PopulateParamsGemmGemm::initParametersF32AttentionGfx942[] = {
How come the gfx942 list is small compared to, for example, gfx12?
Given how the selection works, it means there is less variation across the best-performing configs on gfx942. I can look into the data some more to see why that is.
Also of note: gfx942 was tuned on 4 GPUs, while gfx12 was tuned on 2 GPUs. That may or may not be relevant.
Motivation
Technical Details
Related to https://github.com/ROCm/rocMLIR-internal/issues/2136
Test Plan
Confirm if the new quick-tune lists perform better than the old ones.
Test Result
On gfx942, the new quick-tune lists perform ~30% better.
Submission Checklist