Use tosa.matmul_t_block_scaled for the scaled GEMMs
#2164
base: develop
Conversation
Pull request overview
This PR replaces the workaround for scaled GEMM operations by using the new tosa.matmul_t_block_scaled operator that directly supports scale arguments, eliminating the need to decompose scaled GEMMs into separate convert, multiply, and matmul operations.
Key changes:
- Added support for `tosa.matmul_t_block_scaled` in MIGraphXToTosa and TosaToRock conversions
- Modified `migraphx.quant_dot` decomposition to only apply to non-kernel (host) functions
- Updated test pipelines to apply different transformations for host vs. kernel functions
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| mlir/test/fusion/pr-e2e/mixr-gemm-fp4/mixr-dot-fp4.mlir | Updated RUN command to use separate host/kernel pipelines |
| mlir/test/fusion/pr-e2e/mixr-gemm-fp4/migraphx-quant-dot-fp4.mlir | Updated RUN command to use separate host/kernel pipelines |
| mlir/test/fusion/e2e/mixr-gemm-fp4/mixr_trp_rsp_rsp_quant_dot_unsqueeze_broadcast_add_add.mlir | Updated RUN command to use separate host/kernel pipelines |
| mlir/test/fusion/e2e/mixr-gemm-fp4/mixr_trp_rsp_rsp_quant_dot_rsp_broadcast_add_relu.mlir | Updated RUN command to use separate host/kernel pipelines |
| mlir/test/fusion/e2e/mixr-gemm-fp4/mixr_rsp_trp_squeeze_rsp_trp_squeeze_rsp_trp_rsp_rsp_trp_rsp_quant_dot_rsp.mlir | Updated RUN command to use separate host/kernel pipelines |
| mlir/test/fusion/e2e/mixr-gemm-fp4/mixr_rsp_trp_squeeze_rsp_rsp_trp_rsp_quant_dot.mlir | Updated RUN command to use separate host/kernel pipelines |
| mlir/test/fusion/e2e/mixr-gemm-fp4/mixr_rsp_rsp_quant_dot_rsp_broadcast_mul_add.mlir | Updated RUN command to use separate host/kernel pipelines |
| mlir/test/fusion/e2e/mixr-gemm-fp4/mixr_rsp_rsp_quant_dot_rsp_broadcast_add.mlir | Updated RUN command to use separate host/kernel pipelines |
| mlir/test/fusion/e2e/mixr-gemm-fp4/mixr_rsp_rsp_quant_dot_add_add.mlir | Updated RUN command to use separate host/kernel pipelines |
| mlir/test/fusion/e2e/mixr-gemm-fp4/mixr_rsp_rsp_quant_dot.mlir | Updated RUN command to use separate host/kernel pipelines |
| mlir/test/Dialect/MIGraphX/quant-dot-decompose.mlir | Added test verifying kernel functions don't get decomposed |
| mlir/test/Conversion/TosaToRock/tosa-to-rock.mlir | Removed old pattern-matching tests for scaled GEMM |
| mlir/test/Conversion/TosaToRock/tosa-to-rock-matmul-t-block-scaled.mlir | Added comprehensive tests for matmul_t_block_scaled lowering |
| mlir/test/Conversion/MIGraphXToTosa/quant-dot-scaled-to-matmul-t-block-scaled.mlir | Added tests for quant_dot to matmul_t_block_scaled conversion |
| mlir/lib/Dialect/MIGraphX/Transforms/MIGraphXTransform.cpp | Restricted decomposition to non-kernel functions |
| mlir/lib/Conversion/TosaToRock/TosaToRockPass.cpp | Added matmul_t_block_scaled as illegal op |
| mlir/lib/Conversion/TosaToRock/TosaToRock.cpp | Removed scale extraction logic from MatMulConverter; added MatmulTBlockScaledConverter with scale broadcasting |
| mlir/lib/Conversion/MIGraphXToTosa/MIGraphXToTosa.cpp | Added support for converting quant_dot with scales to matmul_t_block_scaled |
```cpp
// Compute 3D output shape
SmallVector<int64_t> newDimsOut = {
    batchInfo.newBatchOut,
    (batchInfo.batchSizeB == 1 && batchInfo.batchSizeA != 1)
```
can we use batchInfo.newM instead of this?
```cpp
Value inBReshaped = inB;
if (batchInfo.needsReshape) {
  auto [reshapedA, reshapedB] =
      reshapeTo3DForMatmul(rewriter, loc, inA, inB, batchInfo, elementTy);
```
why do we handle A and B together instead of independently? do they always require a reshape together?
```cpp
if (hasScales) {
  // TODO: only blockSize of 32 is supported for matmul_t_block_scaled for
  // now
  int64_t blockSize = 32;
```
isn't blockSize a param of the kernel somehow? can we assert something == 32?
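For illustration, the check could look something like this (the accessor name is hypothetical; I don't know where the op carries its block size):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: if the block size is recoverable from the op
// (accessor name assumed), assert the current restriction instead of
// silently hard-coding 32.
void checkBlockSize(int64_t opBlockSize) {
  assert(opBlockSize == 32 &&
         "matmul_t_block_scaled lowering currently supports only blockSize 32");
}
```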
```cpp
SmallVector<int64_t> physScaleBShape = {batchInfo.newBatchB, scaleKDim,
                                        batchInfo.nDim};

Value scaleBPhysical = unbroadcastScale(
```
physical implies it's the actual memory layout? they could have done more broadcast or transposes, couldn't they?
```cpp
// Convert optional attributes
if (auto attr = (*op).template getAttrOfType<StringAttr>("perf_config"))
  matmulOp->setAttr("perf_config", attr);
```
please, let's add a test where we check perf_config is copied correctly
don't we need to set "acc_type" here?
```cpp
    auto mergeAttr = mergeB.get();
    return rock::TransformOp::create(b, loc, scale, mergeAttr);
  }
  // else if not transposed
```
could we unify if/else by having variables for indices etc?
```cpp
// Output is [batch, M, N]
auto bDataType = cast<RankedTensorType>(bData.getType());
ArrayRef<int64_t> bShape = bDataType.getShape();
// B physical shape depends on transpose:
```
same as my question in another comment, do we know for sure this is the "physical" (I guess underlying layout)? couldn't they broadcast it before? or reshape it etc? if that's the case I wouldn't use "physical"
```cpp
    transposeB, transposeC, nullptr, nullptr,
    rw, loc, outputType, aData, bData, output, brAScale, brBScale,
    transposeA, transposeB,
    /*cTransposed=*/nullptr, aScaleTransposed, bScaleTransposed,
```
why can't C be transposed? we have transpose_c in the tosa::MatMulOp lowering
```cpp
      return matMulOp.emitWarning(
          "transpose found leading to a matmul input other than A or B");
    }
  } else if (auto matMulTBlockScaledOp =
```
I think all of this can be removed since we have the SortDimensions pass. So, basically, whatever we do here (regarding transposes etc.) will get overwritten by SortDimensions, right?
I mean we could remove TransposeRewritePattern completely I think.
```cpp
  // side non-kernel functions, we need to run the conversion manually. for
  // the kernel side, tosa.matmul_t_block_scaled is converted to rock.gemm
  // with scales in TosaToRock.cpp
  if (!func->hasAttr("kernel")) {
```
nit: could we use options.disableRock instead of this?
if it's too difficult, this is fine; we use it everywhere anyway...
Motivation
When scaled GEMM was implemented in rocMLIR, TOSA did not have a matmul operation that takes scale arguments. As a workaround, rocMLIR implemented a "decompose" transform for the `migraphx.quant_dot` operation, which decomposed a scaled GEMM into separate convert, multiply, and matmul operations. During TosaToRock it would then try to pattern-match the "convert" + "mul" sequence, with data type checking, to decide whether it was lowering a scaled GEMM.
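To make the two formulations concrete, here is a minimal reference sketch in plain C++ (floats stand in for the small float element types; the row-major shapes, the transposed-B layout implied by the `_t`, and the block size of 32 are assumptions for illustration, not rocMLIR code):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr int64_t kBlockSize = 32; // assumed per-block scaling granularity

// Decomposed workaround: convert + mul materialize scaled copies of A and B,
// then a plain matmul consumes them. A is [m, k], B is [n, k] (transposed),
// scaleA is [m, k/32], scaleB is [n, k/32], output is [m, n].
std::vector<float> decomposedScaledGemm(const std::vector<float> &a,
                                        const std::vector<float> &b,
                                        const std::vector<float> &scaleA,
                                        const std::vector<float> &scaleB,
                                        int64_t m, int64_t n, int64_t k) {
  assert(k % kBlockSize == 0);
  const int64_t kBlocks = k / kBlockSize;
  std::vector<float> aS(m * k), bS(n * k), out(m * n, 0.0f);
  for (int64_t i = 0; i < m; ++i)
    for (int64_t p = 0; p < k; ++p)
      aS[i * k + p] = a[i * k + p] * scaleA[i * kBlocks + p / kBlockSize];
  for (int64_t j = 0; j < n; ++j)
    for (int64_t p = 0; p < k; ++p)
      bS[j * k + p] = b[j * k + p] * scaleB[j * kBlocks + p / kBlockSize];
  for (int64_t i = 0; i < m; ++i)
    for (int64_t j = 0; j < n; ++j)
      for (int64_t p = 0; p < k; ++p)
        out[i * n + j] += aS[i * k + p] * bS[j * k + p];
  return out;
}

// Fused form (the semantics a block-scaled matmul expresses directly): the
// same math, with the per-block scales applied inside the reduction loop
// instead of via materialized intermediate tensors.
std::vector<float> fusedBlockScaledGemm(const std::vector<float> &a,
                                        const std::vector<float> &b,
                                        const std::vector<float> &scaleA,
                                        const std::vector<float> &scaleB,
                                        int64_t m, int64_t n, int64_t k) {
  assert(k % kBlockSize == 0);
  const int64_t kBlocks = k / kBlockSize;
  std::vector<float> out(m * n, 0.0f);
  for (int64_t i = 0; i < m; ++i)
    for (int64_t j = 0; j < n; ++j)
      for (int64_t p = 0; p < k; ++p)
        out[i * n + j] +=
            (a[i * k + p] * scaleA[i * kBlocks + p / kBlockSize]) *
            (b[j * k + p] * scaleB[j * kBlocks + p / kBlockSize]);
  return out;
}
```

Both functions compute the same result; the fused form is what the new operator lets the compiler see as a single op.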
TOSA now has `tosa.matmul_t_block_scaled`, which takes two additional arguments, `scale_a` and `scale_b`. This PR therefore makes use of this new TOSA operator and adds the corresponding lowerings in MIGraphXToTosa and then TosaToRock.

Fixes https://github.com/ROCm/rocMLIR-internal/issues/2139
Technical Details
Host side
TOSA has not implemented the TosaToLinalg conversion for `tosa.matmul_t_block_scaled`, so we still require the decomposition for host functions. As a result, an E2E test should first run `--clone-harness` and then run the host pipeline and the kernel pipeline separately, so that host and kernel functions go through different migraphx pipelines. This PR contains those changes.
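Mechanically, the host/kernel split amounts to gating the decomposition on the `kernel` attribute. A minimal sketch (not the actual rocMLIR pass; the pattern application is elided):

```cpp
#include "mlir/Dialect/Func/IR/FuncOps.h"
#include "mlir/IR/BuiltinOps.h"
#include "mlir/Pass/Pass.h"

using namespace mlir;

namespace {
// Minimal sketch, not the rocMLIR implementation: run the quant_dot
// decomposition only on host (non-kernel) functions; kernel functions keep
// quant_dot so it can become tosa.matmul_t_block_scaled -> rock.gemm.
struct DecomposeQuantDotOnHost
    : PassWrapper<DecomposeQuantDotOnHost, OperationPass<ModuleOp>> {
  void runOnOperation() override {
    getOperation().walk([&](func::FuncOp func) {
      if (func->hasAttr("kernel"))
        return; // kernel side: left for MIGraphXToTosa/TosaToRock
      // ... apply the quant_dot decomposition patterns to `func` (elided) ...
    });
  }
};
} // namespace
```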
Broadcasting of scales
The `migraphx.quant_dot` operator expects the scales and the A/B arguments to have the same shape, which it achieves by broadcasting the scales. `tosa.matmul_t_block_scaled` doesn't have this restriction. Therefore, MIGraphXToTosa needs to unbroadcast the scales, and TosaToRock later adds the broadcasts back, since rock.gemm also requires the scales and A/B to have the same shape.
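To illustrate the two scale layouts, a small plain-C++ sketch (the helper names, the row-major [rows, k] shapes, and the block size are assumptions for illustration; the real helpers live in the conversion code):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Drop the broadcast: keep one scale per blockSize-wide K block, turning a
// [rows, k] scale tensor back into the compact [rows, k/blockSize] form that
// matmul_t_block_scaled takes.
std::vector<float> unbroadcastScales(const std::vector<float> &bcast,
                                     int64_t rows, int64_t k,
                                     int64_t blockSize) {
  assert(k % blockSize == 0);
  const int64_t blocks = k / blockSize;
  std::vector<float> compact(rows * blocks);
  for (int64_t r = 0; r < rows; ++r)
    for (int64_t blk = 0; blk < blocks; ++blk)
      compact[r * blocks + blk] = bcast[r * k + blk * blockSize];
  return compact;
}

// Re-introduce the broadcast so every element in a K block sees its scale,
// matching what quant_dot and rock.gemm expect.
std::vector<float> broadcastScales(const std::vector<float> &compact,
                                   int64_t rows, int64_t k,
                                   int64_t blockSize) {
  assert(k % blockSize == 0);
  const int64_t blocks = k / blockSize;
  std::vector<float> bcast(rows * k);
  for (int64_t r = 0; r < rows; ++r)
    for (int64_t c = 0; c < k; ++c)
      bcast[r * k + c] = compact[r * blocks + c / blockSize];
  return bcast;
}
```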
Test Plan
CI passes