Note
This repo is a fork of the original QuACK project, modified for compatibility and integration with PaddlePaddle. The current branch is aligned with upstream commit 3d0ab3ec2164749caac8f269f771e66a40efd2de.
Installation
git clone https://github.com/PFCCLab/quack.git
cd quack
pip install .
Usage
import paddle
paddle.enable_compat(scope={"quack"})  # Enable the torch proxy for quack; this must run before importing quack
import quack
# use quack
The original README.md content is as follows:
Kernels are written in the CuTe-DSL.
pip install quack-kernels
Requirements
- H100 or B200 GPU
- CUDA toolkit 12.9+
- Python 3.12
Kernels
- 🦆 RMSNorm forward + backward
- 🦆 Softmax forward + backward
- 🦆 Cross entropy forward + backward
- 🦆 Layernorm forward
- 🦆 Hopper gemm + epilogue
- 🦆 Blackwell gemm + epilogue
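The forward and backward passes listed above follow standard formulas. As an illustration, here is a minimal pure-Python sketch of the per-row RMSNorm math, y = x · rstd · w with rstd = 1/sqrt(mean(x²) + eps), which can serve as a slow reference when checking kernel outputs. The function names and the default eps=1e-6 are assumptions made for this sketch; they are not quack's API, and the real kernels operate on batched GPU tensors rather than Python lists.

```python
import math
from typing import List, Tuple


def rmsnorm_fwd(x: List[float], w: List[float], eps: float = 1e-6) -> Tuple[List[float], float]:
    """Hypothetical reference RMSNorm forward for one row: y = x * rstd * w."""
    n = len(x)
    # rstd = 1 / sqrt(mean(x^2) + eps), the reciprocal root-mean-square of the row
    rstd = 1.0 / math.sqrt(sum(v * v for v in x) / n + eps)
    y = [xi * rstd * wi for xi, wi in zip(x, w)]
    return y, rstd


def rmsnorm_bwd(
    dy: List[float], x: List[float], w: List[float], rstd: float
) -> Tuple[List[float], List[float]]:
    """Hypothetical reference RMSNorm backward for one row, reusing rstd from the forward.

    With xhat = x * rstd and g = dy * w:
      dx = rstd * (g - xhat * mean(g * xhat))
      dw = dy * xhat   (per row; sum over rows for a batch)
    """
    n = len(x)
    xhat = [xi * rstd for xi in x]
    g = [dyi * wi for dyi, wi in zip(dy, w)]
    c = sum(gi * xh for gi, xh in zip(g, xhat)) / n
    dx = [rstd * (gi - xh * c) for gi, xh in zip(g, xhat)]
    dw = [dyi * xh for dyi, xh in zip(dy, xhat)]
    return dx, dw
```

Returning rstd from the forward lets the backward skip recomputing the row norm; the dw returned here is for a single row and would be summed over all rows to get the weight gradient for a batch.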
Usage
from quack import rmsnorm, softmax, cross_entropy
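The import line does not show call signatures, and they are not documented here. For checking results, a plain reference of the row-wise math can help: numerically stable softmax (subtract the row max before exponentiating), its backward pass, and cross entropy, whose gradient with respect to the logits is softmax(logits) minus a one-hot target. This is a hypothetical pure-Python sketch; the function names are made up for illustration and are not quack's implementation or API.

```python
import math
from typing import List


def softmax_fwd(x: List[float]) -> List[float]:
    """Numerically stable softmax over one row."""
    m = max(x)  # subtracting the row max keeps exp() from overflowing
    e = [math.exp(v - m) for v in x]
    s = sum(e)
    return [v / s for v in e]


def softmax_bwd(dy: List[float], y: List[float]) -> List[float]:
    """Softmax backward from the forward output y: dx = y * (dy - sum(dy * y))."""
    dot = sum(a * b for a, b in zip(dy, y))
    return [yi * (dyi - dot) for yi, dyi in zip(y, dy)]


def cross_entropy_fwd(logits: List[float], target: int) -> float:
    """Cross-entropy loss for one row: logsumexp(logits) - logits[target]."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(v - m) for v in logits))
    return lse - logits[target]


def cross_entropy_bwd(logits: List[float], target: int, dloss: float = 1.0) -> List[float]:
    """Gradient w.r.t. logits: dloss * (softmax(logits) - one_hot(target))."""
    p = softmax_fwd(logits)
    p[target] -= 1.0
    return [dloss * v for v in p]
```

Note that softmax_bwd only needs the forward output, not the original input, which is one reason softmax backward passes are commonly written in terms of y.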
[2025-07-10] We have a comprehensive blog post on how to get memory-bound kernels to speed-of-light, right in the comfort of Python thanks to the CuTe-DSL.
See our blog post for the details.
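A memory-bound kernel is at speed-of-light when its runtime approaches the time needed just to move its data at the GPU's peak memory bandwidth. That arithmetic can be sketched as below. The byte count is an assumption for illustration: it models an RMSNorm forward that reads x and the weight once and writes y once, ignoring any saved statistics such as rstd, and uses 2-byte elements (bf16/fp16) by default. The peak bandwidth is a number you supply from your GPU's datasheet; all function names are hypothetical.

```python
def rmsnorm_fwd_bytes(m: int, n: int, elem_bytes: int = 2) -> int:
    """Minimum DRAM traffic for a hypothetical RMSNorm forward on an (m, n) input.

    Counts reading x (m*n elements), writing y (m*n elements) and reading the
    weight vector once (n elements). Saved statistics such as rstd are ignored.
    elem_bytes=2 assumes bf16/fp16 storage.
    """
    return (2 * m * n + n) * elem_bytes


def speed_of_light_time_s(bytes_moved: float, peak_tbps: float) -> float:
    """Lower bound on runtime: time to move bytes_moved at peak bandwidth (TB/s)."""
    return bytes_moved / (peak_tbps * 1e12)


def achieved_bandwidth_tbps(bytes_moved: float, time_s: float) -> float:
    """Effective bandwidth in TB/s given a measured kernel time in seconds."""
    return bytes_moved / time_s / 1e12


def fraction_of_peak(bytes_moved: float, time_s: float, peak_tbps: float) -> float:
    """How close a kernel is to speed-of-light (1.0 means running at peak bandwidth)."""
    return achieved_bandwidth_tbps(bytes_moved, time_s) / peak_tbps
```

With a measured kernel time t, a fraction_of_peak(rmsnorm_fwd_bytes(m, n), t, peak) close to 1.0 means the kernel is near speed-of-light; a value well below 1.0 points to wasted memory traffic or poor latency hiding.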
To set up the development environment:
pip install -e '.[dev]'
pre-commit install