Releases: huggingface/kernel-builder

v0.8.0

26 Nov 13:59

New features

Support Metal 4 on macOS

Starting with this release, kernel-builder builds Metal kernels with Metal 4. The minimum required SDK and macOS versions are both 26. For more information on how to set up a development environment, see our Metal docs.

Experimental support for Python dependencies

This version adds support for kernel Python dependencies. So far, we mostly considered kernels to be either pure PyTorch + Triton or compiled CUDA/ROCm/XPU with a small Torch wrapper. This assumption made kernels easy to deploy everywhere, since they do not have external dependencies. However, DSLs for writing kernels, such as the CUTLASS DSL, are becoming increasingly popular.

To accommodate such DSLs without bringing back the issues that dependencies have, we allow a small, curated set of dependencies. Currently the only allowed dependencies are einops and nvidia-cutlass-dsl. Dependencies can be added using the new python-depends option of the general section in build.toml:

[general]
name = "my-kernel"
# ...
python-depends = ["nvidia-cutlass-dsl"]

The dependencies are also validated by kernels when a kernel that uses dependencies is downloaded.

build-and-upload

The new build-and-upload command builds and uploads a kernel in one go. If the kernel is not in kernels-community, you can specify the upload location in general.hub:

[general.hub]
repo-id = "my-org/my-kernel"

Flattened build directories

Thus far, kernels were stored in build/<variant>/<module_name>. This version of kernel-builder changes this to build/<variant>. This solves the issue where a kernel cannot be loaded when module_name does not match the repository name (e.g. after a rename). For the next few releases, kernel-builder will put a compatibility module at build/<variant>/<module_name> to make sure that a kernel can still be loaded with an older version of kernels.
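
The difference between the two layouts can be sketched as a small lookup that prefers the flattened directory and falls back to the old one. This is an illustration only, not how kernels resolves paths. The helper resolve_kernel_dir is hypothetical, and treating the presence of __init__.py as the marker for a loadable module is an assumption.

```python
from pathlib import Path


def resolve_kernel_dir(repo_root: Path, variant: str, module_name: str) -> Path:
    """Pick the directory to load a kernel from (hypothetical helper)."""
    variant_dir = repo_root / "build" / variant
    # New layout: the module lives directly in build/<variant>.
    # Assumption: an __init__.py marks a loadable module.
    if (variant_dir / "__init__.py").is_file():
        return variant_dir
    # Old layout: build/<variant>/<module_name>.
    legacy_dir = variant_dir / module_name
    if (legacy_dir / "__init__.py").is_file():
        return legacy_dir
    raise FileNotFoundError(f"no kernel found for variant {variant!r}")
```

Note that the flattened lookup never uses module_name, which is why a mismatch between the module name and the repository name no longer matters in the new layout.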

What's Changed

  • misc(builder): enable detection of ARM64 arch on Windows and turn on correct VS / CMake environments by @mfuntowicz in #272
  • fix(windows): always define _WIN32 preprocessor macro to prevent PyTorch compiling unsupported code by @mfuntowicz in #275
  • bug(windows): fix invalid generated build name by @mfuntowicz in #274
  • feat(windows): allow detecting Python executable by @mfuntowicz in #276
  • Missing Windows knobs to make it compatible with kernels by @mfuntowicz in #277
  • do not use ONEDNN_XPU_INCLUDE_DIR since it's only needed for torch2_7. by @sywangyi in #273
  • Add build-and-upload command by @danieldk in #278
  • Remove examples/activation by @danieldk in #261
  • fix(build2cmake): ignore untracked files when looking for modified files to suffix with _dirty by @mfuntowicz in #280
  • feat(windows): do not include cxx11 ABI flag when generating names by @mfuntowicz in #281
  • Remove duplicate build variant name code by @danieldk in #285
  • Add support for building CPU-only kernels by @danieldk in #284
  • Remove Python bytecode after checks by @danieldk in #286
  • Use correct Python interpreter for metallib_to_header by @danieldk in #288
  • Fix metal kernels support by @MekkCyber in #287
  • Switch to binary Torch wheels by @danieldk in #289
  • Update to macOS SDK 26 and Metal 4 by @danieldk in #290
  • Add doc on the required environment for Metal by @danieldk in #292
  • Also remove bytecode from universal builds by @danieldk in #294
  • Include CPU kernels in CI builds by @danieldk in #296
  • Flatten build variants to build/<variant> by @danieldk in #293
  • Allow dashes in kernel names by @danieldk in #297
  • extensionName -> moduleName by @danieldk in #298
  • feat: support metal cpp by @drbh in #295
  • Add support for (limited) Python dependencies: nvidia-cutlass-dsl and einops by @danieldk in #302
  • Copy over Torch from hf-nix and fix the AArch64 build by @danieldk in #304
  • Remove dependency on hf-nix by @danieldk in #305
  • fix(windows): force USE_CUDA/USE_ROCM definitions to ensure PyTorch guards are not bypassed by @mfuntowicz in #303
  • Fix typos by @omahs in #309
  • Extend cutlass to bmg by @sywangyi in #307
  • Update tracing-subscriber to solve dependabot issue by @danieldk in #310
  • gen-flake-outputs: add backendBundle output by @danieldk in #312
  • Add a Discord link by @danieldk in #313
  • Set version to 0.8.0-dev0 by @danieldk in #315

New Contributors

Full Changelog: v0.7.0...v0.8.0

v0.7.0

19 Oct 10:49

New features

PyTorch 2.9.0 support

kernel-builder now builds kernels for PyTorch 2.8.0 and 2.9.0 by default. Support for PyTorch 2.7.0 was removed, conforming to our policy to support the latest two releases.

Windows builder

This release contains experimental support for building Windows kernels. Since Nix is not supported on Windows, the separate PowerShell script scripts/windows/builder.ps1 is provided to build kernels on Windows.

Binary Torch wheels

kernel-builder now supports building against binary Torch wheels. This speeds up roll-out of support for new Torch versions or vendor-specific Torch builds. Build variants that do not have sourceBuild = true set will use a Torch binary wheel. We will soon switch over to using binary wheels as a default.

What's Changed

New Contributors

Full Changelog: v0.6.2...v0.7.0

v0.6.2

24 Sep 15:30

New Features

Intel XPU support

This release of kernel-builder adds XPU support. Many thanks to @sywangyi for implementing this! You can use the xpu backend type in build.toml for XPU kernels. For example:

[kernel.activation_xpu]
backend = "xpu"
depends = ["torch"]
src = ["relu_xpu/relu.cpp"]

The ReLU example kernel shows how you can make a kernel that supports the CUDA, ROCm and XPU backends.

kernel-abi-check Python binding

kernel-abi-check now also has a Python binding. This will be used by the upcoming kernels check subcommand.

API changes

Prior to this version, a kernel would have to provide the Git revision to genFlakeOutputs in its flake.nix. For example:

kernel-builder.lib.genFlakeOutputs {
  path = ./.;
  rev = self.shortRev or self.dirtyShortRev or self.lastModifiedDate;
};

Starting with version 0.6.2, kernel-builder determines the revision itself. Instead of passing rev, a kernel now has to pass through the flake itself (self):

kernel-builder.lib.genFlakeOutputs {
  inherit self;
  path = ./.;
};

The old invocation of genFlakeOutputs still works with a warning, but will be deprecated in the future.

What's Changed

Full Changelog: v0.6.1...v0.6.2

v0.6.1

05 Sep 08:34

New Features

build-and-copy command

Before this release one had to build a kernel with nix build first and then copy the build variants from result to build. This can now be done in a single step with build-and-copy:

$ nix run .#build-and-copy -L

Automatic virtual environment for nix develop

Running nix develop in a kernel will now automatically create a virtual environment in .venv (if it does not exist) and activate it.

Docs

examples/relu-backprop-compile provides an example of how to make a kernel with backprop and torch.compile support.

What's Changed

  • Disable cachix pushes, sandboxing is not enabled by @danieldk in #198
  • [XPU]Add support for cutlass-sycl by @danieldk in #200
  • Fix handling of the 9.0a and 12.0a capabilities by @danieldk in #202
  • Update hf-nix and remove sanitiseHeaderPathsHook workaround by @danieldk in #201
  • kernel devshell: automatically create venv by @danieldk in #205
  • Move kernel flake outputs generation to a separate file by @danieldk in #207
  • Add a full ReLU example with backprop and torch.compile support by @danieldk in #206
  • Add build-and-copy package by @danieldk in #208

Full Changelog: v0.6.0...v0.6.1

v0.6.0

06 Aug 20:07

New features

PyTorch 2.8 support

kernel-builder now supports PyTorch 2.8 in the following (upstream) build configurations:

  • CUDA 12.6, 12.8, and 12.9 on aarch64-linux and x86_64-linux.
  • ROCm 6.3 and 6.4 on x86_64-linux.
  • Metal on aarch64-darwin (macOS).

Following the kernel-builder support policy, support for Torch 2.6 is removed.

Additional compliance testing

Besides the ABI checks (manylinux and abi3 compliance), kernel-builder now also checks whether the kernel can be loaded by the kernels package. This ensures, among other things, that imports are relative. The check can be a problem for some Triton kernels that use the autotune decorator, since the build sandbox does not have access to GPUs. In this case the check can be disabled by passing doGetKernelCheck = false.
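
The relative-import requirement can be illustrated with a small static scan. The sketch below is not the check that kernel-builder runs (that check loads the kernel with the kernels package); it only shows one way to spot imports that name a package absolutely. The function absolute_self_imports and the package name my_kernel are hypothetical.

```python
import ast


def absolute_self_imports(source: str, package: str) -> list[int]:
    """Return line numbers of imports that refer to `package` absolutely.

    Hypothetical helper for illustration only.
    """
    prefix = package + "."
    lines = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom):
            # level == 0 means an absolute `from x import y`.
            module = node.module or ""
            if node.level == 0 and (module == package or module.startswith(prefix)):
                lines.append(node.lineno)
        elif isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == package or alias.name.startswith(prefix):
                    lines.append(node.lineno)
    return sorted(lines)


SOURCE = (
    "from my_kernel.ops import relu\n"  # absolute: flagged
    "from .ops import relu\n"           # relative: fine
    "import torch\n"                    # other package: fine
)

print(absolute_self_imports(SOURCE, "my_kernel"))
```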

Support for generating PTX

When defining CUDA capabilities, it is now possible to add the +PTX suffix to generate PTX code. For example:

cuda-capabilities = ["7.0", "8.0+PTX"]

When no CUDA capabilities are specified for a kernel, PTX is generated for capability 9.0 (and 12.0 on CUDA >= 12.8).
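
To show how such entries could be interpreted, the sketch below splits a capability string into its version and a PTX flag, and then maps it to the CMake CUDA_ARCHITECTURES notation, where a bare number such as 80 requests both machine code and PTX and 70-real requests machine code only. Whether kernel-builder performs exactly this mapping is an assumption. Both function names are hypothetical.

```python
PTX_SUFFIX = "+PTX"


def parse_capability(spec: str) -> tuple[str, bool]:
    """Split e.g. "8.0+PTX" into ("8.0", True). Hypothetical helper."""
    if spec.endswith(PTX_SUFFIX):
        return spec[: -len(PTX_SUFFIX)], True
    return spec, False


def to_cmake_arch(spec: str) -> str:
    """Map a capability to a CMake CUDA_ARCHITECTURES entry.

    Assumption for illustration: without +PTX only machine code is wanted.
    """
    capability, emit_ptx = parse_capability(spec)
    arch = capability.replace(".", "")
    # In CMake, "80" builds real and virtual (PTX) code, "70-real" real only.
    return arch if emit_ptx else f"{arch}-real"


print([to_cmake_arch(spec) for spec in ["7.0", "8.0+PTX"]])
```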

What's Changed

New Contributors

Full Changelog: v0.5.2...v0.6.0

v0.5.2

04 Jul 12:11
99306a9

This release contains changes for handling more complex kernels:

  • Support minimum CUDA versions for 'subkernels' (kernel.<name>) for when a kernel has specializations for e.g. Blackwell.
  • Support passing custom flags to the C++ compiler.
  • Add support for building kernels for a subset of CUDA versions. Using this option leads to non-compliant kernels, so it should only be used as a last resort.

What's Changed

  • Add cuda-maxver option to the general section by @danieldk in #170
  • build2cmake: cxx-flags option for C++ compile flags for kernels by @danieldk in #171
  • hotfix: cuda-maxver by @danieldk in #172
  • hotfix: cuda-maxver nit in Nix by @danieldk in #173
  • Add support for building for a custom set of Torch versions by @danieldk in #174
  • Add cuda-minver option for CUDA kernels by @danieldk in #176
  • Set build2cmake and kernel-abi-check to 0.5.2 for release prep by @danieldk in #177

Full Changelog: v0.5.1...v0.5.2

v0.5.1

25 Jun 08:07
965a356

This release contains various bugfixes.

What's Changed

Full Changelog: v0.5.0...v0.5.1

v0.5.0

23 Jun 08:16
6704ae8

This release adds support for building Metal kernels for Apple Silicon Macs. To accommodate non-CUDA/ROCm kernels, the build.toml format has been updated. You can update an existing build.toml using build2cmake:

$ build2cmake update-build /path/to/build.toml

You can also directly run this command with Nix:

$ nix run github:huggingface/kernel-builder/v0.5.0#update-build /path/to/build.toml

What's Changed

  • Update the build.toml format in preparation for Metal by @danieldk in #144
  • Provide better errors when deserializing build.toml by @danieldk in #145
  • feat: built root and user docker image variants by @drbh in #139
  • Add basic support for building Metal 🤘 kernels by @danieldk in #146
  • Add support for building macOS Metal kernels by @danieldk in #147
  • fix: adjust the update build command in the container by @drbh in #149
  • Enable Metal as part of bundle builds by @danieldk in #151
  • Propagate ABI check errors and fix on macOS by @danieldk in #154
  • feat: allow precompilation for metal kernels by @EricLBuehler in #152
  • build2cmake: add clean subcommand by @EricLBuehler in #156
  • kernel-abi-check: check macOS minimum version by @danieldk in #157
  • Add cutlass 3.9 as a dependency by @danieldk in #159
  • build2cmake: cuda_flags option for compile flags for CUDA kernels by @danieldk in #160
  • Update build.toml docs by @danieldk in #161
  • Accept checkInputs and nativeCheckInputs in genFlakeOutputs by @danieldk in #155
  • Small Nix documentation improvements by @danieldk in #162
  • Hotfix: append CUDA flags by @danieldk in #163
  • Set build2cmake and kernel-abi-check to 0.5.0 for release prep by @danieldk in #164

New Contributors

Full Changelog: v0.4.0...v0.5.0

v0.4.0

28 May 08:13
fd0376f

What's Changed

  • Add a CUDA 12.9 build variant for Torch 2.7 by @danieldk in #136
  • feat: update docker for remote build and push by @drbh in #115
  • build2cmake: attempt to get shorthash-based ops id using git by @danieldk in #137
  • Standardizing torch_binding.cpp and torch_binding.h in the doc by @MekkCyber in #138
  • Bump nixpkgs to version with cuDNN sbsa by @danieldk in #140
  • Switch to the to-be hf-nix repo by @danieldk in #141

Full Changelog: v0.3.0...v0.4.0