Improve CUDA capability handling #329

danieldk · 2025-12-17T09:40:36Z

Previously, we computed a kernel's capabilities by taking the loose intersection of the stated kernel capabilities (or the default) and the capabilities that CMake/Torch report as supported. However, this caused problems with, for example, capability 8.9, which is not in these lists (anymore?) but is fine to compile for.

To solve this issue, we will ignore the capabilities reported by CMake/Torch and instead use our own list of capabilities for the loose intersection with the kernel capabilities. This list contains all capabilities supported by a given CUDA version, minus some really old capabilities that Torch does not support anyway. This behavior is enabled by the new `BUILD_ALL_SUPPORTED_ARCHS` CMake option (which is the default for the Nix and Windows builders).
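To make the list construction concrete, here is a minimal Python sketch of the idea, not the CMake code from this PR. All names and the capability values are hypothetical and purely illustrative. A plain exact-match filter stands in for the loose intersection, whose precise matching rules are not spelled out in this description, and capabilities are assumed to be plain `major.minor` strings without suffixes.

```python
def version_key(capability: str) -> tuple[int, int]:
    """Turn a 'major.minor' string into a tuple that sorts numerically."""
    major, minor = capability.split(".")
    return int(major), int(minor)


def builder_capabilities(cuda_supported, oldest_kept):
    """Capabilities supported by the CUDA version, minus the really old ones."""
    return [c for c in cuda_supported if version_key(c) >= version_key(oldest_kept)]


def kernel_build_capabilities(kernel_caps, cuda_supported, oldest_kept):
    """Keep the stated kernel capabilities that also appear in our own list.

    Simplification: exact matching stands in for the loose intersection.
    """
    allowed = set(builder_capabilities(cuda_supported, oldest_kept))
    return sorted((c for c in kernel_caps if c in allowed), key=version_key)


# Illustrative values only, not the real list for any CUDA version.
print(kernel_build_capabilities(
    ["7.5", "8.9", "12.0"],
    ["5.0", "7.5", "8.0", "8.9", "9.0"],
    "7.0",
))
```

Under these made-up inputs, 8.9 is kept because it is in our own list, regardless of what CMake or Torch would report.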

When `BUILD_ALL_SUPPORTED_ARCHS` is not set, we will try to detect the capability of the user's CUDA GPU. This speeds up development, since one then only has to compile for a single capability. If detection fails for some reason, we fall back to using all capabilities, as if `BUILD_ALL_SUPPORTED_ARCHS` were set.
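The selection logic described above can be sketched in Python as follows. This is an illustration of the decision flow only, not the CMake implementation. The function name, its parameters, and the `detect_gpu_capability` callback are hypothetical, and the sketch assumes that a failed detection shows up either as an exception or as a `None` result.

```python
def select_target_capabilities(build_all, all_supported, detect_gpu_capability):
    """Pick the capabilities to compile for.

    build_all             -- mirrors the BUILD_ALL_SUPPORTED_ARCHS option
    all_supported         -- our own list of supported capabilities
    detect_gpu_capability -- hypothetical callback returning e.g. "8.9",
                             or None / raising when detection fails
    """
    if build_all:
        return list(all_supported)

    try:
        detected = detect_gpu_capability()
    except Exception:
        detected = None

    if detected is None:
        # Detection failed: behave as if the option had been set.
        return list(all_supported)

    # Development build: compile for the local GPU only.
    return [detected]
```

For example, `select_target_capabilities(False, ["8.0", "8.9"], lambda: "8.9")` selects only the detected capability, while a callback that returns `None` yields the full list.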


MekkCyber

Thanks for fixing!

build2cmake/src/templates/cuda/preamble.cmake

build2cmake/src/templates/utils.cmake

MekkCyber

lgtm Thank you

MekkCyber reviewed Dec 17, 2025

build2cmake/src/templates/cuda/preamble.cmake (outdated, resolved)

build2cmake/src/templates/cuda/preamble.cmake (resolved)

build2cmake/src/templates/utils.cmake (resolved)

danieldk added 2 commits December 17, 2025 13:09

Add comments to clarify some of the capability handling

62dbd27

Simplify gencode flags clearing

ecefc61

MekkCyber approved these changes Dec 17, 2025


danieldk merged commit 9ea57a8 into main Dec 18, 2025
28 of 29 checks passed

danieldk deleted the redo-capabilities branch December 18, 2025 11:09
