Improve CUDA capability handling #329
Merged
Previously, we computed a kernel's capabilities by taking the loose intersection of the kernel's stated capabilities (or the defaults) and the capabilities that CMake/Torch report as supported. However, this caused problems with, e.g., capability 8.9, which is not in these lists (anymore?) but is fine to compile for.
To solve this issue, we will ignore the capabilities reported by CMake/Torch and instead use our own list of capabilities for the loose intersection with the kernel capabilities. This list contains all capabilities supported by a CUDA version, minus some really old capabilities that are not supported by Torch anyway. This behavior is enabled by the new `BUILD_ALL_SUPPORTED_ARCHS` CMake option (which is the default for the Nix and Windows builders).
When `BUILD_ALL_SUPPORTED_ARCHS` is not set, we will try to detect the capability of the user's CUDA GPU. This speeds up development, since one then only has to compile for a single capability. If detection fails for some reason, we fall back to using all capabilities, as if `BUILD_ALL_SUPPORTED_ARCHS` were set.
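The selection logic described above can be sketched roughly as follows. This is an illustrative Python model, not the CMake code in this PR: `select_capabilities`, its parameters, and the sample capability list are all hypothetical. It also makes two assumptions: the "loose intersection" is approximated by a plain set intersection, and a detected GPU capability is still intersected with the kernel's capabilities.

```python
from typing import Callable, Iterable, List, Optional


def select_capabilities(
    kernel_caps: Iterable[str],
    supported_caps: Iterable[str],
    build_all_supported_archs: bool,
    detect_gpu_capability: Callable[[], Optional[str]],
) -> List[str]:
    """Hypothetical model of the capability selection described in this PR.

    kernel_caps: capabilities stated by the kernel (or the defaults).
    supported_caps: our own list for the CUDA version in use (sample data
        only; the real list lives in the build system).
    detect_gpu_capability: returns the local GPU's capability, or None
        when detection fails.
    """
    if build_all_supported_archs:
        candidates = set(supported_caps)
    else:
        detected = detect_gpu_capability()
        # Fall back to all supported capabilities if detection fails.
        candidates = {detected} if detected is not None else set(supported_caps)

    # Assumption: plain intersection stands in for the "loose" intersection.
    selected = candidates & set(kernel_caps)

    # Sort numerically so that e.g. "10.0" comes after "9.0".
    # Only handles purely numeric "major.minor" strings.
    return sorted(selected, key=lambda c: tuple(int(p) for p in c.split(".")))


# Sample data, not the actual list used by the build.
SAMPLE_SUPPORTED = ["7.5", "8.0", "8.6", "8.9", "9.0"]
KERNEL_CAPS = ["8.0", "8.9", "9.0"]

all_archs = select_capabilities(KERNEL_CAPS, SAMPLE_SUPPORTED, True, lambda: None)
dev_build = select_capabilities(KERNEL_CAPS, SAMPLE_SUPPORTED, False, lambda: "8.9")
fallback = select_capabilities(KERNEL_CAPS, SAMPLE_SUPPORTED, False, lambda: None)
```

In this model, capability 8.9 survives because it is in our own list, regardless of what CMake/Torch would report; the development build compiles for the detected capability only, and a failed detection behaves like a build with `BUILD_ALL_SUPPORTED_ARCHS` set.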