Following the pattern used for 11.4:
https://github.com/llvm/llvm-project/commit/49d982d8cbbbb6e01b6f8e4f173ed6325beab08b
- Reviewers: tra, yaxunl, Hahnfeld
- Commits: rG7ecec3f0f521: [CUDA] Bump supported CUDA version to 11.5
- Repository: rG LLVM Github Monorepo
Event Timeline
I'm not sure if it's actually correct to advertise full support for CUDA 11.5, but I didn't look into the exact changes since 11.4.
Good point. I was confused by the fact that 11.4 is both FULLY_SUPPORTED and PARTIALLY_SUPPORTED, so I decided to just follow the existing pattern. I didn't find any extra tests added for the 11.2 -> 11.4 bump. Do we have infrastructure in place to test this, or how does it work?
I think we're missing a few more changes here (a rough sketch of both places follows below):
- The driver needs to enable ptx75 when it constructs the cc1 command line in clang/lib/Driver/ToolChains/Cuda.cpp
- We also need to handle PTX75 in clang/include/clang/Basic/BuiltinsNVPTX.def
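For reference, a minimal sketch of what those two changes usually look like. This is written from memory rather than copied from the tree, so the exact macro/variable names, version list, and surrounding code may differ:

```
// clang/lib/Driver/ToolChains/Cuda.cpp (sketch): map the detected CUDA
// installation version to the PTX target feature that gets passed to cc1.
const char *PtxFeature = nullptr;
switch (CudaInstallationVersion) {
#define CASE_CUDA_VERSION(CUDA_VER, PTX_VER)                                   \
  case CudaVersion::CUDA_##CUDA_VER:                                           \
    PtxFeature = "+ptx" #PTX_VER;                                              \
    break;
  CASE_CUDA_VERSION(115, 75);  // new entry: CUDA 11.5 -> PTX ISA 7.5
  CASE_CUDA_VERSION(114, 74);
  CASE_CUDA_VERSION(113, 73);
  // ... entries for older versions ...
#undef CASE_CUDA_VERSION
default:
  PtxFeature = "+ptx42";
}
CC1Args.append({"-target-feature", PtxFeature});

// clang/include/clang/Basic/BuiltinsNVPTX.def (sketch): the PTXnn macros form
// a chain so that a builtin guarded by an older PTX version is also available
// under every newer one; the bump adds a new head to that chain.
#define PTX75 "ptx75"
#define PTX74 "ptx74|" PTX75
#define PTX73 "ptx73|" PTX74
// ... and so on down the chain ...
```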
Technically we never support all the features supported by NVCC, so for clang "supported" essentially means "works well enough", i.e. no known regressions vs. previous clang and CUDA versions. Usually it boils down to being able to compile the CUDA headers.
"Partially supported" happens when we can compile code compileable with the older CUDA versions, but are missing something critical introduced by the new CUDA version. E.g. a new GPU variant. Or new compiler builtins/functions that a user may expect from the new CUDA version. Or some CUDA headers may use new instructions in inline asm that would not compile with ptxas unless we generate PTX output using the new PTX version.
AFAICT from the CUDA-11.5 release notes, it didn't introduce anything particularly interesting. We've been using clang with CUDA-11.5 for a few weeks w/o any issues, so I think it's fine to stamp it as supported, once the missing bits are in place.
Experimental support for __int128 is new in CUDA 11.5; I'm not sure if Clang enables this for CUDA. The release notes also specify:
__builtin_assume can now be used to specify address space to allow for efficient loads and stores.
The docs are very scarce on this; I could only find void __builtin_assume(bool exp), which I think is not what they are talking about...
I think we've added support for i128 a while back: https://godbolt.org/z/18bEbhMYb
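For anyone who wants to double-check locally, a tiny device-code example along the lines of what the godbolt link demonstrates (my own reconstruction, not the exact snippet behind the link):

```
// Minimal check that clang accepts __int128 in CUDA device code.
// Compile with e.g.: clang++ -x cuda --cuda-gpu-arch=sm_70 -c int128.cu
__global__ void mul128(const __int128 *a, const __int128 *b, __int128 *out) {
  int i = threadIdx.x + blockIdx.x * blockDim.x;
  out[i] = a[i] * b[i];  // lowered to i128 arithmetic in the generated PTX
}
```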
> The release notes also specify:
> __builtin_assume can now be used to specify address space to allow for efficient loads and stores.
> The docs are very scarce on this; I could only find void __builtin_assume(bool exp), which I think is not what they are talking about...
AMD folks have D112041 under review, which will have __builtin_assume help address space (AS) inference. In any case, we've already been doing it reasonably well automatically.
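My reading (an assumption on my part, not confirmed by the release notes) is that the entry refers to combining __builtin_assume with the CUDA address-space predicates such as __isShared()/__isGlobal(), roughly like this:

```
// Hypothetical illustration: assert that a generic pointer actually points
// into shared memory, so the compiler can emit address-space-specific
// (ld.shared/st.shared) accesses instead of generic ones. __isShared() is a
// CUDA device function that tests whether a pointer is in shared memory.
__device__ float sum2(const float *p) {
  __builtin_assume(__isShared(p));
  return p[0] + p[1];
}
```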
> The driver needs to enable ptx75 when it constructs the cc1 command line in clang/lib/Driver/ToolChains/Cuda.cpp
@tra Haven't I already done it in line 712? Or where should I enable it?
@Hahnfeld Are you satisfied with the replies to your questions? If so, I can go ahead and merge.