- Add the missing NVVM predicate builtins on address space checking
- Redefine them as pure functions so that they could be used in __builtin_assume.
Details
- Reviewers
tra - Commits
- rG6fe902daf931: [cuda] Add address space predicate funuctions.
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
With this patch, the following code could be compiled into the same PTX as NVCC. Check https://godbolt.org/z/q6EYE1q1o for the difference between NVCC and the current Clang output.
LGTM in general.
clang/include/clang/Basic/BuiltinsNVPTX.def | ||
---|---|---|
691–694 | CUDA appears to be using __nv_isGlobal_impl for the AS predicates. Perhaps we want to add those, too, forwarding them to the __nvvm_... implementations above. I've already added a few other AS-related __nv_* builtins in lib/Headers/__clang_cuda_intrinsics.h. |
clang/include/clang/Basic/BuiltinsNVPTX.def | ||
---|---|---|
691–694 | __nv_isGlobal_impl is exposed as an official interface. In fact, in CUDA SDK 10.0 or earlier, __isGlobal is directly implemented as inline asm. If possible, we should avoid defining unofficial or undocumented interfaces. __nv_isGlobal_impl was introduced from CUDA SDK 10.1 but there is no documentation on it. // This function returns 1 if generic address "ptr" is in global memory space. // It returns 0 if "ptr" is in shared, local or constant memory space. __SM_20_INTRINSICS_DECL__ unsigned int __isGlobal(const void *ptr) { unsigned int ret; asm volatile ("{ \n\t" " .reg .pred p; \n\t" " isspacep.global p, %1; \n\t" " selp.u32 %0, 1, 0, p; \n\t" #if (defined(_MSC_VER) && defined(_WIN64)) || defined(__LP64__) || defined(__CUDACC_RTC__) "} \n\t" : "=r"(ret) : "l"(ptr)); #else "} \n\t" : "=r"(ret) : "r"(ptr)); #endif return ret; } |
clang/include/clang/Basic/BuiltinsNVPTX.def | ||
---|---|---|
691–694 | typo, __nv_isGlobal_impl is *not* exposed as an official interface.
|
clang/include/clang/Basic/BuiltinsNVPTX.def | ||
---|---|---|
691–694 |
In general, I agree. I just wish NVIDIA would stop using undocumented APIs in the public headers they ship. By necessity, clang either has to rely on preprocessor hacks to edit out uncompileable code, or guess what the undocumented APIs do and implement them. In this case I do not think __nv_is*_impl are used anywhere other than in the functions you have renamed, so we're fine without them. They would be needed if the patch didn't have to rename __isGlobal and friends. |
clang/include/clang/Basic/BuiltinsNVPTX.def | ||
---|---|---|
691–694 |
another motivation we have to redefine them is that we need to add const (or pure) attributes for them. |
CUDA appears to be using __nv_isGlobal_impl for the AS predicates. Perhaps we want to add those, too, forwarding them to the __nvvm_... implementations above. I've already added a few other AS-related __nv_* builtins in lib/Headers/__clang_cuda_intrinsics.h.