This is an archive of the discontinued LLVM Phabricator instance.

[cuda] Add address space predicate functions.
Closed, Public

Authored by hliao on Oct 18 2021, 9:17 PM.

Details

Summary
  • Add the missing NVVM predicate builtins for address space checking.
  • Redefine them as pure functions so that they can be used in __builtin_assume.

Event Timeline

hliao created this revision. Oct 18 2021, 9:17 PM
hliao requested review of this revision. Oct 18 2021, 9:17 PM
Herald added a project: Restricted Project. Oct 18 2021, 9:17 PM
Herald added a subscriber: cfe-commits.
hliao added a comment. Oct 18 2021, 9:24 PM

With this patch, Clang can compile the following code into the same PTX that NVCC produces. See https://godbolt.org/z/q6EYE1q1o for the difference between NVCC and the current Clang output.

tra accepted this revision. Oct 19 2021, 11:27 AM

LGTM in general.

clang/include/clang/Basic/BuiltinsNVPTX.def
691–694

CUDA appears to be using __nv_isGlobal_impl for the AS predicates. Perhaps we want to add those, too, forwarding them to the __nvvm_... implementations above. I've already added a few other AS-related __nv_* builtins in lib/Headers/__clang_cuda_intrinsics.h.

This revision is now accepted and ready to land. Oct 19 2021, 11:27 AM
This revision was automatically updated to reflect the committed changes.
hliao added inline comments. Oct 19 2021, 1:22 PM
clang/include/clang/Basic/BuiltinsNVPTX.def
691–694

__nv_isGlobal_impl is exposed as an official interface. In fact, in CUDA SDK 10.0 or earlier, __isGlobal is directly implemented as inline asm. If possible, we should avoid defining unofficial or undocumented interfaces. __nv_isGlobal_impl was introduced from CUDA SDK 10.1 but there is no documentation on it.

// This function returns 1 if generic address "ptr" is in global memory space.
// It returns 0 if "ptr" is in shared, local or constant memory space.
__SM_20_INTRINSICS_DECL__ unsigned int __isGlobal(const void *ptr)
{
  unsigned int ret;
  asm volatile ("{ \n\t"
                "    .reg .pred p; \n\t"
                "    isspacep.global p, %1; \n\t"
                "    selp.u32 %0, 1, 0, p;  \n\t"
#if (defined(_MSC_VER) && defined(_WIN64)) || defined(__LP64__) || defined(__CUDACC_RTC__)
                "} \n\t" : "=r"(ret) : "l"(ptr));
#else
                "} \n\t" : "=r"(ret) : "r"(ptr));
#endif

  return ret;
}
hliao added inline comments. Oct 19 2021, 1:23 PM
clang/include/clang/Basic/BuiltinsNVPTX.def
691–694

Typo in my previous comment: __nv_isGlobal_impl is *not* exposed as an official interface.

tra added inline comments. Oct 19 2021, 1:35 PM
clang/include/clang/Basic/BuiltinsNVPTX.def
691–694

> __nv_isGlobal_impl is *not* exposed as an official interface.
> If possible, we should avoid defining unofficial or undocumented interfaces. __nv_isGlobal_impl was introduced from CUDA SDK 10.1 but there is no documentation on it.

In general, I agree. I just wish NVIDIA would stop using undocumented APIs in the public headers they ship. By necessity, clang has to either rely on preprocessor hacks to edit out uncompilable code, or guess what the undocumented APIs do and implement them.

In this case I do not think __nv_is*_impl are used anywhere other than in the functions you have renamed, so we're fine without them. They would be needed if the patch didn't have to rename __isGlobal and friends.

hliao added inline comments. Oct 19 2021, 3:30 PM
clang/include/clang/Basic/BuiltinsNVPTX.def
691–694

Another motivation for redefining them is that we need to add const (or pure) attributes to them.
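
The const attribute matters because Clang discards a __builtin_assume argument that may have side effects, and a call to a function not known to be const or pure counts as one. A minimal sketch of the intended use, assuming the __isGlobal wrapper provided by Clang's CUDA headers after this patch; the kernel name scale and its parameters are hypothetical and not part of the patch:

```cuda
// Sketch only: `scale`, `data`, `factor`, and `n` are illustrative names.
// Assumes __isGlobal is a side-effect-free device function, as this
// patch intends, so it is usable inside __builtin_assume.
__global__ void scale(float *data, float factor, int n) {
  // Assert to the optimizer that `data` points into global memory.
  // If __isGlobal were not const/pure, Clang would warn that the
  // argument has side effects and drop the assumption.
  __builtin_assume(__isGlobal(data));

  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n)
    data[i] *= factor;  // candidate for a global-space load/store
}
```

With the assumption in place, the optimizer may treat data as a global-space pointer and emit ld.global/st.global rather than generic loads and stores; the exact PTX depends on the compiler version and flags.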