We should (almost) never consider a device-side declaration to match a
builtin. If we do, the un-inlined device-side functions provided by the
CUDA headers that ship with clang may be ignored. We may instead end up
emitting a call to an LLVM intrinsic, which would typically be lowered to
an external library call. This results in a back-end error because the
NVPTX back-end does not support such calls.
Event Timeline
How does this affect e.g. calling memcpy()? There isn't a standard library implementation of this on nvptx, but we do want calls to memcpy() to be lowered to llvm.memcpy so that they can be optimized.
We implement memcpy as a call to __builtin_memcpy(), which gets code-gen-ed as usual. NVPTX also lowers all memcpy/memset/memmove calls to loads/stores, so those don't need an external library. This behavior is not affected by this patch.
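As a rough illustration of the wrapper pattern described above, here is a minimal sketch in plain C++. The names `wrapped_memcpy` and `check_wrapped_memcpy` are hypothetical and are not taken from the CUDA headers. The `__device__` attribute is omitted so the sketch compiles on the host with Clang or GCC.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a header-provided device-side memcpy.
// A real CUDA header would mark this __device__; that is left out here
// so the example builds as ordinary host C++.
inline void *wrapped_memcpy(void *dst, const void *src, std::size_t n) {
  // Forward to the compiler builtin. The compiler is then free to lower the
  // copy however it sees fit (for example, as inline loads and stores).
  return __builtin_memcpy(dst, src, n);
}

// Small self-check: copies four ints and verifies contents and return value.
inline bool check_wrapped_memcpy() {
  const int src[4] = {1, 2, 3, 4};
  int dst[4] = {0, 0, 0, 0};
  void *ret = wrapped_memcpy(dst, src, sizeof(src));
  for (int i = 0; i < 4; ++i) {
    if (dst[i] != src[i]) return false;
  }
  // memcpy returns its destination pointer.
  return ret == static_cast<void *>(dst);
}
```

Because the wrapper calls the builtin explicitly, the copy does not depend on the compiler recognizing the wrapper's own name as a library function.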
The goal of this patch is to prevent clang from generating code for its own idea of a library builtin function while ignoring the device-side implementation we've provided in the headers.
The original issue I had was triggered by code roughly similar to this:
extern "C" __device__ float logf(float a) { return __nv_logf(a); }
__global__ void kernel() { logf(0.0f); }
In the AST, the kernel was calling the logf function above. However, when clang generated code, it considered logf to be a library builtin with known semantics and happily codegen'ed a call to @llvm.log.f32, which the NVPTX back-end has no way to lower. The patch adds a safety net in clang so that it does not generate code for builtins which we have disabled (or can't handle) in NVPTX.