This is an archive of the discontinued LLVM Phabricator instance.

[CUDA] CUDA has no device-side library builtins.
Closed · Public

Authored by tra on Jan 19 2018, 2:40 PM.


Details

Reviewers

jlebar
sanjoy

Commits

rG5ecdb94487bb: [CUDA] CUDA has no device-side library builtins.
rC323239: [CUDA] CUDA has no device-side library builtins.
rL323239: [CUDA] CUDA has no device-side library builtins.

Summary

We should (almost) never consider a device-side declaration to match a
builtin. If we do, the un-inlined device-side functions provided by the
CUDA headers that ship with clang may be ignored, and we may end up
emitting a call to an LLVM intrinsic that would typically be lowered to
an external library call. That results in a back-end error, because the
NVPTX back-end cannot lower such calls.

Diff Detail

Repository: rL LLVM

Event Timeline

tra created this revision. Jan 19 2018, 2:40 PM

Herald added a subscriber: sanjoy. Jan 19 2018, 2:40 PM

jlebar added a comment.

How does this affect e.g. calling memcpy()? There isn't a standard library implementation of this on nvptx, but we do want calls to memcpy() to be lowered to llvm.memcpy so that they can be optimized.

jlebar added a reviewer: sanjoy. Jan 21 2018, 8:54 PM

In D42319#983377, @jlebar wrote:

How does this affect e.g. calling memcpy()? There isn't a standard library implementation of this on nvptx, but we do want calls to memcpy() to be lowered to llvm.memcpy so that they can be optimized.

We implement memcpy as a call to __builtin_memcpy(), which gets codegen'ed as usual. NVPTX also lowers all memcpy/memset/memmove calls into loads/stores, so those don't need an external library. This behavior is not affected by this patch.
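Lowering memcpy/memset/memmove "as loads/stores" means the back-end expands each call inline instead of calling a library routine. A minimal host-side C++ sketch of that idea, assuming simple byte-at-a-time loops; the `expanded_*` helpers are hypothetical names, and the real NVPTX expansion may use wider accesses and differs in detail:

```cpp
#include <cassert>     // assert() for the usage checks
#include <cstddef>
#include <functional>  // std::less gives a total order on pointers

// Hypothetical helpers illustrating "memcpy/memset/memmove lowered as
// loads/stores": each call becomes an inline loop, with no call to an
// external library routine. Byte-at-a-time for clarity only.
inline void expanded_memcpy(void *dst, const void *src, std::size_t n) {
  auto *d = static_cast<unsigned char *>(dst);
  const auto *s = static_cast<const unsigned char *>(src);
  for (std::size_t i = 0; i < n; ++i)
    d[i] = s[i];  // one load and one store per byte
}

inline void expanded_memset(void *dst, unsigned char value, std::size_t n) {
  auto *d = static_cast<unsigned char *>(dst);
  for (std::size_t i = 0; i < n; ++i)
    d[i] = value;  // stores only
}

// memmove may see overlapping ranges, so it picks the copy direction that
// never overwrites source bytes before they have been read.
inline void expanded_memmove(void *dst, const void *src, std::size_t n) {
  auto *d = static_cast<unsigned char *>(dst);
  const auto *s = static_cast<const unsigned char *>(src);
  if (std::less<const unsigned char *>()(d, s)) {
    for (std::size_t i = 0; i < n; ++i)
      d[i] = s[i];  // copy forward
  } else {
    for (std::size_t i = n; i > 0; --i)
      d[i - 1] = s[i - 1];  // copy backward
  }
}
```

Of the three, only memmove needs a direction check, because its source and destination ranges are allowed to overlap.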

The goal of this patch is to prevent clang from codegen'ing its own idea of a library builtin function while ignoring the implementation we've provided in the headers for the device side.

The original issue I had was triggered by code roughly similar to this:

extern "C" __device__ float logf(float a) { return __nv_logf(a); }
__global__ void kernel() { logf(0.0f); }

In the AST, the kernel was calling the logf function above. However, when clang generated code, it considered logf to be a library builtin with known semantics and happily codegen'ed a call to @llvm.log.f32, which the NVPTX back-end has no way to lower. The patch adds a safety net in clang so that it does not generate code for builtins which we have disabled (or can't handle) in NVPTX.
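The two possible outcomes for the `logf()` call can be modeled in a few lines of C++. This is a hypothetical sketch, not clang's CodeGen: `emittedCallee`, `nvptxCanLower`, and the one-entry table of unlowerable callees are assumptions made for illustration.

```cpp
#include <cassert>  // assert() for the usage checks
#include <set>
#include <string>

// Hypothetical model: callees that NVPTX could only lower to an external
// library call, which it cannot emit. Only the intrinsic from the example
// above is listed.
const std::set<std::string> kNeedsExternalLibCall = {"@llvm.log.f32"};

// Which callee ends up in the IR for a call to `name`.
// `treatedAsBuiltin` stands for "clang recognized the declaration as a
// library builtin with known semantics".
std::string emittedCallee(const std::string &name, bool treatedAsBuiltin) {
  if (treatedAsBuiltin && name == "logf")
    return "@llvm.log.f32";  // clang's own notion of logf()
  return "@" + name;         // plain call to the function defined in the TU
}

bool nvptxCanLower(const std::string &callee) {
  return kNeedsExternalLibCall.count(callee) == 0;
}
```

With `treatedAsBuiltin == true` (the behavior before the patch) the emitted callee is `@llvm.log.f32`, which the model rejects; with `false` the call goes to the header-provided `@logf`, which lowers fine.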

Got it, thanks for the explanation.

This revision is now accepted and ready to land. Jan 22 2018, 4:12 PM

Closed by commit rL323239: [CUDA] CUDA has no device-side library builtins. (authored by tra). Jan 23 2018, 11:10 AM

This revision was automatically updated to reflect the committed changes.

Herald added a subscriber: llvm-commits. Jan 23 2018, 11:10 AM

Revision Contents

Path                                             Size
cfe/trunk/lib/AST/Decl.cpp                       7 lines
cfe/trunk/test/CodeGenCUDA/library-builtin.cu    22 lines

Diff 131112

cfe/trunk/lib/AST/Decl.cpp

[… 2,895 unchanged lines not shown …]
   if (getStorageClass() == SC_Static)
     return 0;
 
   // OpenCL v1.2 s6.9.f - The library functions defined in
   // the C99 standard headers are not available.
   if (Context.getLangOpts().OpenCL &&
       Context.BuiltinInfo.isPredefinedLibFunction(BuiltinID))
     return 0;
 
+  // CUDA does not have device-side standard library. printf and malloc are the
+  // only special cases that are supported by device-side runtime.
+  if (Context.getLangOpts().CUDA && hasAttr<CUDADeviceAttr>() &&
+      !hasAttr<CUDAHostAttr>() &&
+      !(BuiltinID == Builtin::BIprintf || BuiltinID == Builtin::BImalloc))
+    return 0;
+
   return BuiltinID;
 }
 
 /// getNumParams - Return the number of parameters this function must have
 /// based on its FunctionType. This is the length of the ParamInfo array
 /// after it has been created.
 unsigned FunctionDecl::getNumParams() const {
   const auto *FPT = getType()->getAs<FunctionProtoType>();
[… 1,605 unchanged lines not shown …]
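The added check can be read as a pure predicate over three facts about the declaration plus its builtin ID. The sketch below models it in standalone C++; `BuiltinID`, `DeclInfo`, and `effectiveBuiltinID` are hypothetical stand-ins, not clang's real `LangOptions`, attribute, or `Builtin::ID` types.

```cpp
#include <cassert>  // assert() for the usage checks

// Hypothetical stand-ins; clang's real Builtin::ID enum and attribute
// queries look different.
enum BuiltinID : unsigned { NotBuiltin = 0, BIprintf, BImalloc, BIlogf };

struct DeclInfo {
  bool compilingCUDA;  // models Context.getLangOpts().CUDA
  bool hasDeviceAttr;  // models hasAttr<CUDADeviceAttr>()
  bool hasHostAttr;    // models hasAttr<CUDAHostAttr>()
};

// Mirrors the added check: a device-only declaration in CUDA is not treated
// as a library builtin, except for printf and malloc.
BuiltinID effectiveBuiltinID(const DeclInfo &d, BuiltinID id) {
  const bool deviceOnly = d.compilingCUDA && d.hasDeviceAttr && !d.hasHostAttr;
  const bool supportedOnDevice = (id == BIprintf || id == BImalloc);
  if (deviceOnly && !supportedOnDevice)
    return NotBuiltin;  // corresponds to `return 0;` in the patch
  return id;
}
```

In this model a `__host__ __device__` declaration keeps its builtin ID, matching the `!hasAttr<CUDAHostAttr>()` term, and host-only declarations are untouched, as the test below expects.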

cfe/trunk/test/CodeGenCUDA/library-builtin.cu

// REQUIRES: x86-registered-target
// REQUIRES: nvptx-registered-target

// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -emit-llvm -o - %s | \
// RUN:   FileCheck --check-prefixes=HOST,BOTH %s
// RUN: %clang_cc1 -fcuda-is-device -triple nvptx64-nvidia-cuda \
// RUN:   -emit-llvm -o - %s | FileCheck %s --check-prefixes=DEVICE,BOTH

// BOTH-LABEL: define float @logf(float

// logf() should be calling itself recursively as we don't have any standard
// library on device side.
// DEVICE: call float @logf(float
extern "C" __attribute__((device)) float logf(float __x) { return logf(__x); }

// NOTE: this case is to illustrate the expected differences in behavior between
// the host and device. In general we do not mess with host-side standard
// library.
//
// Host is assumed to have standard library, so logf() calls LLVM intrinsic.
// HOST: call float @llvm.log.f32(float
extern "C" float logf(float __x) { return logf(__x); }