This is an archive of the discontinued LLVM Phabricator instance.

Differential D18458

[CUDA] Mangle __host__ __device__ functions differently than __host__ or __device__ functions.
Abandoned, Public

Authored by jlebar on Mar 24 2016, 1:06 PM.

Details

Reviewers

rjmccall
rsmith

Summary

This is important because in a later patch, we will allow __host__
__device__ functions to be overloaded with __host__ / __device__
functions with the same signature, and we don't want a naming conflict
in this case.

Based on changes from http://reviews.llvm.org/D12453?vs=on&id=33483.

Event Timeline

jlebar updated this revision to Diff 51588. (Mar 24 2016, 1:06 PM)

jlebar retitled this revision to [CUDA] Mangle __host__ __device__ functions differently than __host__ or __device__ functions.

jlebar updated this object.

jlebar added a reviewer: rsmith.

jlebar added subscribers: tra, cfe-commits.

jlebar mentioned this in D18380: [CUDA] Make unattributed constexpr functions (usually) implicitly host+device. (Mar 24 2016, 1:06 PM)

rsmith added a reviewer: rjmccall. (Mar 24 2016, 5:39 PM)

This makes the "constexpr implies __host__ __device__" patch look slightly questionable: two translation units defining the same constexpr function will mangle that function differently depending on whether the translation unit is built with CUDA support enabled. That will cause you to get duplicates of static locals and the like (but I suppose you do anyway between the host and the device, so maybe that's not much more broken than it would be regardless).

lib/AST/ItaniumMangle.cpp
488–489	According to http://mentorembedded.github.io/cxx-abi/abi.html#mangling-type, order-insensitive attributes should be sorted into reverse alphabetic order (alphabetically-first goes nearest to the base type). Given that `enable_if` is order-sensitive but `host` and `device` are not, I'm not really sure what the Itanium ABI expects us to do here regarding their relative order. John?
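
As a rough illustration of the ordering rule cited in this inline comment, the sketch below sorts a set of order-insensitive attribute names into reverse alphabetical order before emitting them, so that the alphabetically-first name is emitted last, nearest to whatever follows. The helper name `mangleOrderInsensitiveAttrs` is hypothetical and not a Clang API; the `Ua<length><name>` spelling is copied from the strings this patch emits. Where the order-sensitive `enable_if` belongs is deliberately left out, since that is the open question here.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper (not part of Clang): emits one qualifier per attribute
// name, using the "Ua<length><name>" spelling seen in this patch, with the
// names sorted into reverse alphabetical order.
std::string mangleOrderInsensitiveAttrs(std::vector<std::string> names) {
  // Descending sort: the alphabetically-first name ends up last in the output.
  std::sort(names.begin(), names.end(), std::greater<std::string>());
  std::string out;
  for (const std::string &name : names) {
    assert(!name.empty() && "attribute names must be non-empty");
    out += "Ua" + std::to_string(name.size()) + name;
  }
  return out;
}
```

Under that reading, `device` and `host` would come out as `Ua4hostUa6device`, the opposite of the `Ua6device` ... `Ua4host` order the patch currently produces.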

In D18458#383266, @rsmith wrote:

This makes the "constexpr implies __host__ __device__" patch look slightly questionable: two translation units defining the same constexpr function will mangle that function differently depending on whether the translation unit is built with CUDA support enabled. That will cause you to get duplicates of static locals and the like (but I suppose you do anyway between the host and the device, so maybe that's not much more broken than it would be regardless).

Hm. I agree that would be surprising to users.

I suppose we could say that HD can overload D but not H. Then we could apply mangling to all device functions, solving the ambiguity that way. That feels pretty ham-fisted, though. :-/

In D18458#383276, @jlebar wrote:

In D18458#383266, @rsmith wrote:

This makes the "constexpr implies __host__ __device__" patch look slightly questionable: two translation units defining the same constexpr function will mangle that function differently depending on whether the translation unit is built with CUDA support enabled. That will cause you to get duplicates of static locals and the like (but I suppose you do anyway between the host and the device, so maybe that's not much more broken than it would be regardless).

The breakage seems to be worse than this. :( Eigen seems to do the following:

foo.h:
  #ifdef __CUDACC__  // If compiling CUDA code
  #define HOST_DEVICE __host__ __device__
  #else
  #define HOST_DEVICE
  #endif

  HOST_DEVICE void foo();

foo.cc:  // Compiled as CUDA
  HOST_DEVICE void foo() { ... }

bar.cc:  // *Not* compiled as CUDA
  #include "foo.h"
  void bar() { foo(); }

With this patch, foo() has a different mangled name in foo.o and bar.o, and
we're hosed.
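
To make the mismatch concrete, here is a minimal sketch of the names involved, assuming the scheme in this diff (`Ua6device` and `Ua4host` appended for `__host__ __device__` functions). The helpers `mangleNullaryFunction`, `symbolDefinedByFooCc`, and `symbolReferencedByBarCc` are hypothetical and only model a free function with no parameters; they are not Clang APIs.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of the mangled name of a free function `void name()`
// (no namespace, no parameters) under the scheme in this patch.
std::string mangleNullaryFunction(const std::string &name, bool isHostDevice) {
  assert(!name.empty() && "a function needs a name");
  std::string out = "_Z" + std::to_string(name.size()) + name;
  if (isHostDevice)
    out += "Ua6device"; // the patch emits this before any enable_if qualifier
  if (isHostDevice)
    out += "Ua4host";   // ...and this after it
  out += "v";           // empty parameter list
  return out;
}

// foo.cc is compiled as CUDA, so HOST_DEVICE expands to __host__ __device__.
std::string symbolDefinedByFooCc() {
  return mangleNullaryFunction("foo", /*isHostDevice=*/true);
}

// bar.cc is not compiled as CUDA, so HOST_DEVICE expands to nothing.
std::string symbolReferencedByBarCc() {
  return mangleNullaryFunction("foo", /*isHostDevice=*/false);
}
```

In this model foo.cc defines `_Z3fooUa6deviceUa4hostv` while bar.cc references `_Z3foov`, so the reference in bar.o would not be expected to resolve against the definition in foo.o.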

If we think this use-case is reasonable (I think it is?) I think this means
that we cannot mangle __host__ __device__ functions differently when doing host
compilation. That seems to restrict us to saying that H and HD functions with
the same signatures cannot overload. This leaves us with two options:

1. No overloading between HD and H or D functions with the same signature.

   I don't see how to do this while still letting constexpr be HD; the issue is that there are constexpr std math functions which we want to overload for device. We could let constexpr be something other than HD, but if that new thing can overload with D, then I think we still have the same problem.

2. No overloading between HD and H, but OK to overload HD and D.

   If we did this, we'd still need to give D functions a different mangled name. But we don't have this problem of referencing symbols defined in a file compiled in CUDA mode from a file compiled without CUDA.

   tra pointed out a problem with this, which is that if someone (say, NVIDIA) gave us a C++ library consisting of precompiled device code plus headers, we wouldn't be able to link with it, because we would use different mangling.

   I also don't like this because it's inconsistent to say HD can overload D but not H. But that's a minor point at this point.

Richard, what do you think? Maybe you have an alternative idea?

It seems like we have the following constraint: on host, no attributes must mangle the same as __host__ __device__ and constexpr (and probably __global__?).

Are there any others? What do we need to do to be ABI-compatible with NVCC? (And is that possible if we allow __host__ to overload __host__ __device__?)

One possibility given only that constraint would be to use a different mangling for H functions and D functions, but mangle HD and unattributed functions the same.

In D18458#383719, @rsmith wrote:

It seems like we have the following constraint: on host, no attributes must mangle the same as __host__ __device__ and constexpr (and probably __global__?).

Yes to __host__ __device__ and constexpr. Unsure about __global__, but let's also say yes for now, to be conservative.

Are there any others?

An existing assumption is that __host__ is identical to unattributed. Probably makes sense to keep that one around for now if we can (modulo changes to unattributed constexpr), as it makes things simpler.

What do we need to do to be ABI-compatible with NVCC? (And is that possible if we allow __host__ to overload __host__ __device__?)

NVCC doesn't apply any special mangling to D or HD functions, so I think maintaining naming compatibility means, basically, not screwing with mangled names based on attributes.

That suggests, to your second question, that it's not possible to maintain ABI compatibility if we allow D or H to overload HD.

One possibility given only that constraint would be to use a different mangling for H functions and D functions, but mangle HD and unattributed functions the same.

I guess using a different mangling for both H and D functions, rather than just for D functions, is in some sense more consistent. But this would also be very subtle: We'd be saying, non-constexpr H and unattributed are identical, *except* for their mangled names.
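
The trade-off in this exchange can be sketched as a small model, assuming three schemes: no attribute mangling (what NVCC is described as doing), this patch (only HD functions get extra qualifiers), and the variant where H and D get qualifiers but HD and unattributed do not. All enum and function names below are hypothetical, and the `Ua4host` / `Ua6device` spellings used for the third scheme are invented for illustration; the thread does not specify them.

```cpp
#include <cassert>
#include <string>

enum class Target { Unattributed, Host, Device, HostDevice };

enum class Scheme {
  NvccCompatible,     // attributes never affect the mangled name
  ThisPatch,          // only __host__ __device__ functions get qualifiers
  DistinctHostDevice  // H and D get qualifiers; HD and unattributed do not
};

// Qualifier text added to the mangled name under each scheme. The spellings
// for DistinctHostDevice are an assumption made for this sketch.
std::string qualifiers(Scheme s, Target t) {
  switch (s) {
  case Scheme::NvccCompatible:
    return "";
  case Scheme::ThisPatch:
    return t == Target::HostDevice ? "Ua6deviceUa4host" : "";
  case Scheme::DistinctHostDevice:
    if (t == Target::Host)
      return "Ua4host";
    if (t == Target::Device)
      return "Ua6device";
    return "";
  }
  assert(false && "unknown scheme");
  return "";
}

// Two functions with the same signature can only be told apart by the linker
// if their mangled names differ.
bool canOverload(Scheme s, Target a, Target b) {
  return qualifiers(s, a) != qualifiers(s, b);
}

// The constraint from the Eigen example: a declaration seen without CUDA
// (unattributed) must name the same symbol as its HD definition.
bool matchesUnattributedOnHost(Scheme s) {
  return qualifiers(s, Target::Unattributed) ==
         qualifiers(s, Target::HostDevice);
}
```

In this model only the first scheme leaves every name untouched, and it is also the one that cannot distinguish an HD function from an H or D function with the same signature. The third scheme satisfies the host constraint but makes `Host` and `Unattributed` mangle differently, which is the subtlety noted above.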

In D18458#383755, @jlebar wrote:

In D18458#383719, @rsmith wrote:

It seems like we have the following constraint: on host, no attributes must mangle the same as __host__ __device__ and constexpr (and probably __global__?).

Yes to __host__ __device__ and constexpr. Unsure about __global__, but let's also say yes for now, to be conservative.

Are there any others?

An existing assumption is that __host__ is identical to unattributed. Probably makes sense to keep that one around for now if we can (modulo changes to unattributed constexpr), as it makes things simpler.

OK, that makes things pretty easy (though we don't get the answer we might want): unattributed must be mangled the same as H and HD, so we cannot support overloading H and HD.

What do we need to do to be ABI-compatible with NVCC? (And is that possible if we allow __host__ to overload __host__ __device__?)

NVCC doesn't apply any special mangling to D or HD functions, so I think maintaining naming compatibility means, basically, not screwing with mangled names based on attributes.

That suggests, to your second question, that it's not possible to maintain ABI compatibility if we allow D or H to overload HD.

OK, so the question for you is, how much ABI compatibility with NVCC are you prepared to give up in order to allow HD / D overloading and HD / H overloading?

One possibility given only that constraint would be to use a different mangling for H functions and D functions, but mangle HD and unattributed functions the same.

I guess using a different mangling for both H and D functions, rather than just for D functions, is in some sense more consistent. But this would also be very subtle: We'd be saying, non-constexpr H and unattributed are identical, *except* for their mangled names.

Yes, that seems like a good argument for mangling H the same as unattributed.

OK, so the question for you is, how much ABI compatibility with NVCC are you prepared to give up in order to allow HD / D overloading and HD / H overloading?

At the moment, getting this feature to work seems more important than maintaining ABI compatibility with NVCC. But I cannot confidently assign a probability to how likely it will be at some point in the future that we'll want this ABI compatibility. I really don't know.

So, that's one option. Here's another:

The motivation behind this one is, we have this pie-in-the-sky notion that, morally, device code should be able to call anything it wants. Only if we cannot codegen for device a function transitively invoked by a device function will we error out. constexpr-is-implicitly-HD is a step towards this more ambitious goal.

Setting aside the constexpr bit, it seems to me that when we codegen an unattributed function for device, we should mark the function as having internal linkage (or whatever the thing is called such that it's not visible from other TUs). The reason is, other TUs cannot rely on this function being present in the first object file, because the function is only generated on-demand. If you want to call an HD function defined in another .cu file, then the header in both files needs to explicitly define it as HD.

If that is true -- that unattributed functions which we codegen for device can/should be made internal -- then the mangling of those names has no bearing on ABI compatibility. So we could say, no explicit-HD / D or explicit-HD / H overloading, but *implicit*-HD / D overloading is OK, and we will mangle implicit-HD functions differently to allow this.

Does that sound like it might work?

New plan, R2: Let nvcc win.

After much discussion, we're abandoning this because we want to maintain ABI compatibility with nvcc. I'm about to upload a revised approach to D18380 that won't require this.

Revision Contents

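Path                                     Size
lib/AST/ItaniumMangle.cpp                11 lines
test/CodeGenCUDA/convergent.cu           10 lines
test/CodeGenCUDA/device-var-init.cu      8 lines
test/CodeGenCUDA/function-overload.cu    8 lines
test/CodeGenCUDA/mangling.cu             20 lines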

Diff 51588

lib/AST/ItaniumMangle.cpp

[478 preceding lines not shown]
 void CXXNameMangler::mangleFunctionEncoding(const FunctionDecl *FD) {
   // <encoding> ::= <function name> <bare-function-type>
   mangleName(FD);

   // Don't mangle in the type if this isn't a decl we should typically mangle.
   if (!Context.shouldMangleDeclName(FD))
     return;

+  // CUDA __host__ __device__ functions co-exist with both __host__ and
+  // __device__ functions, so they need a different mangled name. We sort
+  // "device", "host", and "enable_if" attrs alphabetically.
+  bool IsCudaHostDevice =
+      FD->hasAttr<CUDADeviceAttr>() && FD->hasAttr<CUDAHostAttr>();
+  if (IsCudaHostDevice)
+    Out << "Ua6device";
+
   if (FD->hasAttr<EnableIfAttr>()) {
     FunctionTypeDepthState Saved = FunctionTypeDepth.push();
     Out << "Ua9enable_ifI";
     // FIXME: specific_attr_iterator iterates in reverse order. Fix that and use
     // it here.
     for (AttrVec::const_reverse_iterator I = FD->getAttrs().rbegin(),
                                          E = FD->getAttrs().rend();
          I != E; ++I) {
       EnableIfAttr *EIA = dyn_cast<EnableIfAttr>(*I);
       if (!EIA)
         continue;
       Out << 'X';
       mangleExpression(EIA->getCond());
       Out << 'E';
     }
     Out << 'E';
     FunctionTypeDepth.pop(Saved);
   }

+  if (IsCudaHostDevice)
+    Out << "Ua4host";
+
   // Whether the mangling of a function type includes the return type depends on
   // the context and the nature of the function. The rules for deciding whether
   // the return type is included are:
   //
   // 1. Template functions (names or types) have return types encoded, with
   // the exceptions listed below.
   // 2. Function types not appearing as part of a function name mangling,
   // e.g. parameters, pointer types, etc., have return type encoded, with the
[3,836 following lines not shown]

test/CodeGenCUDA/convergent.cu

[11 preceding lines not shown]

 // DEVICE: Function Attrs:
 // DEVICE-SAME: convergent
 // DEVICE-NEXT: define void @_Z3foov
 __device__ void foo() {}

 // HOST: Function Attrs:
 // HOST-NOT: convergent
-// HOST-NEXT: define void @_Z3barv
+// HOST-NEXT: define void @_Z3barUa6deviceUa4hostv
 // DEVICE: Function Attrs:
 // DEVICE-SAME: convergent
-// DEVICE-NEXT: define void @_Z3barv
+// DEVICE-NEXT: define void @_Z3barUa6deviceUa4hostv
 __host__ __device__ void baz();
 __host__ __device__ void bar() {
-  // DEVICE: call void @_Z3bazv() [[CALL_ATTR:#[0-9]+]]
+  // DEVICE: call void @_Z3bazUa6deviceUa4hostv() [[CALL_ATTR:#[0-9]+]]
   baz();
 }

-// DEVICE: declare void @_Z3bazv() [[BAZ_ATTR:#[0-9]+]]
+// DEVICE: declare void @_Z3bazUa6deviceUa4hostv() [[BAZ_ATTR:#[0-9]+]]
 // DEVICE: attributes [[BAZ_ATTR]] = {
 // DEVICE-SAME: convergent
 // DEVICE-SAME: }
 // DEVICE: attributes [[CALL_ATTR]] = { convergent }

-// HOST: declare void @_Z3bazv() [[BAZ_ATTR:#[0-9]+]]
+// HOST: declare void @_Z3bazUa6deviceUa4hostv() [[BAZ_ATTR:#[0-9]+]]
 // HOST: attributes [[BAZ_ATTR]] = {
 // HOST-NOT: convergent
 // NOST-SAME: }

test/CodeGenCUDA/device-var-init.cu

[369 preceding lines not shown; context: __device__ void df() {]
   static __shared__ UC s_uc;
 }

 // CHECK: call void @_ZN2ECC1Ev(%struct.EC* %ec)
 // CHECK: call void @_ZN3ETCC1IJEEEDpT_(%struct.ETC* %etc)
 // CHECK: call void @_ZN2UCC1Ev(%struct.UC* %uc)
 // CHECK: call void @_ZN3ECIC1Ev(%struct.ECI* %eci)
 // CHECK: call void @_ZN3NECC1Ev(%struct.NEC* %nec)
-// CHECK: call void @_ZN3NCVC1Ev(%struct.NCV* %ncv)
-// CHECK: call void @_ZN3NCFC1Ev(%struct.NCF* %ncf)
-// CHECK: call void @_ZN4NCFSC1Ev(%struct.NCFS* %ncfs)
+// CHECK: call void @_ZN3NCVC1EUa6deviceUa4hostv(%struct.NCV* %ncv)
+// CHECK: call void @_ZN3NCFC1EUa6deviceUa4hostv(%struct.NCF* %ncf)
+// CHECK: call void @_ZN4NCFSC1EUa6deviceUa4hostv(%struct.NCFS* %ncfs)
 // CHECK: call void @_ZN3UTCC1IJEEEDpT_(%struct.UTC* %utc)
 // CHECK: call void @_ZN4NETCC1IJEEEDpT_(%struct.NETC* %netc)
 // CHECK: call void @_ZN7EC_I_ECC1Ev(%struct.EC_I_EC* %ec_i_ec)
 // CHECK: call void @_ZN8EC_I_EC1C1Ev(%struct.EC_I_EC1* %ec_i_ec1)
-// CHECK: call void @_ZN5T_V_TC1Ev(%struct.T_V_T* %t_v_t)
+// CHECK: call void @_ZN5T_V_TC1EUa6deviceUa4hostv(%struct.T_V_T* %t_v_t)
 // CHECK: call void @_ZN7T_B_NECC1Ev(%struct.T_B_NEC* %t_b_nec)
 // CHECK: call void @_ZN7T_F_NECC1Ev(%struct.T_F_NEC* %t_f_nec)
 // CHECK: call void @_ZN8T_FA_NECC1Ev(%struct.T_FA_NEC* %t_fa_nec)
 // CHECK: call void @_ZN2UCC1Ev(%struct.UC* addrspacecast (%struct.UC addrspace(3)* @_ZZ2dfvE4s_uc to %struct.UC*))
 // CHECK: ret void

 // We should not emit global init function.
 // CHECK-NOT: @_GLOBAL__sub_I

test/CodeGenCUDA/function-overload.cu

[29 preceding lines not shown]
 __device__
 #else
 __host__
 #endif
 void wrapper() {
   s_cd_dh scddh;
   // CHECK-BOTH: call void @_ZN7s_cd_dhC1Ev(
   s_cd_hd scdhd;
-  // CHECK-BOTH: call void @_ZN7s_cd_hdC1Ev
+  // CHECK-BOTH: call void @_ZN7s_cd_hdC1EUa6deviceUa4hostv

-  // CHECK-BOTH: call void @_ZN7s_cd_hdD1Ev(
+  // CHECK-BOTH: call void @_ZN7s_cd_hdD1EUa6deviceUa4hostv(
   // CHECK-BOTH: call void @_ZN7s_cd_dhD1Ev(
 }
 // CHECK-BOTH: ret void

 // Now it's time to check what's been generated for the methods we used.

 // CHECK-BOTH: define linkonce_odr void @_ZN7s_cd_dhC2Ev(
 // CHECK-HOST: store i32 11,
 // CHECK-DEVICE: store i32 12,
 // CHECK-BOTH: ret void

-// CHECK-BOTH: define linkonce_odr void @_ZN7s_cd_hdC2Ev(
+// CHECK-BOTH: define linkonce_odr void @_ZN7s_cd_hdC2EUa6deviceUa4hostv(
 // CHECK-BOTH: store i32 31,
 // CHECK-BOTH: ret void

-// CHECK-BOTH: define linkonce_odr void @_ZN7s_cd_hdD2Ev(
+// CHECK-BOTH: define linkonce_odr void @_ZN7s_cd_hdD2EUa6deviceUa4hostv(
 // CHECK-BOTH: store i32 32,
 // CHECK-BOTH: ret void

 // CHECK-BOTH: define linkonce_odr void @_ZN7s_cd_dhD2Ev(
 // CHECK-HOST: store i32 21,
 // CHECK-DEVICE: store i32 22,
 // CHECK-BOTH: ret void

test/CodeGenCUDA/mangling.cu

This file was added.

+// REQUIRES: x86-registered-target
+// REQUIRES: nvptx-registered-target
+
+// RUN: %clang_cc1 -triple nvptx-nvidia-cuda -fcuda-is-device -emit-llvm -o - %s | FileCheck %s
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -emit-llvm -o - %s | FileCheck %s
+
+#include "Inputs/cuda.h"
+
+// Check that __host__ __device__ function mangled names explicitly contain
+// "host" and "device" attributes. This is important because HD overloads may
+// coexist with H and D overloads.
+
+// CHECK: define i32 @_Z11host_deviceUa6deviceUa4hostv()
+__host__ __device__ int host_device() { return 0; }
+
+// The enable_if attribute should appear in-between the device and host attrs
+// in the mangled name.
+// CHECK: define i32 @_Z8enableifUa6deviceUa9enable_if{{.*}}Ua4hostv
+__attribute__((enable_if(1, "")))
+__host__ __device__ int enableif() { return 0; }