Page MenuHomePhabricator

[CUDA] Mangle __host__ __device__ functions differently than __host__ or __device__ functions.
AbandonedPublic

Authored by jlebar on Mar 24 2016, 1:06 PM.

Details

Reviewers
rjmccall
rsmith
Summary

This is important because in a later patch, we will allow host
device functions to be overloaded with host / device
functions with the same signature, and we don't want a naming conflict
in this case.

Based on changes from http://reviews.llvm.org/D12453?vs=on&id=33483.

Diff Detail

Event Timeline

jlebar updated this revision to Diff 51588.Mar 24 2016, 1:06 PM
jlebar retitled this revision from to [CUDA] Mangle __host__ __device__ functions differently than __host__ or __device__ functions..
jlebar updated this object.
jlebar added a reviewer: rsmith.
jlebar added subscribers: tra, cfe-commits.
rsmith edited edge metadata.Mar 24 2016, 5:42 PM

This makes the "constexpr implies __host__ __device__" patch look slightly questionable: two translation units defining the same constexpr function will mangle that function differently depending on whether the translation unit is built with CUDA support enabled. That will cause you to get duplicates of static locals and the like (but I suppose you do anyway between the host and the device, so maybe that's not much more broken than it would be regardless).

lib/AST/ItaniumMangle.cpp
488–489

According to http://mentorembedded.github.io/cxx-abi/abi.html#mangling-type, order-insensitive attributes should be sorted into *reverse* alphabetic order (alphabetically-first goes nearest to the base type).

Given that enable_if is order-sensitive but host and device are not, I'm not really sure what the Itanium ABI expects us to do here regarding their relative order. John?

This makes the "constexpr implies __host__ __device__" patch look slightly questionable: two translation units defining the same constexpr function will mangle that function differently depending on whether the translation unit is built with CUDA support enabled. That will cause you to get duplicates of static locals and the like (but I suppose you do anyway between the host and the device, so maybe that's not much more broken than it would be regardless).

Hm. I agree that would be surprising to users.

I suppose we could say that HD can overload D but not H. Then we could apply mangling to all device functions, solving the ambiguity that way. That feels pretty ham-fisted, though. :-/

! In D18458#383276, @jlebar wrote:

! In D18458#383266, @rsmith wrote:

This makes the "constexpr implies __host__ __device__" patch look slightly questionable: two translation units defining the same constexpr function will mangle that function differently depending on whether the translation unit is built with CUDA support enabled. That will cause you to get duplicates of static locals and the like (but I suppose you do anyway between the host and the device, so maybe that's not much more broken than it would be regardless).

The breakage seems to be worse than this. :( Eigen seems to do the following:

foo.h:
  #ifdef __CUDACC__  // If compiling CUDA code
  #define HOST_DEVICE __host__ __device__
  #else
  #define HOST_DEVICE
  #endif

  HOST_DEVICE void foo();

foo.cc:  // Compiled as CUDA
  HOST_DEVICE void foo() { ... }

bar.cc:  // *Not* compiled as CUDA
  #include "foo.h"
  void bar() { foo(); }

With this patch, foo() has a different mangled name in foo.o and bar.cc, and
we're hosed.

If we think this use-case is reasonable (I think it is?) I think this means
that we cannot mangle host device functions differently when doing host
compilation. That seems to restrict us to saying that H and HD functions with
the same signatures cannot overload. This leaves us with two options:

  1. No overloading between HD and H or D functions with the same signature.

    I don't see how to do this while still letting constexpr be HD; the issue is that there are constexpr std math functions which we want to overload for device. We could let constexpr be something other than HD, but if that new thing can overload with D, then I think we still have the same problem.
  2. No overloading between HD and H, but OK to overload HD and D.

    If we did this, we'd still need to give D functions a different mangled name. But we don't have this problem of referencing symbols defined in a file compiled in CUDA mode from a file compiled without CUDA.

    tra pointed out a problem with this, which is that if someone (say, nvidia) gave us a C++ library consisting of precompiled device code plus headers, we wouldn't be able to link with it, because we would use different mangling.

    I also don't like this because it's inconsistent to say HD can overload D but not H. But that's a minor point at this point.

Richard, what do you think? Maybe you have an alternative idea?

It seems like we have the following constraint: on host, no attributes must mangle the same as __host__ __device__ and constexpr (and probably __global__?).

Are there any others? What do we need to do to be ABI-compatible with NVCC? (And is that possible if we allow __host__ to overload __host__ __device__?)

One possibility given only that constraint would be to use a different mangling for H functions and D functions, but mangle HD and unattributed functions the same.

jlebar added a comment.EditedMar 25 2016, 2:38 PM

It seems like we have the following constraint: on host, no attributes must mangle the same as __host__ __device__ and constexpr (and probably __global__?).

Yes to __host__ __device__ and constexpr. Unsure about __global__, but let's also say yes for now, to be conservative.

Are there any others?

An existing assumption is that __host__ is identical to unattributed. Probably makes sense to keep that one around for now if we can (modulo changes to unattributed constexpr), as it makes things simpler.

What do we need to do to be ABI-compatible with NVCC? (And is that possible if we allow __host__ to overload __host__ __device__?)

NVCC doesn't apply any special mangling to D or HD functions, so I think maintaining naming compatibility means, basically, not screwing with mangled names based on attributes.

That suggests, to your second question, that it's not possible to maintain ABI compatibility if we allow D or H to overload HD.

One possibility given only that constraint would be to use a different mangling for H functions and D functions, but mangle HD and unattributed functions the same.

I guess using a different mangling for both H and D functions, rather than just for D functions, is in some sense more consistent. But this would also be very subtle: We'd be saying, non-constexpr H and unattributed are identical, *except* for their mangled names.

It seems like we have the following constraint: on host, no attributes must mangle the same as __host__ __device__ and constexpr (and probably __global__?).

Yes to __host__ __device__ and constexpr. Unsure about __global__, but let's also say yes for now, to be conservative.

Are there any others?

An existing assumption is that __host__ is identical to unattributed. Probably makes sense to keep that one around for now if we can (modulo changes to unattributed constexpr), as it makes things simpler.

OK, that makes things pretty easy (though we don't get the answer we might want): unattributed must be mangled the same as H and HD, so we cannot support overloading H and HD.

What do we need to do to be ABI-compatible with NVCC? (And is that possible if we allow __host__ to overload __host__ __device__?)

NVCC doesn't apply any special mangling to D or HD functions, so I think maintaining naming compatibility means, basically, not screwing with mangled names based on attributes.

That suggests, to your second question, that it's not possible to maintain ABI compatibility if we allow D or H to overload HD.

OK, so the question for you is, how much ABI compatibility with NVCC are you prepared to give up in order to allow HD / D overloading and HD / H overloading?

One possibility given only that constraint would be to use a different mangling for H functions and D functions, but mangle HD and unattributed functions the same.

I guess using a different mangling for both H and D functions, rather than just for D functions, is in some sense more consistent. But this would also be very subtle: We'd be saying, non-constexpr H and unattributed are identical, *except* for their mangled names.

Yes, that seems like a good argument for mangling H the same as unattributed.

OK, so the question for you is, how much ABI compatibility with NVCC are you prepared to give up in order to allow HD / D overloading and HD / H overloading?

At the moment, getting this feature to work seems more important than maintaining ABI compatibility with NVCC. But I cannot confidently assign a probability to how likely it will be at some point in the future that we'll want this ABI compatibility. I really don't know.

So, that's one option. Here's another:

The motivation behind this one is, we have this pie-in-the-sky notion that, morally, device code should be able to call anything it wants. Only if we cannot codegen for device a function transitively invoked by a device function will we error out. constexpr-is-implicitly-HD is a step towards this more ambitious goal.

Setting aside the constexpr bit, it seems to me that when we codegen an unattributed function for device, we should mark the function as having internal linkage (or whatever the thing is called such that it's not visible from other TUs). The reason is, other TUs cannot rely on this function being present in the first object file, because the function is only generated on-demand. If you want to call an HD function defined in another .cu file, then the header in both files needs to explicitly define it as HD.

If that is true -- that unattributed functions which we codegen for device can/should be made internal -- then the mangling of those names has no bearing on ABI compatibility. So we could say, no explicit-HD / D or explicit-HD / H overloading, but *implicit*-HD / D overloading is OK, and we will mangle implicit-HD functions differently to allow this.

Does that sound like it might work?

jlebar abandoned this revision.Mar 28 2016, 6:49 PM

New plan, R2: Let nvcc win.

After much discussion, we're abandoning this because we want to maintain abi compatibility with nvcc. I'm about to upload a revised approach to D18380 that won't require this.