This is an archive of the discontinued LLVM Phabricator instance.

[CUDA][HIP] Fix hostness of defaulted constructor
ClosedPublic

Authored by yaxunl on Sep 12 2019, 11:23 AM.

Details

Summary

Clang does not respect the explicit device host attributes of defaulted special members.
Also clang does not respect the hostness of special members determined by their
first declarations.
Clang also adds duplicate implicit device or host attributes in certain cases.
This patch fixes that.

Diff Detail

Repository
rL LLVM

Event Timeline

yaxunl created this revision.Sep 12 2019, 11:23 AM
tra added a reviewer: jlebar.Sep 12 2019, 12:01 PM
tra added a comment.Sep 12 2019, 12:16 PM

Example of the actual error produced by clang: https://godbolt.org/z/Dl1FfC

Ugh. Another corner case of the way we're dealing with implicit __host__ __device__ functions. :-(
LGTM for postponing the error until actual use.

test/SemaCUDA/default-ctor.cu
1 ↗(On Diff #219946)

It would be good to add host-side compilation, too.

4 ↗(On Diff #219946)

Use #include "Inputs/cuda.h" instead.

Sorry I found some issue with the fix.

The following code:

struct A {  virtual ~A(); };
struct B: public A { B(); };
B::B() = default;

will cause B::B() with external linkage emitted in IR, since B::B() = default; is a function definition.

This somehow defeats the intention not to emit B::B() in device code if its base class has virtual member function.

On the other hand, if we remove B::B() = default; from the above code, B::B() will become a __host__ function.

I think host/device property of B::B() should be determined at declaration and should not be changed by its definition.

In the above example, it should always be a __host__ function and should not be emitted in device code.

yaxunl updated this revision to Diff 220838.Sep 19 2019, 4:53 AM
yaxunl retitled this revision from [CUDA][HIP] Diagnose defaulted constructor only if it is used to [CUDA][HIP] Fix hostness of defaulted constructor.
yaxunl edited the summary of this revision. (Show Details)

Posts a new fix for this issue, where the defaulted constructor definition follows the hostness of the original declaration in the class. Also fix the issue when defaulted ctor has explicit host device attribs.

tra added inline comments.Sep 19 2019, 9:09 AM
lib/Sema/SemaCUDA.cpp
273–274 ↗(On Diff #220838)

A comment here would be helpful.

I think the intent here is to look for implicit special members with *explicitly* set attributes.
We have number of cases where we set H/D attributes implicitly. I'm not sure whether we ever see any of them here, but if we do, it will sneak through this check. I think a check for whether the attribute is explicit would be prudent.

yaxunl marked an inline comment as done.Sep 19 2019, 9:27 AM
yaxunl added inline comments.
lib/Sema/SemaCUDA.cpp
273–274 ↗(On Diff #220838)

will add the comment.

I intentionally omitted check for explicit attr because I noticed the same special member is inferred twice. Each time it is added the same attrs, which cause them to have two __host__ and two __device__ attrs. By checking if attrs exist (not just explicit attrs) we can avoid duplicate attrs. I tested this with real machine learning frameworks and did not see issues.

tra accepted this revision.Sep 19 2019, 10:25 AM
tra added inline comments.
lib/Sema/SemaCUDA.cpp
273–274 ↗(On Diff #220838)

OK. So, if we have explicit attributes, then there's no need to infer them. If the attributes are implicit, then we've already guessed them, so there's no point doing it again. The check for simple attribute presence covers both cases. I'll buy that.

This leaves a hypothetical gap in case we have implicitly set attributes set somewhere else that would disagree with the attributes that would be set by this function. It would be great to have an assertion somewhere to verify that it does not happen. Alas, this function modifies the MemberDecl, so I don't see an easy way to do it.

In general, the multiple application of the attributes seems to be a separate issue that should be fixed. I think we run into it in other places, too.

This revision is now accepted and ready to land.Sep 19 2019, 10:25 AM
yaxunl updated this revision to Diff 220892.Sep 19 2019, 11:36 AM
yaxunl edited the summary of this revision. (Show Details)

Skip inferring for explicit host/device attrs only. Adds checks for implicit device and host attrs and avoid duplicates.

tra added inline comments.Sep 19 2019, 12:13 PM
lib/Sema/SemaCUDA.cpp
387 ↗(On Diff #220892)

addHDAttrIfNeeded ? We may not even need it. See below.

388–409 ↗(On Diff #220892)

Perhaps we can rearrange things a bit to make it easier to follow.

bool needsH = true, needsD=true;
if (has Value) {
   if (CFT_Device)
      needsH = false;
   if (CFT_Host)
      needsD = false;
}

// We either setting attributes first time, or the inferred ones must match previously set ones.
assert(!(hasAttr(D) || hasAttr(H)) 
    || (needsD == hasAttr(D) && (needsH == hasAttr(H)))
if (needsD && !hasAttr(D))
   addAttr(D);
if (needsH && ! hasAttr(H))
   addAttr(H);
yaxunl updated this revision to Diff 220907.Sep 19 2019, 1:54 PM

simplify logic by Artem's comments.

tra added inline comments.Sep 19 2019, 2:05 PM
lib/Sema/SemaCUDA.cpp
386–387 ↗(On Diff #220907)

Nice.
Now these can be moved above HasExpAttr and then used in its initializer to make it shorter.

yaxunl updated this revision to Diff 220912.Sep 19 2019, 2:15 PM

revise by Artem's comments.

tra accepted this revision.Sep 19 2019, 2:33 PM

LGTM. Thank you!

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptSep 20 2019, 7:29 AM
In D67509#1677528, @tra wrote:

I am taking a look.

In D67509#1677528, @tra wrote:

I am taking a look.

I can reproduce similar asserts locally. It seems the assertion I added assert(!(HasD || HasH) || (NeedsD == HasD && NeedsH == HasH)); is not always true. Since we do not have this assert before, I removed it. I will study what causes it to assert and post it later.

In D67509#1677528, @tra wrote:

I am taking a look.

I can reproduce similar asserts locally. It seems the assertion I added assert(!(HasD || HasH) || (NeedsD == HasD && NeedsH == HasH)); is not always true. Since we do not have this assert before, I removed it. I will study what causes it to assert and post it later.

A reduced test case is

struct A {
  A();
};

template <class T>
struct B
{
  T a;
  constexpr B() = default;
};

B<A> x;

B<A>::B() got implicit __host__ __device__ attrs due to constexpr before entering Sema::inferCUDATargetForImplicitSpecialMember.
In Sema::inferCUDATargetForImplicitSpecialMember, the inferred hostness of B<A>::B() is host since A::A() is host. This causes discrepancy between the inferred hostness and the existing hostness.

tra added a comment.Sep 23 2019, 12:04 PM

A reduced test case is

struct A {
  A();
};

template <class T>
struct B
{
  T a;
  constexpr B() = default;
};

B<A> x;

B<A>::B() got implicit __host__ __device__ attrs due to constexpr before entering Sema::inferCUDATargetForImplicitSpecialMember.
In Sema::inferCUDATargetForImplicitSpecialMember, the inferred hostness of B<A>::B() is host since A::A() is host. This causes discrepancy between the inferred hostness and the existing hostness.

On one hand inferCUDATargetForImplicitSpecialMember is correct here.
On the other hand, constexpr being implicitly __host__ __device__ also works OK, with the error popping up only if we need to instantiate the B<A> on device side.

So, what we want is:

__host__ void f() {B<A> x;} // This should compile
__device__ void f() {B<A> x;} // This should produce an error.

struct foo {
  __host__ foo() { B<A> x; } // should compile
  __device__ foo() { B<A> x; } // ???
};

We could remove the implicit 'device' attribute from the function. This should make __device__ foo() fail to compile regardless of whether struct foo is instantiated on device.
Or we can keep the defaulted constexpr function as __host__ __device__ and catch the error only if/when struct foo is instantiated on device side.

NVCC (and clang as it is right now) appear to follow the latter -- there's no error if we don't generate code for the function.
https://godbolt.org/z/aVhvVn

For the sake of avoiding surprises, I think we should preserve this behavior and just relax the assertion here. We should be OK to infer stricter set of attributes, but not to relax them.

In D67509#1679524, @tra wrote:

A reduced test case is

struct A {
  A();
};

template <class T>
struct B
{
  T a;
  constexpr B() = default;
};

B<A> x;

B<A>::B() got implicit __host__ __device__ attrs due to constexpr before entering Sema::inferCUDATargetForImplicitSpecialMember.
In Sema::inferCUDATargetForImplicitSpecialMember, the inferred hostness of B<A>::B() is host since A::A() is host. This causes discrepancy between the inferred hostness and the existing hostness.

On one hand inferCUDATargetForImplicitSpecialMember is correct here.
On the other hand, constexpr being implicitly __host__ __device__ also works OK, with the error popping up only if we need to instantiate the B<A> on device side.

So, what we want is:

__host__ void f() {B<A> x;} // This should compile
__device__ void f() {B<A> x;} // This should produce an error.

struct foo {
  __host__ foo() { B<A> x; } // should compile
  __device__ foo() { B<A> x; } // ???
};

We could remove the implicit 'device' attribute from the function. This should make __device__ foo() fail to compile regardless of whether struct foo is instantiated on device.
Or we can keep the defaulted constexpr function as __host__ __device__ and catch the error only if/when struct foo is instantiated on device side.

NVCC (and clang as it is right now) appear to follow the latter -- there's no error if we don't generate code for the function.
https://godbolt.org/z/aVhvVn

For the sake of avoiding surprises, I think we should preserve this behavior and just relax the assertion here. We should be OK to infer stricter set of attributes, but not to relax them.

I will add a lit test to make sure we have the desired behavior.