Download Raw Diff

Details

Reviewers

tra
rjmccall
jlebar

Commits

rL372394: [CUDA][HIP] Fix hostness of defaulted constructor
rC372394: [CUDA][HIP] Fix hostness of defaulted constructor

Summary

Clang does not respect the explicit device host attributes of defaulted special members.
Also clang does not respect the hostness of special members determined by their
first declarations.
Clang also adds duplicate implicit device or host attributes in certain cases.
This patch fixes that.

Diff Detail

Repository: rL LLVM

Event Timeline

yaxunl created this revision.Sep 12 2019, 11:23 AM

tra added a reviewer: jlebar.Sep 12 2019, 12:01 PM

Example of the actual error produced by clang: https://godbolt.org/z/Dl1FfC

Ugh. Another corner case of the way we're dealing with implicit __host__ __device__ functions. :-(
LGTM for postponing the error until actual use.

test/SemaCUDA/default-ctor.cu
1 ↗	(On Diff #219946)	It would be good to add host-side compilation, too.
4 ↗	(On Diff #219946)	Use `#include "Inputs/cuda.h"` instead.

Sorry I found some issue with the fix.

The following code:

struct A {  virtual ~A(); };
struct B: public A { B(); };
B::B() = default;

will cause B::B() with external linkage emitted in IR, since B::B() = default; is a function definition.

This somehow defeats the intention not to emit B::B() in device code if its base class has virtual member function.

On the other hand, if we remove B::B() = default; from the above code, B::B() will become a __host__ function.

I think host/device property of B::B() should be determined at declaration and should not be changed by its definition.

In the above example, it should always be a __host__ function and should not be emitted in device code.

Posts a new fix for this issue, where the defaulted constructor definition follows the hostness of the original declaration in the class. Also fix the issue when defaulted ctor has explicit host device attribs.

tra added inline comments.Sep 19 2019, 9:09 AM

lib/Sema/SemaCUDA.cpp
273–274 ↗	(On Diff #220838)	A comment here would be helpful. I think the intent here is to look for implicit special members with explicitly set attributes. We have number of cases where we set H/D attributes implicitly. I'm not sure whether we ever see any of them here, but if we do, it will sneak through this check. I think a check for whether the attribute is explicit would be prudent.

yaxunl marked an inline comment as done.Sep 19 2019, 9:27 AM

yaxunl added inline comments.

lib/Sema/SemaCUDA.cpp
273–274 ↗	(On Diff #220838)	will add the comment. I intentionally omitted check for explicit attr because I noticed the same special member is inferred twice. Each time it is added the same attrs, which cause them to have two `__host__` and two `__device__` attrs. By checking if attrs exist (not just explicit attrs) we can avoid duplicate attrs. I tested this with real machine learning frameworks and did not see issues.

tra accepted this revision.Sep 19 2019, 10:25 AM

tra added inline comments.

lib/Sema/SemaCUDA.cpp
273–274 ↗	(On Diff #220838)	OK. So, if we have explicit attributes, then there's no need to infer them. If the attributes are implicit, then we've already guessed them, so there's no point doing it again. The check for simple attribute presence covers both cases. I'll buy that. This leaves a hypothetical gap in case we have implicitly set attributes set somewhere else that would disagree with the attributes that would be set by this function. It would be great to have an assertion somewhere to verify that it does not happen. Alas, this function modifies the MemberDecl, so I don't see an easy way to do it. In general, the multiple application of the attributes seems to be a separate issue that should be fixed. I think we run into it in other places, too.

This revision is now accepted and ready to land.Sep 19 2019, 10:25 AM

Skip inferring for explicit host/device attrs only. Adds checks for implicit device and host attrs and avoid duplicates.

tra added inline comments.Sep 19 2019, 12:13 PM

lib/Sema/SemaCUDA.cpp

387 ↗

(On Diff #220892)

addHDAttrIfNeeded ? We may not even need it. See below.

388–409 ↗

(On Diff #220892)

Perhaps we can rearrange things a bit to make it easier to follow.

bool needsH = true, needsD=true;
if (has Value) {
   if (CFT_Device)
      needsH = false;
   if (CFT_Host)
      needsD = false;
}

// We either setting attributes first time, or the inferred ones must match previously set ones.
assert(!(hasAttr(D) || hasAttr(H)) 
    || (needsD == hasAttr(D) && (needsH == hasAttr(H)))
if (needsD && !hasAttr(D))
   addAttr(D);
if (needsH && ! hasAttr(H))
   addAttr(H);

simplify logic by Artem's comments.

tra added inline comments.Sep 19 2019, 2:05 PM

lib/Sema/SemaCUDA.cpp
386–387 ↗	(On Diff #220907)	Nice. Now these can be moved above `HasExpAttr` and then used in its initializer to make it shorter.

revise by Artem's comments.

LGTM. Thank you!

Closed by commit rL372394: [CUDA][HIP] Fix hostness of defaulted constructor (authored by yaxunl). · Explain WhySep 20 2019, 7:29 AM

This revision was automatically updated to reflect the committed changes.

Herald added a project: Restricted Project. · View Herald TranscriptSep 20 2019, 7:29 AM

Herald added a subscriber: llvm-commits. · View Herald Transcript

Looks like CUDA test-suite is triggering the assertion added by this patch:

http://lab.llvm.org:8011/builders/clang-cuda-build/builds/37301/steps/ninja%20build%20simple%20CUDA%20tests/logs/stdio

In D67509#1677528, @tra wrote:

Looks like CUDA test-suite is triggering the assertion added by this patch:

http://lab.llvm.org:8011/builders/clang-cuda-build/builds/37301/steps/ninja%20build%20simple%20CUDA%20tests/logs/stdio

I am taking a look.

In D67509#1677586, @yaxunl wrote:

In D67509#1677528, @tra wrote:

Looks like CUDA test-suite is triggering the assertion added by this patch:

http://lab.llvm.org:8011/builders/clang-cuda-build/builds/37301/steps/ninja%20build%20simple%20CUDA%20tests/logs/stdio

I am taking a look.

I can reproduce similar asserts locally. It seems the assertion I added assert(!(HasD || HasH) || (NeedsD == HasD && NeedsH == HasH)); is not always true. Since we do not have this assert before, I removed it. I will study what causes it to assert and post it later.

In D67509#1677722, @yaxunl wrote:

In D67509#1677586, @yaxunl wrote:

In D67509#1677528, @tra wrote:

Looks like CUDA test-suite is triggering the assertion added by this patch:

http://lab.llvm.org:8011/builders/clang-cuda-build/builds/37301/steps/ninja%20build%20simple%20CUDA%20tests/logs/stdio

I am taking a look.

I can reproduce similar asserts locally. It seems the assertion I added assert(!(HasD || HasH) || (NeedsD == HasD && NeedsH == HasH)); is not always true. Since we do not have this assert before, I removed it. I will study what causes it to assert and post it later.

A reduced test case is

struct A {
  A();
};

template <class T>
struct B
{
  T a;
  constexpr B() = default;
};

B<A> x;

B<A>::B() got implicit __host__ __device__ attrs due to constexpr before entering Sema::inferCUDATargetForImplicitSpecialMember.
In Sema::inferCUDATargetForImplicitSpecialMember, the inferred hostness of B<A>::B() is host since A::A() is host. This causes discrepancy between the inferred hostness and the existing hostness.

In D67509#1678394, @yaxunl wrote:
A reduced test case is
struct A {
  A();
};

template <class T>
struct B
{
  T a;
  constexpr B() = default;
};

B<A> x;
B<A>::B() got implicit __host__ __device__ attrs due to constexpr before entering Sema::inferCUDATargetForImplicitSpecialMember.
In Sema::inferCUDATargetForImplicitSpecialMember, the inferred hostness of B<A>::B() is host since A::A() is host. This causes discrepancy between the inferred hostness and the existing hostness.

On one hand inferCUDATargetForImplicitSpecialMember is correct here.
On the other hand, constexpr being implicitly __host__ __device__ also works OK, with the error popping up only if we need to instantiate the B<A> on device side.

So, what we want is:

__host__ void f() {B<A> x;} // This should compile
__device__ void f() {B<A> x;} // This should produce an error.

struct foo {
  __host__ foo() { B<A> x; } // should compile
  __device__ foo() { B<A> x; } // ???
};

We could remove the implicit 'device' attribute from the function. This should make __device__ foo() fail to compile regardless of whether struct foo is instantiated on device.
Or we can keep the defaulted constexpr function as __host__ __device__ and catch the error only if/when struct foo is instantiated on device side.

NVCC (and clang as it is right now) appear to follow the latter -- there's no error if we don't generate code for the function.
https://godbolt.org/z/aVhvVn

For the sake of avoiding surprises, I think we should preserve this behavior and just relax the assertion here. We should be OK to infer stricter set of attributes, but not to relax them.

In D67509#1679524, @tra wrote:
In D67509#1678394, @yaxunl wrote:
A reduced test case is
struct A {
  A();
};

template <class T>
struct B
{
  T a;
  constexpr B() = default;
};

B<A> x;
B<A>::B() got implicit __host__ __device__ attrs due to constexpr before entering Sema::inferCUDATargetForImplicitSpecialMember.
In Sema::inferCUDATargetForImplicitSpecialMember, the inferred hostness of B<A>::B() is host since A::A() is host. This causes discrepancy between the inferred hostness and the existing hostness.
On one hand inferCUDATargetForImplicitSpecialMember is correct here.
On the other hand, constexpr being implicitly __host__ __device__ also works OK, with the error popping up only if we need to instantiate the B<A> on device side.

So, what we want is:
__host__ void f() {B<A> x;} // This should compile
__device__ void f() {B<A> x;} // This should produce an error.

struct foo {
  __host__ foo() { B<A> x; } // should compile
  __device__ foo() { B<A> x; } // ???
};
We could remove the implicit 'device' attribute from the function. This should make __device__ foo() fail to compile regardless of whether struct foo is instantiated on device.
Or we can keep the defaulted constexpr function as __host__ __device__ and catch the error only if/when struct foo is instantiated on device side.

NVCC (and clang as it is right now) appear to follow the latter -- there's no error if we don't generate code for the function.
https://godbolt.org/z/aVhvVn

For the sake of avoiding surprises, I think we should preserve this behavior and just relax the assertion here. We should be OK to infer stricter set of attributes, but not to relax them.

I will add a lit test to make sure we have the desired behavior.

This is an archive of the discontinued LLVM Phabricator instance.

[CUDA][HIP] Fix hostness of defaulted constructor
ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 221029

cfe/trunk/lib/Sema/SemaCUDA.cpp

cfe/trunk/test/SemaCUDA/default-ctor.cu

This is an archive of the discontinued LLVM Phabricator instance.

[CUDA][HIP] Fix hostness of defaulted constructorClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 221029

cfe/trunk/lib/Sema/SemaCUDA.cpp

cfe/trunk/test/SemaCUDA/default-ctor.cu

[CUDA][HIP] Fix hostness of defaulted constructor
ClosedPublic