Details

Reviewers

tra
rjmccall
jlebar

Commits

rL372394: [CUDA][HIP] Fix hostness of defaulted constructor
rC372394: [CUDA][HIP] Fix hostness of defaulted constructor

Summary

Clang does not respect the explicit `__device__` and `__host__` attributes of defaulted special members.
Clang also does not respect the hostness of special members as determined by their
first declarations.
In certain cases, Clang also adds duplicate implicit `__device__` or `__host__` attributes.
This patch fixes all three issues.

Event Timeline

yaxunl created this revision. Sep 12 2019, 11:23 AM

tra added a reviewer: jlebar. Sep 12 2019, 12:01 PM

Example of the actual error produced by clang: https://godbolt.org/z/Dl1FfC

Ugh. Another corner case of the way we're dealing with implicit __host__ __device__ functions. :-(
LGTM for postponing the error until actual use.

test/SemaCUDA/default-ctor.cu
2	It would be good to add host-side compilation, too.
5	Use `#include "Inputs/cuda.h"` instead.

Sorry, I found an issue with the fix.

The following code:

struct A {  virtual ~A(); };
struct B: public A { B(); };
B::B() = default;

will cause B::B() to be emitted in IR with external linkage, since B::B() = default; is a function definition.

This defeats the intention not to emit B::B() in device code if its base class has a virtual member function.

On the other hand, if we remove B::B() = default; from the above code, B::B() will become a __host__ function.

I think host/device property of B::B() should be determined at declaration and should not be changed by its definition.

In the above example, it should always be a __host__ function and should not be emitted in device code.
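
The rule proposed here can be sketched as a small standalone model. This is only an illustration of the idea, not Clang's implementation: the types `Target` and `FirstDecl` and the function `hostnessOf` are hypothetical names, and `inferred` stands in for whatever inference over bases and fields would produce.

```cpp
// Hypothetical model for illustration; these are not Clang's types.
enum class Target { Host, Device, HostDevice };

struct FirstDecl {
  bool hasExplicitAttrs;  // __host__/__device__ written on the in-class declaration
  Target explicitTarget;  // meaningful only when hasExplicitAttrs is true
  bool defaultedInClass;  // implicit, or `= default` on the in-class declaration
};

// Hostness is decided from the first (in-class) declaration only; a later
// out-of-line `= default` definition does not take part in the decision.
constexpr Target hostnessOf(const FirstDecl &first, Target inferred) {
  return first.hasExplicitAttrs   ? first.explicitTarget  // explicit attributes win
         : first.defaultedInClass ? inferred              // infer from bases and fields
                                  : Target::Host;         // plain user declaration
}
```

For the example above, `B()` is declared in the class without attributes and without `= default`, so the model yields `Target::Host` regardless of what inference would suggest for the out-of-line `B::B() = default;`.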

Posted a new fix for this issue, in which the defaulted constructor definition follows the hostness of the original declaration in the class. Also fixed the case where a defaulted ctor has explicit host/device attributes.

tra added inline comments. Sep 19 2019, 9:09 AM

lib/Sema/SemaCUDA.cpp
273–274	A comment here would be helpful. I think the intent here is to look for implicit special members with explicitly set attributes. We have a number of cases where we set H/D attributes implicitly. I'm not sure whether we ever see any of them here, but if we do, they will sneak through this check. I think a check for whether the attribute is explicit would be prudent.

yaxunl marked an inline comment as done. Sep 19 2019, 9:27 AM

yaxunl added inline comments.

lib/Sema/SemaCUDA.cpp
273–274	Will add the comment. I intentionally omitted the check for explicit attrs because I noticed that the same special member is inferred twice. Each time, the same attrs are added, which leaves it with two `__host__` and two `__device__` attrs. By checking whether attrs exist (not just explicit attrs), we can avoid duplicate attrs. I tested this with real machine learning frameworks and did not see issues.

tra accepted this revision. Sep 19 2019, 10:25 AM

tra added inline comments.

lib/Sema/SemaCUDA.cpp
273–274	OK. So, if we have explicit attributes, then there's no need to infer them. If the attributes are implicit, then we've already guessed them, so there's no point doing it again. The check for simple attribute presence covers both cases. I'll buy that. This leaves a hypothetical gap in case we have implicitly set attributes set somewhere else that would disagree with the attributes that would be set by this function. It would be great to have an assertion somewhere to verify that it does not happen. Alas, this function modifies the MemberDecl, so I don't see an easy way to do it. In general, the multiple application of the attributes seems to be a separate issue that should be fixed. I think we run into it in other places, too.

This revision is now accepted and ready to land. Sep 19 2019, 10:25 AM

Skip inference only for explicit host/device attrs. Added checks for implicit device and host attrs to avoid duplicates.

tra added inline comments. Sep 19 2019, 12:13 PM

lib/Sema/SemaCUDA.cpp

382	`addHDAttrIfNeeded`? We may not even need it. See below.
383–404	Perhaps we can rearrange things a bit to make it easier to follow.

bool needsH = true, needsD = true;
if (hasValue) {
   if (CFT_Device)
      needsH = false;
   if (CFT_Host)
      needsD = false;
}

// We are either setting the attributes for the first time, or the inferred ones must match the previously set ones.
assert(!(hasAttr(D) || hasAttr(H))
    || (needsD == hasAttr(D) && needsH == hasAttr(H)));
if (needsD && !hasAttr(D))
   addAttr(D);
if (needsH && !hasAttr(H))
   addAttr(H);
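
The shorthand above can be turned into a compilable sketch. The version below is only a model under stated assumptions: `Target`, `Attrs`, and `applyInferred` are hypothetical names rather than Clang APIs, attribute presence is modelled as a count so that duplicates would be visible, and the assertion is left out so the sketch focuses on duplicate avoidance. It needs C++14 or later for the relaxed `constexpr` rules.

```cpp
// Hypothetical model; Clang itself works on Decl attributes, not counters.
enum class Target { Host, Device, HostDevice };

struct Attrs {
  int host;    // number of __host__ attributes attached
  int device;  // number of __device__ attributes attached
};

// Mirrors the suggested structure: decide which attributes are needed,
// then add each one only if it is not already present.
constexpr Attrs applyInferred(Attrs attrs, bool hasInferred, Target inferred) {
  bool needsH = true, needsD = true;  // no inferred target means __host__ __device__
  if (hasInferred) {
    if (inferred == Target::Device)
      needsH = false;
    if (inferred == Target::Host)
      needsD = false;
  }
  if (needsD && attrs.device == 0)
    ++attrs.device;
  if (needsH && attrs.host == 0)
    ++attrs.host;
  return attrs;
}
```

Applying the function twice with the same inferred target leaves the counts unchanged after the first call, which is the property the presence checks are meant to provide.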

Simplified the logic per Artem's comments.

tra added inline comments. Sep 19 2019, 2:05 PM

lib/Sema/SemaCUDA.cpp
386–387	Nice. Now these can be moved above `HasExpAttr` and then used in its initializer to make it shorter.

Revised per Artem's comments.

LGTM. Thank you!

Closed by commit rL372394: [CUDA][HIP] Fix hostness of defaulted constructor (authored by yaxunl). Sep 20 2019, 7:29 AM

This revision was automatically updated to reflect the committed changes.

Herald added a project: Restricted Project. Sep 20 2019, 7:29 AM

Herald added a subscriber: llvm-commits.

Looks like CUDA test-suite is triggering the assertion added by this patch:

http://lab.llvm.org:8011/builders/clang-cuda-build/builds/37301/steps/ninja%20build%20simple%20CUDA%20tests/logs/stdio

I am taking a look.

I can reproduce similar asserts locally. It seems the assertion I added, `assert(!(HasD || HasH) || (NeedsD == HasD && NeedsH == HasH));`, is not always true. Since we did not have this assert before, I removed it. I will study what causes it to fire and post the result later.

A reduced test case is

struct A {
  A();
};

template <class T>
struct B
{
  T a;
  constexpr B() = default;
};

B<A> x;

B<A>::B() got implicit __host__ __device__ attrs due to constexpr before entering Sema::inferCUDATargetForImplicitSpecialMember.
In Sema::inferCUDATargetForImplicitSpecialMember, the inferred hostness of B<A>::B() is host, since A::A() is host. This causes a discrepancy between the inferred hostness and the existing hostness.
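
To see why the assertion fires, its condition can be evaluated for this case. The sketch below restates the condition as a standalone function (`strictCheck` is a name made up for illustration) and plugs in the values described above: both attributes are already present because of `constexpr`, while inference asks for host only because `A::A()` is a host function.

```cpp
// The condition of the removed assertion, restated as a plain function.
// The parameter names follow the variables quoted in the assertion.
constexpr bool strictCheck(bool HasD, bool HasH, bool NeedsD, bool NeedsH) {
  return !(HasD || HasH) || (NeedsD == HasD && NeedsH == HasH);
}

// B<A>::B(): already __host__ __device__, inferred target is host only.
static_assert(!strictCheck(/*HasD=*/true, /*HasH=*/true,
                           /*NeedsD=*/false, /*NeedsH=*/true),
              "the condition is false, so the assertion fires");

// A member with no attributes yet always satisfies the condition.
static_assert(strictCheck(/*HasD=*/false, /*HasH=*/false,
                          /*NeedsD=*/false, /*NeedsH=*/true),
              "first-time inference passes");
```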

On one hand inferCUDATargetForImplicitSpecialMember is correct here.
On the other hand, constexpr being implicitly __host__ __device__ also works OK, with the error popping up only if we need to instantiate the B<A> on device side.

So, what we want is:

__host__ void f() {B<A> x;} // This should compile
__device__ void f() {B<A> x;} // This should produce an error.

struct foo {
  __host__ foo() { B<A> x; } // should compile
  __device__ foo() { B<A> x; } // ???
};

We could remove the implicit 'device' attribute from the function. This should make __device__ foo() fail to compile regardless of whether struct foo is instantiated on device.
Or we can keep the defaulted constexpr function as __host__ __device__ and catch the error only if/when struct foo is instantiated on device side.

NVCC (and clang as it is right now) appear to follow the latter -- there's no error if we don't generate code for the function.
https://godbolt.org/z/aVhvVn

For the sake of avoiding surprises, I think we should preserve this behavior and just relax the assertion here. It should be OK to infer a stricter set of attributes, but not to relax them.
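
One possible reading of "stricter but not relaxed" is that every attribute the inference asks for must already be present whenever any attribute is present, while existing attributes that the inference does not ask for are tolerated. The sketch below expresses that reading; `relaxedCheck` is a hypothetical name, and this is not the code that was committed.

```cpp
// Hypothetical relaxed condition: the inferred attributes may be a subset
// of the existing ones (stricter), but may not add a missing one (relaxed).
constexpr bool relaxedCheck(bool HasD, bool HasH, bool NeedsD, bool NeedsH) {
  return !(HasD || HasH) || ((!NeedsD || HasD) && (!NeedsH || HasH));
}

// Existing __host__ __device__ (from constexpr), inferred host only: accepted.
static_assert(relaxedCheck(/*HasD=*/true, /*HasH=*/true,
                           /*NeedsD=*/false, /*NeedsH=*/true),
              "a stricter inferred set is tolerated");

// Existing host only, inferred __host__ __device__: rejected.
static_assert(!relaxedCheck(/*HasD=*/false, /*HasH=*/true,
                            /*NeedsD=*/true, /*NeedsH=*/true),
              "a more permissive inferred set is not");
```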

I will add a lit test to make sure we have the desired behavior.

Diff 220838

lib/Sema/SemaCUDA.cpp

@@ ... @@ resolveCalleeCUDATargetConflict(Sema::CUDAFunctionTarget Target1,
   return false;
 }

 bool Sema::inferCUDATargetForImplicitSpecialMember(CXXRecordDecl *ClassDecl,
                                                    CXXSpecialMember CSM,
                                                    CXXMethodDecl *MemberDecl,
                                                    bool ConstRHS,
                                                    bool Diagnose) {
+  bool InClass = MemberDecl->getLexicalParent() == MemberDecl->getParent();
+  bool hasAttr = MemberDecl->hasAttr<CUDADeviceAttr>() ||
+                 MemberDecl->hasAttr<CUDAHostAttr>();
+  if (!InClass || hasAttr)
+    return false;
+
   llvm::Optional<CUDAFunctionTarget> InferredTarget;

   // We're going to invoke special member lookup; mark that these special
   // members are called from this one, and not from its caller.
   ContextRAII MethodContext(*this, MemberDecl);

   // Look for special members in base classes that should be invoked from here.
   // Infer the target of this member base on the ones it should call.
@@ ... @@ if (!InferredTarget.hasValue()) {
         MemberDecl->addAttr(CUDAInvalidTargetAttr::CreateImplicit(Context));
         return true;
       }
     }
   }

   if (InferredTarget.hasValue()) {
     if (InferredTarget.getValue() == CFT_Device) {
       MemberDecl->addAttr(CUDADeviceAttr::CreateImplicit(Context));
     } else if (InferredTarget.getValue() == CFT_Host) {
       MemberDecl->addAttr(CUDAHostAttr::CreateImplicit(Context));
     } else {
       MemberDecl->addAttr(CUDADeviceAttr::CreateImplicit(Context));
       MemberDecl->addAttr(CUDAHostAttr::CreateImplicit(Context));
     }
   } else {
     // If no target was inferred, mark this member as __host__ __device__;
     // it's the least restrictive option that can be invoked from any target.
     MemberDecl->addAttr(CUDADeviceAttr::CreateImplicit(Context));
     MemberDecl->addAttr(CUDAHostAttr::CreateImplicit(Context));
   }

   return false;
 }

 bool Sema::isEmptyCudaConstructor(SourceLocation Loc, CXXConstructorDecl *CD) {
   if (!CD->isDefined() && CD->isTemplateInstantiation())
     InstantiateFunctionDefinition(Loc, CD->getFirstDecl());

   // (E.2.3.1, CUDA 7.5) A constructor for a class type is considered
   // empty at a point in the translation unit, if it is either a
   // trivial constructor
   if (CD->isTrivial())
     return true;

   // ... or it satisfies all of the following conditions:
   // The constructor function has been defined.
   // The constructor function has no parameters,
   // and the function body is an empty compound statement.
@@ ... @@

test/SemaCUDA/default-ctor.cu

This file was added.

// RUN: %clang_cc1 -std=c++11 -triple nvptx64-nvidia-cuda -fsyntax-only \
// RUN:            -fcuda-is-device -verify -verify-ignore-unexpected=note %s
// RUN: %clang_cc1 -std=c++11 -triple x86_64-unknown-linux-gnu -fsyntax-only \
// RUN:            -verify -verify-ignore-unexpected=note %s

#include "Inputs/cuda.h"

struct In { In() = default; };
struct InD { __device__ InD() = default; };
struct InH { __host__ InH() = default; };
struct InHD { __host__ __device__ InHD() = default; };

struct Out { Out(); };
struct OutD { __device__ OutD(); };
struct OutH { __host__ OutH(); };
struct OutHD { __host__ __device__ OutHD(); };

Out::Out() = default;
__device__ OutD::OutD() = default;
__host__ OutH::OutH() = default;
__host__ __device__ OutHD::OutHD() = default;

__device__ void fd() {
  In in;
  InD ind;
  InH inh; // expected-error{{no matching constructor for initialization of 'InH'}}
  InHD inhd;
  Out out; // expected-error{{no matching constructor for initialization of 'Out'}}
  OutD outd;
  OutH outh; // expected-error{{no matching constructor for initialization of 'OutH'}}
  OutHD outhd;
}

__host__ void fh() {
  In in;
  InD ind; // expected-error{{no matching constructor for initialization of 'InD'}}
  InH inh;
  InHD inhd;
  Out out;
  OutD outd; // expected-error{{no matching constructor for initialization of 'OutD'}}
  OutH outh;
  OutHD outhd;
}
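
The expectations in this test can be summarised with a small model. It is a simplification that covers only the `__host__` and `__device__` callers exercised by the test (Clang's handling of `__host__ __device__` callers is more involved), and the names `Side`, `Target`, and `viable` are hypothetical.

```cpp
// Hypothetical model of the test's expectations, not Clang's overload logic.
enum class Side { Host, Device };                // where the calling function runs
enum class Target { Host, Device, HostDevice };  // hostness of the constructor

// A constructor is a usable candidate if it is __host__ __device__ or if its
// single target matches the side the caller runs on.
constexpr bool viable(Side caller, Target ctor) {
  return ctor == Target::HostDevice ||
         (caller == Side::Host ? ctor == Target::Host : ctor == Target::Device);
}

// From __device__ fd(): In (inferred HD) and InD are fine, InH and Out are not.
static_assert(viable(Side::Device, Target::HostDevice), "In, InHD, OutHD");
static_assert(viable(Side::Device, Target::Device), "InD, OutD");
static_assert(!viable(Side::Device, Target::Host), "InH, Out, OutH");

// From __host__ fh(): only the __device__-only constructors are rejected.
static_assert(!viable(Side::Host, Target::Device), "InD, OutD");
```

In this model, `Out` behaves like `OutH` because a constructor declared in the class without attributes is a host function, and the out-of-line `= default` does not change that.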

This is an archive of the discontinued LLVM Phabricator instance.

[CUDA][HIP] Fix hostness of defaulted constructor
ClosedPublic
