Clang does not respect the explicit device host attributes of defaulted special members.
Also clang does not respect the hostness of special members determined by their
first declarations.
Clang also adds duplicate implicit device or host attributes in certain cases.
This patch fixes that.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Example of the actual error produced by clang: https://godbolt.org/z/Dl1FfC
Ugh. Another corner case of the way we're dealing with implicit __host__ __device__ functions. :-(
LGTM for postponing the error until actual use.
test/SemaCUDA/default-ctor.cu | ||
---|---|---|
1 ↗ | (On Diff #219946) | It would be good to add host-side compilation, too. |
4 ↗ | (On Diff #219946) | Use #include "Inputs/cuda.h" instead. |
Sorry I found some issue with the fix.
The following code:
struct A { virtual ~A(); }; struct B: public A { B(); }; B::B() = default;
will cause B::B() with external linkage emitted in IR, since B::B() = default; is a function definition.
This somehow defeats the intention not to emit B::B() in device code if its base class has virtual member function.
On the other hand, if we remove B::B() = default; from the above code, B::B() will become a __host__ function.
I think host/device property of B::B() should be determined at declaration and should not be changed by its definition.
In the above example, it should always be a __host__ function and should not be emitted in device code.
Posts a new fix for this issue, where the defaulted constructor definition follows the hostness of the original declaration in the class. Also fix the issue when defaulted ctor has explicit host device attribs.
lib/Sema/SemaCUDA.cpp | ||
---|---|---|
273–274 ↗ | (On Diff #220838) | A comment here would be helpful. I think the intent here is to look for implicit special members with *explicitly* set attributes. |
lib/Sema/SemaCUDA.cpp | ||
---|---|---|
273–274 ↗ | (On Diff #220838) | will add the comment. I intentionally omitted check for explicit attr because I noticed the same special member is inferred twice. Each time it is added the same attrs, which cause them to have two __host__ and two __device__ attrs. By checking if attrs exist (not just explicit attrs) we can avoid duplicate attrs. I tested this with real machine learning frameworks and did not see issues. |
lib/Sema/SemaCUDA.cpp | ||
---|---|---|
273–274 ↗ | (On Diff #220838) | OK. So, if we have explicit attributes, then there's no need to infer them. If the attributes are implicit, then we've already guessed them, so there's no point doing it again. The check for simple attribute presence covers both cases. I'll buy that. This leaves a hypothetical gap in case we have implicitly set attributes set somewhere else that would disagree with the attributes that would be set by this function. It would be great to have an assertion somewhere to verify that it does not happen. Alas, this function modifies the MemberDecl, so I don't see an easy way to do it. In general, the multiple application of the attributes seems to be a separate issue that should be fixed. I think we run into it in other places, too. |
Skip inferring for explicit host/device attrs only. Adds checks for implicit device and host attrs and avoid duplicates.
lib/Sema/SemaCUDA.cpp | ||
---|---|---|
387 ↗ | (On Diff #220892) | addHDAttrIfNeeded ? We may not even need it. See below. |
388–409 ↗ | (On Diff #220892) | Perhaps we can rearrange things a bit to make it easier to follow. bool needsH = true, needsD=true; if (has Value) { if (CFT_Device) needsH = false; if (CFT_Host) needsD = false; } // We either setting attributes first time, or the inferred ones must match previously set ones. assert(!(hasAttr(D) || hasAttr(H)) || (needsD == hasAttr(D) && (needsH == hasAttr(H))) if (needsD && !hasAttr(D)) addAttr(D); if (needsH && ! hasAttr(H)) addAttr(H); |
lib/Sema/SemaCUDA.cpp | ||
---|---|---|
386–387 ↗ | (On Diff #220907) | Nice. |
I can reproduce similar asserts locally. It seems the assertion I added assert(!(HasD || HasH) || (NeedsD == HasD && NeedsH == HasH)); is not always true. Since we do not have this assert before, I removed it. I will study what causes it to assert and post it later.
A reduced test case is
struct A { A(); }; template <class T> struct B { T a; constexpr B() = default; }; B<A> x;
B<A>::B() got implicit __host__ __device__ attrs due to constexpr before entering Sema::inferCUDATargetForImplicitSpecialMember.
In Sema::inferCUDATargetForImplicitSpecialMember, the inferred hostness of B<A>::B() is host since A::A() is host. This causes discrepancy between the inferred hostness and the existing hostness.
On one hand inferCUDATargetForImplicitSpecialMember is correct here.
On the other hand, constexpr being implicitly __host__ __device__ also works OK, with the error popping up only if we need to instantiate the B<A> on device side.
So, what we want is:
__host__ void f() {B<A> x;} // This should compile __device__ void f() {B<A> x;} // This should produce an error. struct foo { __host__ foo() { B<A> x; } // should compile __device__ foo() { B<A> x; } // ??? };
We could remove the implicit 'device' attribute from the function. This should make __device__ foo() fail to compile regardless of whether struct foo is instantiated on device.
Or we can keep the defaulted constexpr function as __host__ __device__ and catch the error only if/when struct foo is instantiated on device side.
NVCC (and clang as it is right now) appear to follow the latter -- there's no error if we don't generate code for the function.
https://godbolt.org/z/aVhvVn
For the sake of avoiding surprises, I think we should preserve this behavior and just relax the assertion here. We should be OK to infer stricter set of attributes, but not to relax them.