This is an archive of the discontinued LLVM Phabricator instance.

[CUDA][HIP] Fix device variables used by host
Closed, Public

Authored by yaxunl on May 19 2021, 12:55 PM.

Details

Summary

Variables that are emitted on both the host and device sides as separate entities with different addresses should not have their device-side counterparts force-emitted when they are ODR-used by a host function.

This fixes the regression caused by https://reviews.llvm.org/D102237
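
For illustration only (this snippet is not from the patch's test suite; the names are made up), a minimal sketch of the situation described in the summary:

// A default constexpr variable with no explicit device attribute is emitted
// on both the host and the device side, as separate entities with different
// addresses.
constexpr int Answer = 42;

__global__ void kern(int *Out) {
  *Out = Answer;            // the device side uses its own copy
}

int hostUse() {
  const int *P = &Answer;   // ODR-use by a host function; this should not
  return *P;                // force the device-side copy to be emitted
}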

Diff Detail

Event Timeline

yaxunl created this revision.May 19 2021, 12:55 PM
yaxunl requested review of this revision.May 19 2021, 12:55 PM
tra added a reviewer: rsmith.May 19 2021, 1:59 PM
tra added a subscriber: rsmith.

Tentative LGTM as we need it to fix the regression soon.

Summoning @rsmith for the 'big picture' opinion.
While the patch may fix this particular regression, I wonder if there's a better way to deal with this. We're growing a bit too many nuances that would be hard to explain and may cause more corner cases to appear.

clang/lib/CodeGen/CodeGenModule.cpp
2391

IIUIC, the idea here is that we do not want to emit constexpr int foo; on the device, even if we happen to ODR-use it there. And the way we detect this is by checking for the implicit __constant__ that we happen to add to constexpr variables.

I think this may be relying on the implementation details too much. It also makes the compiler's behavior somewhat surprising -- we would potentially emit other variables that do not get any device attribute, but would not emit the variables with an implicit __constant__, which is a device attribute.

I'm not sure if we have any good options here. This may be an acceptable compromise, but I wonder if there's a better way to deal with this.

That said, this patch is OK to fix the regression we have now, but we may need to revisit this.

clang/test/CodeGenCUDA/host-used-device-var.cu
185–213

This definitely needs some comments. Otherwise this is nearly incomprehensible and it's impossible to tell what's going on.

tra added a comment.May 19 2021, 4:32 PM

This patch does not appear to fix the second regression introduced by D102237.

Trying to compile the following code triggers an assertion in CGExpr.cpp:

class a {
public:
  a(char *);
};
void b() {
  [](char *c) {
    static a d(c);
    d;
  };
}

With assertions disabled it eventually leads to a different error:
Module has a nontrivial global ctor, which NVPTX does not support.
https://godbolt.org/z/sYE1dKr1W

yaxunl updated this revision to Diff 346796.May 20 2021, 10:46 AM
yaxunl retitled this revision from [CUDA][HIP] Fix implicit constant variable to [CUDA][HIP] Fix device variables used by host.
yaxunl edited the summary of this revision. (Show Details)

Fix the other regression

yaxunl marked 2 inline comments as done.May 20 2021, 10:57 AM

This patch does not appear to fix the second regression introduced by D102237.

Trying to compile the following code triggers an assertion in CGExpr.cpp:

class a {
public:
  a(char *);
};
void b() {
  [](char *c) {
    static a d(c);
    d;
  };
}

With assertions disabled it eventually leads to a different error:
Module has a nontrivial global ctor, which NVPTX does not support.
https://godbolt.org/z/sYE1dKr1W

The root cause is similar to that of the last regression: when a variable is emitted on both sides but as different entities, we should not treat it as a device variable on the host side. I have updated the patch to fix both regressions.

clang/lib/CodeGen/CodeGenModule.cpp
2391

We need to differentiate constexpr int a and __constant__ constexpr int a, since the former is emitted on both sides and the latter is emitted only on the device side. It seems the only way to differentiate them is to check whether the constant attribute is explicit or not.
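
For illustration (not from the patch; the names are adjusted so both declarations can coexist), the contrast described above:

constexpr int a = 1;               // no explicit attribute: emitted on both
                                   // sides as separate entities
__constant__ constexpr int b = 2;  // explicit device attribute: emitted on the
                                   // device side only (the host gets a shadow)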

clang/test/CodeGenCUDA/host-used-device-var.cu
185–213

done

yaxunl marked 2 inline comments as done.May 20 2021, 11:04 AM

Tentative LGTM as we need it to fix the regression soon.

Summoning @rsmith for the 'big picture' opinion.
While the patch may fix this particular regression, I wonder if there's a better way to deal with this. We're growing a bit too many nuances that would be hard to explain and may cause more corner cases to appear.

In the updated patch I have a simpler solution which is easier to explain to users. Basically we classify variables by how they are emitted: device side only, host side only, both sides as different entities (e.g. a default constexpr var), and both sides as a unified entity (e.g. a managed var). For variables emitted on both sides as separate entities, we have limited knowledge and we limit what we can do for them. I think users should understand the compiler's limitation in such cases, and they can easily work around it by making the variable an explicit device variable.
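
A rough sketch of the four categories, with made-up variable names (__managed__ assumes HIP or CUDA with managed-variable support):

__device__ int DevOnly;            // device side only
int HostOnly;                      // host side only
constexpr int BothSeparate = 1;    // both sides, as different entities
__managed__ int BothUnified;       // both sides, as a unified entity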

tra added a comment.May 20 2021, 11:25 AM

In the updated patch I have a simpler solution which is easier to explain to users. Basically we classify variables by how they are emitted: device side only, host side only, both sides as different entities (e.g. a default constexpr var), and both sides as a unified entity (e.g. a managed var). For variables emitted on both sides as separate entities, we have limited knowledge and we limit what we can do for them. I think users should understand the compiler's limitation in such cases, and they can easily work around it by making the variable an explicit device variable.

This is really nice.

Let me test it internally and see if anything breaks.

clang/include/clang/Sema/Sema.h
12066

Wasn't there another kind, where the variable is emitted on the host with device-side shadow? I vaguely recall it had something to do with textures.

12067

I think we should mention the host-side shadows, too.

clang/lib/Sema/SemaCUDA.cpp
148–149

I'm still not a fan of relying on an implicit constant.
Can we change it to a more direct is-a-constexpr && !has-explicit-device-side-attr check?
We may eventually consider relaxing this to can-be-const-evaluated and allow const vars with known values.
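
A rough sketch (not the actual patch) of what the suggested check might look like, assuming Clang's AST APIs VarDecl::isConstexpr, Decl::getAttr, and Attr::isImplicit:

#include "clang/AST/Attr.h"
#include "clang/AST/Decl.h"

// True for a constexpr variable with no explicit device-side attribute,
// i.e. one that is emitted on both sides as separate entities.
static bool isConstexprWithoutExplicitDeviceAttr(const clang::VarDecl *VD) {
  if (!VD->isConstexpr())
    return false;
  // An implicit __constant__ added by the compiler does not make this a
  // device variable; only an attribute written by the user does.
  if (const auto *CA = VD->getAttr<clang::CUDAConstantAttr>())
    if (!CA->isImplicit())
      return false;
  if (VD->hasAttr<clang::CUDADeviceAttr>())
    return false;
  return true;
}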

yaxunl marked 3 inline comments as done.May 20 2021, 12:54 PM
yaxunl added inline comments.
clang/include/clang/Sema/Sema.h
12066

That was the first implementation, which was similar to a managed var but used pinned host memory as common memory shared by the device and the host.

However, that implementation was later replaced by one similar to nvcc's, in which textures and surfaces behave like usual device variables. So far I do not see a need to differentiate them from usual device variables.

12067

will do

clang/lib/Sema/SemaCUDA.cpp
148–149

Will do. I agree we should relax this for const vars in the future.

yaxunl updated this revision to Diff 346831.May 20 2021, 12:55 PM
yaxunl marked 3 inline comments as done.

Revised per Artem's comments

tra accepted this revision.May 20 2021, 1:31 PM

LGTM.

I've verified that Tensorflow still builds with this patch and that the patch does fix the regressions we've seen.
If you could land this patch soon, that would be appreciated.

This revision is now accepted and ready to land.May 20 2021, 1:31 PM
This revision was landed with ongoing or failed builds.May 20 2021, 2:04 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project.May 20 2021, 2:05 PM