variables emitted on both host and device side with different addresses
when ODR-used by host function should not cause device side counter-part
to be force emitted.
This fixes the regression caused by https://reviews.llvm.org/D102237
Differential D102801
[CUDA][HIP] Fix device variables used by host yaxunl on May 19 2021, 12:55 PM. Authored by
Details variables emitted on both host and device side with different addresses This fixes the regression caused by https://reviews.llvm.org/D102237
Diff Detail Event TimelineComment Actions Tentative LGTM as we need it to fix the regression soon. Summoning @rsmith for the 'big picture' opinion.
Comment Actions This patch does not appear to fix the second regression introduced by the D102237. Trying to compile the following code triggers an assertion in CGExpr.cpp: class a { public: a(char *); }; void b() { [](char *c) { static a d(c); d; }; } With assertions disabled it eventually leads to a different error: Comment Actions The root cause is similar to the last regression. Basically when a variable is emitted on both sides but as different entities, we should not treat it as a device variable on host side. I have updated the patch to fix both regressions.
Comment Actions In the updated patch I have a simpler solution which is easier to explain to the users. Basically we classify variables by how they are emitted: device side only, host side only, both sides as different entities (e.g. default constexpr var), and both sides as unified entity (e.g. managed var). For variables emitted on both sides as separate entities, we have limited knowledge and we limit what we can do for them. I think users should understand the compiler's limitation in such cases. And they can easily workaround that by making the variable explicitly device variable. Comment Actions This is really nice. Let me test it internally and see if anything breaks.
Comment Actions LGTM. I've verified that Tensorflow still builds with this patch and that the patch does fix the regressions we've seen. |
IIUIC, The idea here is that we do not want to emit constexpr int foo; on device, even if we happen to ODR-use it there.
And the way we detect this is by checking for implicit __constant__ we happen to add to constexpr variables.
I think this may be relying on the implementation details too much. It also makes compiler's behavior somewhat surprising -- we would potentially emit other variables that do not get any device attributes attribute, but would not emit the variables with implicit __constant__, which is a device attribute.
I'm not sure if we have any good options here. This may be an acceptable compromise, but I wonder if there's a better way to deal with this.
That said, this patch is OK to fix the regression we have now, but we may need to revisit this.