This is an archive of the discontinued LLVM Phabricator instance.

[cuda] Start diagnosing variables with bad target.
Needs Review · Public

Authored by hliao on May 4 2020, 11:39 AM.

Details

Summary
  • Non-local variables on the host side are generally not accessible from the device side. Without proper diagnostics, compilation may succeed until the final linking stage, and the resulting link error may not be intuitive for developers, especially with relocatable code compilation. For certain cases, such as assembly-only output, it is even worse: the compilation simply passes.
  • This patch addresses the issue by checking uses of non-local variables and issuing errors on bad-target references. For references through default arguments, a warning is issued on the function declaration because, at that point, the variables are merely bound; no real code will be generated if that function is never used.
  • The opposite direction, i.e. accessing device variables from the host side, is NOT addressed in this patch, as host code is allowed to access those device variables through the runtime interface on their shadow variables. More support is needed to identify how such a variable is used on the host side, even for simple cases. Comprehensive diagnosis would be so expensive that alternative analysis tools, such as clang-tidy, should be used instead.

Diff Detail

Event Timeline

hliao created this revision.May 4 2020, 11:39 AM
Herald added a project: Restricted Project.May 4 2020, 11:39 AM
Herald added a subscriber: cfe-commits.
hliao added a comment.May 4 2020, 11:41 AM

The test code passes compilation on clang trunk when only assembly output is generated: https://godbolt.org/z/XYjRcT. NVCC, however, generates errors in all cases.

hliao updated this revision to Diff 261904.May 4 2020, 12:29 PM

Reformatting test code following pre-merge checks.

yaxunl added inline comments.May 4 2020, 12:31 PM
clang/lib/Sema/SemaCUDA.cpp
156

We may need to mark constexpr variables as host device too. In practice, such usage has existed for a long time.

clang/test/SemaCUDA/variable-target.cu
43

We need a test checking that a captured local host variable is allowed in a device lambda.

We also need tests for constexpr variables used in device functions.

tra added a comment.May 4 2020, 12:41 PM

This has a good chance of breaking existing code. It would be great to add an escape hatch option to revert to the old behavior if we run into problems. The change is relatively simple, so reverting it in case something goes wrong should work, too. Up to you.

clang/test/SemaCUDA/variable-target.cu
7

The current set of tests only verifies access of host variables from the device side. We need to check that things work in the other direction (i.e. a device variable is not accessible from the host). A bit of this is covered in function-overload.cu, but it would make sense to handle all variable-related checks here.

It would be great to add more test cases:

  • access of a device variable from various kinds of host functions.
  • test cases verifying that &var and sizeof(var) work for device variables in host functions.
hliao added a comment.May 4 2020, 12:54 PM
In D79344#2018561, @tra wrote:

This has a good chance of breaking existing code. It would be great to add an escape hatch option to revert to the old behavior if we run into problems. The change is relatively simple, so reverting it in case something goes wrong should work, too. Up to you.

Why? For the cases addressed in this patch, any existing code would not compile into a module file anyway, due to the missing symbol. Am I missing anything?

hliao marked an inline comment as done.May 4 2020, 12:55 PM
hliao added inline comments.
clang/test/SemaCUDA/variable-target.cu
43

This patch only addresses direct references to variables. Capture would be better handled in a separate patch.

hliao marked an inline comment as done.May 4 2020, 12:58 PM
hliao added inline comments.
clang/test/SemaCUDA/variable-target.cu
7

Yeah, as noted in both the commit message and the sources, diagnosing that direction is more complicated because host code can still access the shadow variables. We need to issue warnings on improper usage, such as direct reads or writes of the variable. I want to address that in another patch, as more changes are required to check how a variable is being used.

yaxunl added inline comments.May 4 2020, 1:15 PM
clang/test/SemaCUDA/variable-target.cu
43

But there is a chance that this patch breaks valid usage of captured variables in device lambdas. At the least, we should add a test to guard against that.

hliao marked an inline comment as done.May 4 2020, 1:16 PM
hliao added inline comments.
clang/lib/Sema/SemaCUDA.cpp
156

constexpr variables are a little bit tricky, as it is still possible for such a variable to end up emitted as a real variable. For example, if its address is taken, it won't be optimized away and still needs to be emitted somewhere. But, like other non-local variables, CUDA forbids their initializers. Any suggestions?

tra added a comment.May 4 2020, 1:22 PM
In D79344#2018561, @tra wrote:

This has a good chance of breaking existing code. It would be great to add an escape hatch option to revert to the old behavior if we run into problems. The change is relatively simple, so reverting it in case something goes wrong should work, too. Up to you.

Why? For the cases addressed in this patch, any existing code would not compile into a module file anyway, due to the missing symbol. Am I missing anything?

Logistics, mostly.

Overloading is a rather fragile area of CUDA. This is the area where clang and NVCC behave differently. Combined with the existing code that needs to work with both compilers, even minor changes in compiler behavior can result in unexpected issues. Stricter checks tend to expose existing code which happens to work (or to compile) when it should not have, but it's not always trivial to fix those quickly. Having an escape hatch allows us to deal with those issues. It allows the owner of the code to reproduce the problem while the rest of the world continues to work. Reverting is suboptimal as the end user is often not in a good position to build a compiler with your patch plumbed in and then plumb the patched compiler into their build system. Adding another compiler option to enable/disable the new behavior is much more manageable.

hliao added a comment.May 4 2020, 1:30 PM
In D79344#2018683, @tra wrote:
In D79344#2018561, @tra wrote:

This has a good chance of breaking existing code. It would be great to add an escape hatch option to revert to the old behavior if we run into problems. The change is relatively simple, so reverting it in case something goes wrong should work, too. Up to you.

Why? For the cases addressed in this patch, any existing code would not compile into a module file anyway, due to the missing symbol. Am I missing anything?

Logistics, mostly.

Overloading is a rather fragile area of CUDA. This is the area where clang and NVCC behave differently. Combined with the existing code that needs to work with both compilers, even minor changes in compiler behavior can result in unexpected issues. Stricter checks tend to expose existing code which happens to work (or to compile) when it should not have, but it's not always trivial to fix those quickly. Having an escape hatch allows us to deal with those issues. It allows the owner of the code to reproduce the problem while the rest of the world continues to work. Reverting is suboptimal as the end user is often not in a good position to build a compiler with your patch plumbed in and then plumb the patched compiler into their build system. Adding another compiler option to enable/disable the new behavior is much more manageable.

OK, I will add an option. But should it default to on or off?

tra added a comment.May 4 2020, 3:04 PM

OK, I will add one option, But, do we turn it on by default or off?

As a rule of thumb, if it's an experimental feature, then the default would be off. For a change which should be the default, but is risky, the default is on. This patch looks like the latter.

If you can wait, I can try patching this change into our clang tree and then see if it breaks anything obvious. If nothing falls apart, I'll be fine with the patch as is.

tra added a comment.May 7 2020, 1:51 PM
In D79344#2018915, @tra wrote:

If you can wait, I can try patching this change into our clang tree and then see if it breaks anything obvious. If nothing falls apart, I'll be fine with the patch as is.

The patch appears to break compilation of CUDA headers:

In file included from <built-in>:1:
In file included from llvm_unstable/toolchain/lib/clang/google3-trunk/include/__clang_cuda_runtime_wrapper.h:406:
llvm_unstable/toolchain/lib/clang/google3-trunk/include/__clang_cuda_complex_builtins.h:30:13: error: call to 'copysign' is ambiguous
      __a = std::copysign(std::isinf(__a) ? 1 : 0, __a);
            ^~~~~~~~~~~~~
llvm_unstable/toolchain/lib/clang/google3-trunk/include/__clang_cuda_math.h:76:19: note: candidate function
__DEVICE__ double copysign(double __a, double __b) {
                  ^
third_party/gpus/cuda_10_1/include/crt/math_functions.hpp:861:32: note: candidate function
__MATH_FUNCTIONS_DECL__ double copysign(float a, double b)
                               ^
1 error generated when compiling for sm_60.
tra added a comment.EditedMay 7 2020, 2:06 PM

We're calling copysign(int, double). The standard library provides copysign(double, double); CUDA provides only copysign(float, double). As far as C++ is concerned, both require one type conversion. I guess we previously gave the __device__ one provided by CUDA a higher preference, considering that the callee is a device function. Now both seem to have equal weight. I'm not sure how or why.

hliao added a comment.May 7 2020, 2:42 PM
In D79344#2026025, @tra wrote:

We're calling copysign(int, double). The standard library provides copysign(double, double); CUDA provides only copysign(float, double). As far as C++ is concerned, both require one type conversion. I guess we previously gave the __device__ one provided by CUDA a higher preference, considering that the callee is a device function. Now both seem to have equal weight. I'm not sure how or why.

@yaxunl, that may be related to the change in overload resolution. Back to this change: that error should not be related to the non-local variable checks.

tra added a comment.May 7 2020, 2:51 PM
In D79344#2026025, @tra wrote:

We're calling copysign(int, double). The standard library provides copysign(double, double); CUDA provides only copysign(float, double). As far as C++ is concerned, both require one type conversion. I guess we previously gave the __device__ one provided by CUDA a higher preference, considering that the callee is a device function. Now both seem to have equal weight. I'm not sure how or why.

@yaxunl, that may be related to the change in overload resolution. Back to this change: that error should not be related to the non-local variable checks.

The tree I've tested had Sam's changes reverted (bf6a26b066382e0f41bf023c781d84061c542307), so it appears to be triggered by this patch. Let me try reproducing it in the upstream HEAD.

tra added a comment.May 7 2020, 3:07 PM

The problem is reproducible in upstream clang. Let's see if I can reduce it to something simpler.

hliao added a comment.May 7 2020, 4:08 PM
In D79344#2026180, @tra wrote:

The problem is reproducible in upstream clang. Let's see if I can reduce it to something simpler.

I remember seeing similar errors when the math part was refactored out into the current header, but that seemed to be fixed later. I'm not sure whether it's relevant.

tra added a comment.May 7 2020, 5:00 PM
In D79344#2026180, @tra wrote:

The problem is reproducible in upstream clang. Let's see if I can reduce it to something simpler.

Reduced it down to this -- compiles with clang w/o the patch, but fails with it.

__attribute__((device)) double copysign(double, double);
__attribute__((device)) double copysign(float, double);
template <typename> struct a { static const bool b = true; };
template <bool, class> struct c;
template <class f> struct c<true, f> { typedef f g; };
template <typename d, typename h>
__attribute__((device)) typename c<a<h>::b, double>::g copysign(d, h) {
  double e = copysign(0, e);
}
tra added a comment.May 7 2020, 5:10 PM

Here's a slightly smaller variant which may be a good clue for tracking down the root cause. This one fails with:

var.cc:6:14: error: no matching function for call to 'copysign'
  double g = copysign(0, g);
             ^~~~~~~~
var.cc:5:56: note: candidate template ignored: substitution failure [with e = int, f = double]: reference to __host__ variable 'b' in __device__ function
__attribute__((device)) typename c<a<f>::b, double>::d copysign(e, f) {
                                         ~             ^
1 error generated when compiling for sm_60.

I suspect it's the handling of the non-type template parameter that may be breaking things in both cases.

template <typename> struct a { static const bool b = true; };
template <bool, class> struct c;
template <class h> struct c<true, h> { typedef h d; };
template <typename e, typename f>
__attribute__((device)) typename c<a<f>::b, double>::d copysign(e, f) {
  double g = copysign(0, g);
}
hliao added a comment.May 7 2020, 9:52 PM
In D79344#2026349, @tra wrote:

Here's a slightly smaller variant which may be a good clue for tracking down the root cause. This one fails with:

var.cc:6:14: error: no matching function for call to 'copysign'
  double g = copysign(0, g);
             ^~~~~~~~
var.cc:5:56: note: candidate template ignored: substitution failure [with e = int, f = double]: reference to __host__ variable 'b' in __device__ function
__attribute__((device)) typename c<a<f>::b, double>::d copysign(e, f) {
                                         ~             ^
1 error generated when compiling for sm_60.

I suspect it's the handling of the non-type template parameter that may be breaking things in both cases.

template <typename> struct a { static const bool b = true; };
template <bool, class> struct c;
template <class h> struct c<true, h> { typedef h d; };
template <typename e, typename f>
__attribute__((device)) typename c<a<f>::b, double>::d copysign(e, f) {
  double g = copysign(0, g);
}

My bad. We need similar logic in the call check to skip templates that are not yet instantiated, i.e.

diff --git a/clang/lib/Sema/SemaCUDA.cpp b/clang/lib/Sema/SemaCUDA.cpp
index 583e588e4bd..467136f4579 100644
--- a/clang/lib/Sema/SemaCUDA.cpp
+++ b/clang/lib/Sema/SemaCUDA.cpp
@@ -910,6 +910,10 @@ bool Sema::CheckCUDAAccess(SourceLocation Loc, FunctionDecl *Caller,
   assert(getLangOpts().CUDA && "Should only be called during CUDA compilation");
   assert(VD && isNonLocalVariable(VD) && "Variable must be a non-local one.");
 
+  auto &ExprEvalCtx = ExprEvalContexts.back();
+  if (ExprEvalCtx.isUnevaluated() || ExprEvalCtx.isConstantEvaluated())
+    return true;
+
   // FIXME: Is bailing out early correct here?  Should we instead assume that
   // the caller is a global initializer?
   if (!Caller)
tra added a comment.May 8 2020, 3:52 PM

This triggers an assertion:

clang: /usr/local/google/home/tra/work/llvm/repo/clang/lib/AST/Decl.cpp:2697: clang::Expr *clang::ParmVarDecl::getDefaultArg(): Assertion `!hasUninstantiatedDefaultArg() && "Default argument is not yet instantiated!"' failed.
#2  0x00007fffeb8ae40f in __assert_fail_base (fmt=0x7fffeba106e0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
    assertion=0x7fffe7d2e909 "!hasUninstantiatedDefaultArg() && \"Default argument is not yet instantiated!\"",
    file=0x7fffe7d22e5c "/usr/local/google/home/tra/work/llvm/repo/clang/lib/AST/Decl.cpp", line=2697, function=<optimized out>) at assert.c:92
#3  0x00007fffeb8bbb92 in __GI___assert_fail (
    assertion=0x7fffe7d2e909 "!hasUninstantiatedDefaultArg() && \"Default argument is not yet instantiated!\"",
    file=0x7fffe7d22e5c "/usr/local/google/home/tra/work/llvm/repo/clang/lib/AST/Decl.cpp", line=2697,
    function=0x7fffe7dda0fb "clang::Expr *clang::ParmVarDecl::getDefaultArg()") at assert.c:101
#4  0x00007fffe8460aec in clang::ParmVarDecl::getDefaultArg (this=0x112f560) at /usr/local/google/home/tra/work/llvm/repo/clang/lib/AST/Decl.cpp:2696
#5  0x00007fffe618a5a6 in clang::Sema::checkCUDAParamWithInvalidDefaultArg (this=0x392450, Loc=..., FD=0x112f678, PVD=0x112f560)
    at /usr/local/google/home/tra/work/llvm/repo/clang/lib/Sema/SemaCUDA.cpp:729
#6  0x00007fffe62ed89a in clang::Sema::CheckCXXDefaultArguments (this=0x392450, FD=0x112f678)
    at /usr/local/google/home/tra/work/llvm/repo/clang/lib/Sema/SemaDeclCXX.cpp:1551
#7  0x00007fffe61c9443 in clang::Sema::CheckFunctionDeclaration (this=0x392450, S=0x0, NewFD=0x112f678, Previous=..., IsMemberSpecialization=false)
    at /usr/local/google/home/tra/work/llvm/repo/clang/lib/Sema/SemaDecl.cpp:10765
#8  0x00007fffe6d5f0b7 in clang::TemplateDeclInstantiator::VisitCXXMethodDecl (this=0x7ffffffe7f20, D=0x111b198, TemplateParams=0x0,
    ClassScopeSpecializationArgs=llvm::Optional is not initialized, FunctionRewriteKind=clang::TemplateDeclInstantiator::RewriteKind::None)
    at /usr/local/google/home/tra/work/llvm/repo/clang/lib/Sema/SemaTemplateInstantiateDecl.cpp:2424
#9  0x00007fffe6d62f10 in clang::TemplateDeclInstantiator::VisitCXXMethodDecl (this=0x7ffffffe7f20, D=0x111b198)
    at /usr/local/google/home/tra/work/llvm/repo/clang/lib/Sema/SemaTemplateInstantiateDecl.cpp:3410
#10 0x00007fffe6d62ead in clang::TemplateDeclInstantiator::VisitCXXConstructorDecl (this=0x7ffffffe7f20, D=0x111b198)
    at /usr/local/google/home/tra/work/llvm/repo/clang/lib/Sema/SemaTemplateInstantiateDecl.cpp:2498