- Non-local variables on the host side are generally not accessible from the device side. Without a proper diagnostic, compilation may succeed all the way to the final linking stage, and the resulting link error may not be intuitive to developers, especially with relocatable code compilation. In certain cases, such as assembly-only output, it is even worse: compilation simply passes.
- This patch addresses that issue by checking uses of non-local variables and issuing errors on bad cross-target references. For references through default arguments, only a warning is issued on the function declaration because, at that point, the variables are merely bound; no real code is generated if that function is never used.
- The opposite direction, i.e. accessing device variables from the host side, is NOT addressed in this patch, since host code is allowed to access those device variables through the runtime interface on their shadow variables. More support is needed to identify how a variable is used on the host side, even for simple cases. Comprehensive diagnosis would be so expensive that alternative analysis tools, such as clang-tidy, should be used instead.
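A minimal sketch of the kind of reference this patch diagnoses (a hypothetical example, not taken from the patch's test suite):

```cpp
int HostVar; // non-local variable, only emitted on the host side

__device__ int devFn() {
  // Previously this could compile (and, with -S, even appear to succeed) and
  // only fail at link time; with this patch, clang errors out on the
  // reference itself:
  //   reference to __host__ variable 'HostVar' in __device__ function
  return HostVar;
}
```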
Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline
That test code passes compilation on clang trunk when only assembly output is generated (https://godbolt.org/z/XYjRcT), but NVCC generates errors in all cases.
clang/lib/Sema/SemaCUDA.cpp:156
> We may need to mark constexpr variables as host device too. In practice, such usage has existed for a long time.

clang/test/SemaCUDA/variable-target.cu:43
> We need a test to check that a captured local host variable is allowed in a device lambda, and some tests for constexpr variables used in device functions (see the sketch below for the constexpr case).
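For context, a hedged sketch of the constexpr pattern referred to above (hypothetical example; host headers commonly define constexpr constants that device code reads as compile-time values):

```cpp
constexpr float Pi = 3.14159265f; // host-side constexpr constant

__device__ float area(float r) {
  // Only the compile-time value of Pi is needed here, not a device-side
  // object, so this long-standing usage should not be rejected.
  return Pi * r * r;
}
```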
This has a good chance of breaking existing code. It would be great to add an escape hatch option to revert to the old behavior if we run into problems. The change is relatively simple, so reverting it in case something goes wrong should work, too. Up to you.
clang/test/SemaCUDA/variable-target.cu:7
> The current set of tests only verifies access to host variables from the device side. We need to check that things work in the other direction (i.e. a device variable is not accessible from the host). A bit of it is covered in function-overload.cu, but it would make sense to deal with all variable-related things here. It would be great to add more test cases.
Why? For the cases addressed in this patch, if there is existing code, it won't compile into a module file anyway due to the missing symbol. Am I missing anything?
clang/test/SemaCUDA/variable-target.cu:43
> This patch just addresses direct references to variables. For captures, it would be better to start with another patch.
clang/test/SemaCUDA/variable-target.cu:7
> Yeah, as noted in both the commit message and some sources, diagnosing that direction is more complicated because host code is still able to access the shadow variables (see the sketch below). We need to issue warnings on improper usage, such as direct variable reads/writes. I want to address that in another patch, as more changes are required to check how a variable is being used.
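For reference, this is the kind of legal host-side access through a shadow variable that makes a blanket error in that direction wrong (a sketch using the standard CUDA runtime API):

```cpp
__device__ int Counter; // device variable; host code sees its shadow

void resetCounter() {
  int Zero = 0;
  // A direct write like `Counter = 0;` in host code would be improper usage,
  // but going through the runtime interface on the shadow symbol is fine:
  cudaMemcpyToSymbol(Counter, &Zero, sizeof(Zero));
}
```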
clang/test/SemaCUDA/variable-target.cu:43
> But there is a chance that this patch breaks valid usage of captured variables in device lambdas. At the least, we should add a test to guard against that (see the sketch below).
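A sketch of the kind of test being requested (hypothetical; it assumes clang's support for explicitly annotated device lambdas):

```cpp
template <typename F>
__global__ void run(F f) { f(); }

void hostCaller() {
  int Local = 42; // local host variable
  // Capturing `Local` by value into a device lambda is valid and must not be
  // flagged as a bad cross-target reference by this patch.
  run<<<1, 1>>>([=] __device__ () { (void)Local; });
}
```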
clang/lib/Sema/SemaCUDA.cpp:156
> A constexpr variable is a little bit tricky, as it is still possible for it to be emitted as a real variable in the end. For example, if its address is taken, it won't be optimized away and still needs to be emitted somewhere (see the sketch below). But, like other non-local variables, CUDA forbids their initializers. Any suggestions?
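To illustrate the tricky case (a hypothetical sketch): once a constexpr variable is odr-used, it must be emitted as a real object somewhere, and the question of which side owns that object comes back:

```cpp
constexpr int Limit = 64;

__device__ const int *limitPtr() {
  // Taking the address odr-uses Limit, so it cannot be folded away entirely;
  // an object must be emitted on some side for this pointer to refer to.
  return &Limit;
}
```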
Logistics, mostly.
Overloading is a rather fragile area of CUDA. This is the area where clang and NVCC behave differently. Combined with the existing code that needs to work with both compilers, even minor changes in compiler behavior can result in unexpected issues. Stricter checks tend to expose existing code which happens to work (or to compile) when it should not have, but it's not always trivial to fix those quickly. Having an escape hatch allows us to deal with those issues. It allows the owner of the code to reproduce the problem while the rest of the world continues to work. Reverting is suboptimal as the end user is often not in a good position to build a compiler with your patch plumbed in and then plumb the patched compiler into their build system. Adding another compiler option to enable/disable the new behavior is much more manageable.
As a rule of thumb, if it's an experimental feature, then the default would be off. For a change which should be the default, but is risky, the default is on. This patch looks like the latter.
If you can wait, I can try patching this change into our clang tree and then see if it breaks anything obvious. If nothing falls apart, I'll be fine with the patch as is.
The patch appears to break compilation of CUDA headers:
```
In file included from <built-in>:1:
In file included from llvm_unstable/toolchain/lib/clang/google3-trunk/include/__clang_cuda_runtime_wrapper.h:406:
llvm_unstable/toolchain/lib/clang/google3-trunk/include/__clang_cuda_complex_builtins.h:30:13: error: call to 'copysign' is ambiguous
  __a = std::copysign(std::isinf(__a) ? 1 : 0, __a);
        ^~~~~~~~~~~~~
llvm_unstable/toolchain/lib/clang/google3-trunk/include/__clang_cuda_math.h:76:19: note: candidate function
__DEVICE__ double copysign(double __a, double __b) {
                  ^
third_party/gpus/cuda_10_1/include/crt/math_functions.hpp:861:32: note: candidate function
__MATH_FUNCTIONS_DECL__ double copysign(float a, double b)
                               ^
1 error generated when compiling for sm_60.
```
We're calling copysign(int, double). The standard library provides copysign(double, double); CUDA provides only copysign(float, double). As far as C++ is concerned, both require one type conversion. I guess previously we would give the __device__ one provided by CUDA a higher preference, considering that the callee is a device function. Now both seem to have equal weight. I'm not sure how or why.
@yaxunl, that may be related to the change in overload resolution. Back to this change: that error should not be related to the non-local variable checks.
The tree I've tested had Sam's changes reverted (bf6a26b066382e0f41bf023c781d84061c542307), so it appears to be triggered by this patch. Let me try reproducing it in the upstream HEAD.
The problem is reproducible in upstream clang. Let's see if I can reduce it to something simpler.
I remember seeing similar errors when the math part was refactored out into the current header, but it seemed to get fixed later. I'm not sure whether that's relevant or not.
Reduced it down to this -- compiles with clang w/o the patch, but fails with it.
```cpp
__attribute__((device)) double copysign(double, double);
__attribute__((device)) double copysign(float, double);
template <typename> struct a { static const bool b = true; };
template <bool, class> struct c;
template <class f> struct c<true, f> { typedef f g; };
template <typename d, typename h>
__attribute__((device)) typename c<a<h>::b, double>::g copysign(d, h) {
  double e = copysign(0, e);
}
```
Here's a slightly smaller variant which may be a good clue for tracking down the root cause. This one fails with:
```
var.cc:6:14: error: no matching function for call to 'copysign'
  double g = copysign(0, g);
             ^~~~~~~~
var.cc:5:56: note: candidate template ignored: substitution failure [with e = int, f = double]: reference to __host__ variable 'b' in __device__ function
__attribute__((device)) typename c<a<f>::b, double>::d copysign(e, f) {
                                                      ~ ^
1 error generated when compiling for sm_60.
```
I suspect it's the handling of the non-type template parameter that may be breaking things in both cases.
```cpp
template <typename> struct a { static const bool b = true; };
template <bool, class> struct c;
template <class h> struct c<true, h> { typedef h d; };
template <typename e, typename f>
__attribute__((device)) typename c<a<f>::b, double>::d copysign(e, f) {
  double g = copysign(0, g);
}
```
My bad. We need similar logic in the access check to skip templates that are not instantiated yet, i.e.:
```diff
diff --git a/clang/lib/Sema/SemaCUDA.cpp b/clang/lib/Sema/SemaCUDA.cpp
index 583e588e4bd..467136f4579 100644
--- a/clang/lib/Sema/SemaCUDA.cpp
+++ b/clang/lib/Sema/SemaCUDA.cpp
@@ -910,6 +910,10 @@ bool Sema::CheckCUDAAccess(SourceLocation Loc, FunctionDecl *Caller,
   assert(getLangOpts().CUDA && "Should only be called during CUDA compilation");
   assert(VD && isNonLocalVariable(VD) && "Variable must be a non-local one.");
 
+  auto &ExprEvalCtx = ExprEvalContexts.back();
+  if (ExprEvalCtx.isUnevaluated() || ExprEvalCtx.isConstantEvaluated())
+    return true;
+
   // FIXME: Is bailing out early correct here? Should we instead assume that
   // the caller is a global initializer?
   if (!Caller)
```
This triggers an assertion:
```
clang: /usr/local/google/home/tra/work/llvm/repo/clang/lib/AST/Decl.cpp:2697: clang::Expr *clang::ParmVarDecl::getDefaultArg(): Assertion `!hasUninstantiatedDefaultArg() && "Default argument is not yet instantiated!"' failed.

#2  0x00007fffeb8ae40f in __assert_fail_base (fmt=0x7fffeba106e0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x7fffe7d2e909 "!hasUninstantiatedDefaultArg() && \"Default argument is not yet instantiated!\"", file=0x7fffe7d22e5c "/usr/local/google/home/tra/work/llvm/repo/clang/lib/AST/Decl.cpp", line=2697, function=<optimized out>) at assert.c:92
#3  0x00007fffeb8bbb92 in __GI___assert_fail (assertion=0x7fffe7d2e909 "!hasUninstantiatedDefaultArg() && \"Default argument is not yet instantiated!\"", file=0x7fffe7d22e5c "/usr/local/google/home/tra/work/llvm/repo/clang/lib/AST/Decl.cpp", line=2697, function=0x7fffe7dda0fb "clang::Expr *clang::ParmVarDecl::getDefaultArg()") at assert.c:101
#4  0x00007fffe8460aec in clang::ParmVarDecl::getDefaultArg (this=0x112f560) at /usr/local/google/home/tra/work/llvm/repo/clang/lib/AST/Decl.cpp:2696
#5  0x00007fffe618a5a6 in clang::Sema::checkCUDAParamWithInvalidDefaultArg (this=0x392450, Loc=..., FD=0x112f678, PVD=0x112f560) at /usr/local/google/home/tra/work/llvm/repo/clang/lib/Sema/SemaCUDA.cpp:729
#6  0x00007fffe62ed89a in clang::Sema::CheckCXXDefaultArguments (this=0x392450, FD=0x112f678) at /usr/local/google/home/tra/work/llvm/repo/clang/lib/Sema/SemaDeclCXX.cpp:1551
#7  0x00007fffe61c9443 in clang::Sema::CheckFunctionDeclaration (this=0x392450, S=0x0, NewFD=0x112f678, Previous=..., IsMemberSpecialization=false) at /usr/local/google/home/tra/work/llvm/repo/clang/lib/Sema/SemaDecl.cpp:10765
#8  0x00007fffe6d5f0b7 in clang::TemplateDeclInstantiator::VisitCXXMethodDecl (this=0x7ffffffe7f20, D=0x111b198, TemplateParams=0x0, ClassScopeSpecializationArgs=llvm::Optional is not initialized, FunctionRewriteKind=clang::TemplateDeclInstantiator::RewriteKind::None) at /usr/local/google/home/tra/work/llvm/repo/clang/lib/Sema/SemaTemplateInstantiateDecl.cpp:2424
#9  0x00007fffe6d62f10 in clang::TemplateDeclInstantiator::VisitCXXMethodDecl (this=0x7ffffffe7f20, D=0x111b198) at /usr/local/google/home/tra/work/llvm/repo/clang/lib/Sema/SemaTemplateInstantiateDecl.cpp:3410
#10 0x00007fffe6d62ead in clang::TemplateDeclInstantiator::VisitCXXConstructorDecl (this=0x7ffffffe7f20, D=0x111b198) at /usr/local/google/home/tra/work/llvm/repo/clang/lib/Sema/SemaTemplateInstantiateDecl.cpp:2498
```
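A hedged guess at the shape of code that reaches this assertion, reconstructed from the backtrace (VisitCXXConstructorDecl instantiating a templated constructor whose default argument has not been instantiated yet); all names here are hypothetical:

```cpp
template <typename T>
struct Wrapper {
  // While Wrapper<T> is being instantiated, CheckCXXDefaultArguments runs
  // before the default argument is instantiated; the new CUDA
  // default-argument check then calls getDefaultArg() on it and trips the
  // assertion.
  __device__ Wrapper(T Value = T()) {}
};

__device__ void use() { Wrapper<int> W; }
```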