- As global initializers are not under any function body, they need to look into the current variable being initialized. That is not addressed in the current CUDA/HIP overloadable function resolution and ignore target checking. That may result in wrong candidate to be considered as illustrated in the newly added test case.
- In this patch, a non-local variable stack is introduced to keep track the current non-local variable being initialized so that initialization function could be inspected for the target preference.
- Besides newly added tests, existing tests are refined as the current implementation adds extra checks on global initializers to ensure no device functions are used. As the target match checking is enabled in this patch, such check is only necessary for CUDA device global variables. They are not allowed to be non-trivially initialized. As HIP starts to support non-trivial initialization of device initialization, such target matching check is mandatory to be enforced.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Looks good to me overall. I've pinged rsmith@ to double-check that we're covering all possibilities for non-local variable init.
clang/include/clang/Sema/Sema.h | ||
---|---|---|
11632 | I'd add a comment describing that it's a wrapper which dispatches the call to one of more specific variants above. | |
11655–11656 | Comment needs updating. | |
11668 | Nit: I'd add an empty line between delarations and the function. Jammed together they are hard to read. | |
11697–11699 | if (const FunctionDecl *Caller = dyn_cast<FunctionDecl>(CurContext)) if (Kind == SkipImplicitCaller && Caller->isImplicit()) return true; | |
11736–11737 | Now that we always use getCUDAContextDecl() as the first argument, perhaps we can just always retrieve the context inside the function. | |
clang/lib/Parse/ParseDecl.cpp | ||
2345 | @rsmith -- is this sufficient to catch all attempts to call an initializer for a global? | |
clang/test/SemaCUDA/function-overload.cu | ||
458 | I'd add more details here. |
I wonder if this patch will help with this case:
__device__ float fn(int) { return threadIdx.x; }; __host__ float fn(float); float gvar1 = []()__device__ { return fn(1);} (); // This ends up calling fn(int) on *host*
We seem to happily let host code call device function from a lambda function used as an initializer.
It's turned out that Sema::CheckCUDACall needs to consider global initializer as well. I will revise that part. But, technically, that's irrelevant to overloadable resolution. Should be prepared in another patch enhancing CheckCUDACall to check global initializers.
clang/include/clang/Sema/Sema.h | ||
---|---|---|
11656–11658 | Please capitalize the first word of each of these parameter descriptions, to match the style used elsewhere in Clang. | |
11657 | "Null" not "nullptr". | |
11664–11672 | Does this really need to be CUDA-specific? This is (at least) the third time we've needed this. We currently have a ManglingContextDecl on ExpressionEvaluationContextRecord that tracks the non-local variable whose initializer we're parsing. In addition to using this as a lambda context declaration, we also (hackily) use it as the context declaration for DiagRuntimeBehavior. It would seem sensible to use that mechanism here too (and rename it to remove any suggestion that this is specific to lambdas or mangling). I think we only currently push ExpressionEvaluationContexts for variable initializers in C++. That's presumably fine for CUDA's purposes. | |
clang/lib/Parse/ParseDecl.cpp | ||
2345 | No, this is not sufficient; it's missing (at least) the template instantiation case. (The ExpressionEvaluationContextRecord mechanism does handle that case properly.) You should also consider what should happen in default arguments (which are sometimes parsed before we form a FunctionDecl for the function for which they are parameters) and default member initializers (which are parsed after we know whether the enclosing class has a user-declared default constructor, so you could in principle consider the CUDA function kind of the declared constructors, I suppose -- but the constructor bodies are not yet available, so you can't tell which constructors would actually use the initializers). Both of those cases are also tracked by the ExpressionEvaluationContextRecord mechanism, though you may need to track additional information to process default arguments in the same mode as the function for which they are supplied. |
Sorry for the late reply. Really appreciate your feedback. Thanks!
clang/include/clang/Sema/Sema.h | ||
---|---|---|
11664–11672 | I tried that before adding the new non-local variable stack. Using ManglingContextDecl on ExpressionEvaluationContextRecord could serve some cases, but it cannot fit the case where the constructor needs resolving as well. When resolving the constructor, ManglingContextDecl scope is already closed and cannot be used to check the target of the global variables. Says the following code struct EC { int ec; __device__ EC() {} }; __device__ EC d_ec; I also tried enlarging the scope of ManglingContextDecl but that triggers even more issues for the generic C++ compilation. I'd appreciate any better solution as I agree that adding CUDA specific facilities should be minimized. | |
clang/lib/Parse/ParseDecl.cpp | ||
2345 | Could you elaborate more? I added new test cases requiring template instantiation. The current code handle them correctly. Do you refer to template variables? |
@rsmith do have u the chance to review the revised change again as well as my answers to your comments?
clang/include/clang/Sema/Sema.h | ||
---|---|---|
11678 | This is tricky because we could be in a nested context, not just the initializer, and that context just might not be a function. For example, there could be a local class in a lambda or something like that. |
clang/include/clang/Sema/Sema.h | ||
---|---|---|
11678 | You are right. Limit that to function and TU context so far. I need more efforts to consider other cases. One case in mind is that default member initialier in a class. But, for local classes in a lambda, they should be in a function (lambda function body) context. |
clang/include/clang/Sema/Sema.h | ||
---|---|---|
11680 | You really want this to match whenever we're in a local context, right? How about structuring the function like: if (CurContext->isFunctionOrMethod()) return cast<Decl>(CurContext); if (!CurContext->isFileContext()) return nullptr; return getCUDACurrentNonLocalVariable(); As a more general solution, I think Sema funnels all changes to CurContext through a small number of places, and you could make those places save and restore the currently initialized variable as well. |
clang/include/clang/Sema/Sema.h | ||
---|---|---|
11680 | Richard, I'd like your opinion about this. We have three separate patches right now that would all benefit from being able to track that they're currently within a variable/field initializer in Sema. And it's a general deficiency that it's hard to track declarations in initializers back to their initialized variable. Swift actually bit the bullet and introduced a kind of DeclContext that's a non-local initializer, and that links back to the variable. That would be hard to bring back to Clang with the current AST because Clang assumes that all DeclContexts are Decls, and I don't think we can reasonably remove that assumption; and of course VarDecl and FieldDecl aren't DeclContexts. Now, we could try to change that latter point. Making *all* VarDecls and FieldDecls DCs would have prohibitive memory overhead, since the vast majority are local / uninitialized; however, we could introduce a VarDecl subclass for global variables (including static member variables, of course), and similarly we could have a FieldDecl subclass for fields with initializers, which would nicely move some of the other overhead out-of-line and optimize for the C-style/old-style case. (We always know whether a field has an in-class initializer at parse time, right?) Less invasively, we could forget about trying to track this in the AST and just also track a current initialized variable in Sema. Anything which tried to change the context would have to save and restore that as well. That might be annoying because of PushDeclContext/PopDeclContext, though, which assume that you can restore the old context by just looking at the current context. | |
clang/lib/Sema/SemaDeclCXX.cpp | ||
16932 | The declaration could become invalid while processing its initializer; I think you should drop that condition. |
I'd add a comment describing that it's a wrapper which dispatches the call to one of more specific variants above.