This is an archive of the discontinued LLVM Phabricator instance.

[cuda][hip] Fix function overload resolution in the global initiailizer.
Needs ReviewPublic

Authored by hliao on Dec 9 2019, 1:37 PM.

Details

Summary
  • As global initializers are not under any function body, they need to look into the current variable being initialized. That is not addressed in the current CUDA/HIP overloadable function resolution and ignore target checking. That may result in wrong candidate to be considered as illustrated in the newly added test case.
  • In this patch, a non-local variable stack is introduced to keep track the current non-local variable being initialized so that initialization function could be inspected for the target preference.
  • Besides newly added tests, existing tests are refined as the current implementation adds extra checks on global initializers to ensure no device functions are used. As the target match checking is enabled in this patch, such check is only necessary for CUDA device global variables. They are not allowed to be non-trivially initialized. As HIP starts to support non-trivial initialization of device initialization, such target matching check is mandatory to be enforced.

Diff Detail

Event Timeline

hliao created this revision.Dec 9 2019, 1:37 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 9 2019, 1:37 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
hliao updated this revision to Diff 232933.Dec 9 2019, 1:38 PM

refine commit message

hliao edited the summary of this revision. (Show Details)Dec 9 2019, 1:39 PM
Harbormaster completed remote builds in B42158: Diff 232933.
hliao updated this revision to Diff 232938.Dec 9 2019, 1:42 PM

refine again

hliao edited the summary of this revision. (Show Details)Dec 9 2019, 1:42 PM

File PR44266 to track that bug.

tra added a subscriber: rsmith.Dec 10 2019, 12:16 PM

Looks good to me overall. I've pinged rsmith@ to double-check that we're covering all possibilities for non-local variable init.

clang/include/clang/Sema/Sema.h
11396

I'd add a comment describing that it's a wrapper which dispatches the call to one of more specific variants above.

11419–11420

Comment needs updating.

11432

Nit: I'd add an empty line between delarations and the function. Jammed together they are hard to read.

11461–11463
if (const FunctionDecl *Caller = dyn_cast<FunctionDecl>(CurContext))
  if (Kind == SkipImplicitCaller && Caller->isImplicit())
      return true;
11500–11501

Now that we always use getCUDAContextDecl() as the first argument, perhaps we can just always retrieve the context inside the function.

clang/lib/Parse/ParseDecl.cpp
2345

@rsmith -- is this sufficient to catch all attempts to call an initializer for a global?
I wonder if there are other sneaky ways to call an initializer.

clang/test/SemaCUDA/function-overload.cu
428

I'd add more details here.
The problem is that here the overload set has both functions and the one with the integer argument wins, even though it's a device function which we can't execute. We do handle similar cases during overload resolution in other places where we would prefer a callable function over a non-callable function with a better signature match.

tra added a comment.Dec 10 2019, 12:30 PM

I wonder if this patch will help with this case:

https://godbolt.org/z/X4KdsV

__device__ float fn(int) { return threadIdx.x; };
__host__ float fn(float);

float gvar1 = []()__device__ { return fn(1);} (); // This ends up calling fn(int) on *host*

We seem to happily let host code call device function from a lambda function used as an initializer.

hliao added a comment.EditedDec 10 2019, 7:27 PM
In D71227#1778136, @tra wrote:

I wonder if this patch will help with this case:

https://godbolt.org/z/X4KdsV

__device__ float fn(int) { return threadIdx.x; };
__host__ float fn(float);

float gvar1 = []()__device__ { return fn(1);} (); // This ends up calling fn(int) on *host*

We seem to happily let host code call device function from a lambda function used as an initializer.

It's turned out that Sema::CheckCUDACall needs to consider global initializer as well. I will revise that part. But, technically, that's irrelevant to overloadable resolution. Should be prepared in another patch enhancing CheckCUDACall to check global initializers.

hliao marked 6 inline comments as done.Jan 2 2020, 12:09 PM

refinements are made after comments from reviewers.

hliao updated this revision to Diff 235921.Jan 2 2020, 12:09 PM

code refinement after reviewers' comments.

rsmith added inline comments.Jan 6 2020, 11:02 AM
clang/include/clang/Sema/Sema.h
11420–11422

Please capitalize the first word of each of these parameter descriptions, to match the style used elsewhere in Clang.

11421

"Null" not "nullptr".

11428–11436

Does this really need to be CUDA-specific?

This is (at least) the third time we've needed this. We currently have a ManglingContextDecl on ExpressionEvaluationContextRecord that tracks the non-local variable whose initializer we're parsing. In addition to using this as a lambda context declaration, we also (hackily) use it as the context declaration for DiagRuntimeBehavior. It would seem sensible to use that mechanism here too (and rename it to remove any suggestion that this is specific to lambdas or mangling).

I think we only currently push ExpressionEvaluationContexts for variable initializers in C++. That's presumably fine for CUDA's purposes.

clang/lib/Parse/ParseDecl.cpp
2345

No, this is not sufficient; it's missing (at least) the template instantiation case. (The ExpressionEvaluationContextRecord mechanism does handle that case properly.)

You should also consider what should happen in default arguments (which are sometimes parsed before we form a FunctionDecl for the function for which they are parameters) and default member initializers (which are parsed after we know whether the enclosing class has a user-declared default constructor, so you could in principle consider the CUDA function kind of the declared constructors, I suppose -- but the constructor bodies are not yet available, so you can't tell which constructors would actually use the initializers). Both of those cases are also tracked by the ExpressionEvaluationContextRecord mechanism, though you may need to track additional information to process default arguments in the same mode as the function for which they are supplied.

hliao updated this revision to Diff 239328.Jan 21 2020, 8:11 AM
hliao marked 2 inline comments as done.
  • revise comment.
  • add tests requiring tempate instantiation.
hliao marked 2 inline comments as done.Jan 21 2020, 8:23 AM

Sorry for the late reply. Really appreciate your feedback. Thanks!

clang/include/clang/Sema/Sema.h
11428–11436

I tried that before adding the new non-local variable stack. Using ManglingContextDecl on ExpressionEvaluationContextRecord could serve some cases, but it cannot fit the case where the constructor needs resolving as well. When resolving the constructor, ManglingContextDecl scope is already closed and cannot be used to check the target of the global variables. Says the following code

struct EC {
  int ec;
__device__ EC() {}
};
__device__ EC d_ec;

I also tried enlarging the scope of ManglingContextDecl but that triggers even more issues for the generic C++ compilation. I'd appreciate any better solution as I agree that adding CUDA specific facilities should be minimized.

clang/lib/Parse/ParseDecl.cpp
2345

Could you elaborate more? I added new test cases requiring template instantiation. The current code handle them correctly. Do you refer to template variables?

hliao added a comment.Jan 30 2020, 7:05 PM

Sorry for the late reply. Really appreciate your feedback. Thanks!

@rsmith Have you chance to review the revised change?

hliao added a comment.Feb 4 2020, 1:39 PM

@rsmith do have u the chance to review the revised change again as well as my answers to your comments?

hliao updated this revision to Diff 245461.Feb 19 2020, 10:40 AM

Rebase to the latest trunk code.

rjmccall added inline comments.Feb 19 2020, 12:13 PM
clang/include/clang/Sema/Sema.h
11442

This is tricky because we could be in a nested context, not just the initializer, and that context just might not be a function. For example, there could be a local class in a lambda or something like that.

hliao updated this revision to Diff 245682.Feb 20 2020, 9:55 AM

Skip non-function or non-TU context so far as more cases need considering.

hliao marked 2 inline comments as done.Feb 20 2020, 10:00 AM
hliao added inline comments.
clang/include/clang/Sema/Sema.h
11442

You are right. Limit that to function and TU context so far. I need more efforts to consider other cases. One case in mind is that default member initialier in a class. But, for local classes in a lambda, they should be in a function (lambda function body) context.

rjmccall added inline comments.Feb 20 2020, 1:49 PM
clang/include/clang/Sema/Sema.h
11444

You really want this to match whenever we're in a local context, right? How about structuring the function like:

if (CurContext->isFunctionOrMethod())
  return cast<Decl>(CurContext);
if (!CurContext->isFileContext())
  return nullptr;
return getCUDACurrentNonLocalVariable();

As a more general solution, I think Sema funnels all changes to CurContext through a small number of places, and you could make those places save and restore the currently initialized variable as well.

hliao updated this revision to Diff 245735.Feb 20 2020, 2:01 PM
hliao marked an inline comment as done.

Rebase the code to the latest trunk.

hliao updated this revision to Diff 245747.Feb 20 2020, 2:24 PM

Revise following reviewer comments.

hliao marked an inline comment as done.Feb 20 2020, 2:26 PM
hliao updated this revision to Diff 245863.Feb 21 2020, 8:07 AM

Rebase to the trunk.

hliao updated this revision to Diff 246204.Feb 24 2020, 7:31 AM

Rebase to the trunk.

hliao updated this revision to Diff 246826.Feb 26 2020, 1:58 PM

Rebase to the latest trunk.

hliao added a comment.Feb 26 2020, 3:06 PM

@rjmccall @rsmith @tra, could you review on this revision?

hliao updated this revision to Diff 247248.Feb 28 2020, 6:07 AM

Fix pre-merge checks.

rjmccall added inline comments.Mar 1 2020, 11:46 PM
clang/include/clang/Sema/Sema.h
11444

Richard, I'd like your opinion about this. We have three separate patches right now that would all benefit from being able to track that they're currently within a variable/field initializer in Sema. And it's a general deficiency that it's hard to track declarations in initializers back to their initialized variable.

Swift actually bit the bullet and introduced a kind of DeclContext that's a non-local initializer, and that links back to the variable. That would be hard to bring back to Clang with the current AST because Clang assumes that all DeclContexts are Decls, and I don't think we can reasonably remove that assumption; and of course VarDecl and FieldDecl aren't DeclContexts.

Now, we could try to change that latter point. Making *all* VarDecls and FieldDecls DCs would have prohibitive memory overhead, since the vast majority are local / uninitialized; however, we could introduce a VarDecl subclass for global variables (including static member variables, of course), and similarly we could have a FieldDecl subclass for fields with initializers, which would nicely move some of the other overhead out-of-line and optimize for the C-style/old-style case. (We always know whether a field has an in-class initializer at parse time, right?)

Less invasively, we could forget about trying to track this in the AST and just also track a current initialized variable in Sema. Anything which tried to change the context would have to save and restore that as well. That might be annoying because of PushDeclContext/PopDeclContext, though, which assume that you can restore the old context by just looking at the current context.

clang/lib/Sema/SemaDeclCXX.cpp
16702

The declaration could become invalid while processing its initializer; I think you should drop that condition.

hliao updated this revision to Diff 247675.Mar 2 2020, 9:24 AM

Remove unncessary condition checking.

hliao marked an inline comment as done.Mar 2 2020, 9:42 AM
hliao updated this revision to Diff 250965.Mar 17 2020, 7:25 PM

Rebase to the latest trunk.

hliao updated this revision to Diff 251171.Mar 18 2020, 2:22 PM

Fix warnings from clang-tidy.

hliao updated this revision to Diff 251212.Mar 18 2020, 4:33 PM

Fix more clang-tidy warnings.

hliao updated this revision to Diff 255219.Apr 5 2020, 10:21 PM

Rebase to the latest trunk.

hliao updated this revision to Diff 257347.Apr 14 2020, 8:16 AM

Rebase to trunk.

LGTM. Can we get this in? There are other fixes depending on this. Thanks.

hliao updated this revision to Diff 261003.Apr 29 2020, 1:15 PM

Rebase to trunk and resolve the conflict.

tra added a comment.Jul 22 2020, 11:58 AM

Is this patch still actual?

hliao added a comment.Jul 22 2020, 1:45 PM

Hi @rsmith, @rjmccall and @tra what's your suggestion to make progress on this review?

In D71227#2167596, @tra wrote:

Is this patch still actual?

I need to rebase this to the latest trunk. Interrupt with other heavy loads.