Page MenuHomePhabricator

[cuda][hip] Fix function overload resolution in the global initiailizer.
Needs ReviewPublic

Authored by hliao on Dec 9 2019, 1:37 PM.

Details

Summary
  • As global initializers are not under any function body, they need to look into the current variable being initialized. That is not addressed in the current CUDA/HIP overloadable function resolution and ignore target checking. That may result in wrong candidate to be considered as illustrated in the newly added test case.
  • In this patch, a non-local variable stack is introduced to keep track the current non-local variable being initialized so that initialization function could be inspected for the target preference.
  • Besides newly added tests, existing tests are refined as the current implementation adds extra checks on global initializers to ensure no device functions are used. As the target match checking is enabled in this patch, such check is only necessary for CUDA device global variables. They are not allowed to be non-trivially initialized. As HIP starts to support non-trivial initialization of device initialization, such target matching check is mandatory to be enforced.

Event Timeline

hliao created this revision.Dec 9 2019, 1:37 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 9 2019, 1:37 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
hliao updated this revision to Diff 232933.Dec 9 2019, 1:38 PM

refine commit message

hliao edited the summary of this revision. (Show Details)Dec 9 2019, 1:39 PM
Harbormaster completed remote builds in B42158: Diff 232933.
hliao updated this revision to Diff 232938.Dec 9 2019, 1:42 PM

refine again

hliao edited the summary of this revision. (Show Details)Dec 9 2019, 1:42 PM

File PR44266 to track that bug.

tra added a subscriber: rsmith.Dec 10 2019, 12:16 PM

Looks good to me overall. I've pinged rsmith@ to double-check that we're covering all possibilities for non-local variable init.

clang/include/clang/Sema/Sema.h
11292

I'd add a comment describing that it's a wrapper which dispatches the call to one of more specific variants above.

11315–11316

Comment needs updating.

11328

Nit: I'd add an empty line between delarations and the function. Jammed together they are hard to read.

11352–11354
if (const FunctionDecl *Caller = dyn_cast<FunctionDecl>(CurContext))
  if (Kind == SkipImplicitCaller && Caller->isImplicit())
      return true;
11391–11392

Now that we always use getCUDAContextDecl() as the first argument, perhaps we can just always retrieve the context inside the function.

clang/lib/Parse/ParseDecl.cpp
2345

@rsmith -- is this sufficient to catch all attempts to call an initializer for a global?
I wonder if there are other sneaky ways to call an initializer.

clang/test/SemaCUDA/function-overload.cu
428

I'd add more details here.
The problem is that here the overload set has both functions and the one with the integer argument wins, even though it's a device function which we can't execute. We do handle similar cases during overload resolution in other places where we would prefer a callable function over a non-callable function with a better signature match.

tra added a comment.Dec 10 2019, 12:30 PM

I wonder if this patch will help with this case:

https://godbolt.org/z/X4KdsV

__device__ float fn(int) { return threadIdx.x; };
__host__ float fn(float);

float gvar1 = []()__device__ { return fn(1);} (); // This ends up calling fn(int) on *host*

We seem to happily let host code call device function from a lambda function used as an initializer.

hliao added a comment.EditedDec 10 2019, 7:27 PM
In D71227#1778136, @tra wrote:

I wonder if this patch will help with this case:

https://godbolt.org/z/X4KdsV

__device__ float fn(int) { return threadIdx.x; };
__host__ float fn(float);

float gvar1 = []()__device__ { return fn(1);} (); // This ends up calling fn(int) on *host*

We seem to happily let host code call device function from a lambda function used as an initializer.

It's turned out that Sema::CheckCUDACall needs to consider global initializer as well. I will revise that part. But, technically, that's irrelevant to overloadable resolution. Should be prepared in another patch enhancing CheckCUDACall to check global initializers.

hliao marked 6 inline comments as done.Jan 2 2020, 12:09 PM

refinements are made after comments from reviewers.

hliao updated this revision to Diff 235921.Jan 2 2020, 12:09 PM

code refinement after reviewers' comments.

rsmith added inline comments.Jan 6 2020, 11:02 AM
clang/include/clang/Sema/Sema.h
11316–11318

Please capitalize the first word of each of these parameter descriptions, to match the style used elsewhere in Clang.

11317

"Null" not "nullptr".

11324–11332

Does this really need to be CUDA-specific?

This is (at least) the third time we've needed this. We currently have a ManglingContextDecl on ExpressionEvaluationContextRecord that tracks the non-local variable whose initializer we're parsing. In addition to using this as a lambda context declaration, we also (hackily) use it as the context declaration for DiagRuntimeBehavior. It would seem sensible to use that mechanism here too (and rename it to remove any suggestion that this is specific to lambdas or mangling).

I think we only currently push ExpressionEvaluationContexts for variable initializers in C++. That's presumably fine for CUDA's purposes.

clang/lib/Parse/ParseDecl.cpp
2345

No, this is not sufficient; it's missing (at least) the template instantiation case. (The ExpressionEvaluationContextRecord mechanism does handle that case properly.)

You should also consider what should happen in default arguments (which are sometimes parsed before we form a FunctionDecl for the function for which they are parameters) and default member initializers (which are parsed after we know whether the enclosing class has a user-declared default constructor, so you could in principle consider the CUDA function kind of the declared constructors, I suppose -- but the constructor bodies are not yet available, so you can't tell which constructors would actually use the initializers). Both of those cases are also tracked by the ExpressionEvaluationContextRecord mechanism, though you may need to track additional information to process default arguments in the same mode as the function for which they are supplied.

hliao updated this revision to Diff 239328.Tue, Jan 21, 8:11 AM
hliao marked 2 inline comments as done.
  • revise comment.
  • add tests requiring tempate instantiation.
hliao marked 2 inline comments as done.Tue, Jan 21, 8:23 AM

Sorry for the late reply. Really appreciate your feedback. Thanks!

clang/include/clang/Sema/Sema.h
11324–11332

I tried that before adding the new non-local variable stack. Using ManglingContextDecl on ExpressionEvaluationContextRecord could serve some cases, but it cannot fit the case where the constructor needs resolving as well. When resolving the constructor, ManglingContextDecl scope is already closed and cannot be used to check the target of the global variables. Says the following code

struct EC {
  int ec;
__device__ EC() {}
};
__device__ EC d_ec;

I also tried enlarging the scope of ManglingContextDecl but that triggers even more issues for the generic C++ compilation. I'd appreciate any better solution as I agree that adding CUDA specific facilities should be minimized.

clang/lib/Parse/ParseDecl.cpp
2345

Could you elaborate more? I added new test cases requiring template instantiation. The current code handle them correctly. Do you refer to template variables?

hliao added a comment.Thu, Jan 30, 7:05 PM

Sorry for the late reply. Really appreciate your feedback. Thanks!

@rsmith Have you chance to review the revised change?

hliao added a comment.Tue, Feb 4, 1:39 PM

@rsmith do have u the chance to review the revised change again as well as my answers to your comments?