This is an archive of the discontinued LLVM Phabricator instance.

[CUDA][HIP] Fix overloading resolution in global variable initializer
ClosedPublic

Authored by yaxunl on Aug 17 2023, 9:22 PM.

Details

Summary

Currently, clang does not resolve certain overloaded functions correctly in the initializer
of global variables, e.g.

template<typename T1, typename U>
T1 mypow(T1, U);

__attribute__((device)) double mypow(double, int);

double t_extent = mypow(1.0, 2);

In the above example, mypow is supposed to resolve to the host version
but clang resolves it to the device version instead, and emits an error
(https://godbolt.org/z/17xxzaa67).

However, if the variable is assigned in a host function, there is no error.
The discrepancy in overloading resolution inside and outside of
a function is due to clang not accounting for the host/device target
when resolving functions called in the initializer of a global variable.

This patch introduces a global host/device target context for CUDA/HIP
for functions called outside of functions. For global variable initialization,
it is determined by the host/device attribute of the variable. For other
situations, a default value of host_device is sufficient.

Diff Detail

Unit TestsFailed

Event Timeline

yaxunl created this revision.Aug 17 2023, 9:22 PM
yaxunl requested review of this revision.Aug 17 2023, 9:22 PM
tra added a comment.Aug 18 2023, 10:36 AM

Same reproducer but for CUDA: https://godbolt.org/z/WhjTMffnx

clang/include/clang/Sema/Sema.h
4753

It appears that Declarator D here is only used as an attribute carrier used to identify CUDA calling target.
Should we pass CudaTarget ContextTarget instead and let the caller figure out how to find it?

I'm just thinking that we're hardcoding just one specific way to find the target, while there may potentially be more.
The current way is OK, as we have just one use case at the moment.

clang/lib/Sema/SemaCUDA.cpp
137

Style nit: no braces around single-statement body.

clang/test/CodeGenCUDA/global-initializers.cu
11–12

We don't really need templates to reproduce the issue. We just need a host function with lower overloading priority. A function requiring type conversion or with an additional default argument should do. E.g. float pow(float, int); or double X = pow(double, int, bool lower_priority_host_overload=1);

Removing template should unclutter the tests a bit.

yaxunl marked 3 inline comments as done.Aug 18 2023, 9:35 PM
yaxunl added inline comments.
clang/include/clang/Sema/Sema.h
4753

will do

clang/lib/Sema/SemaCUDA.cpp
137

will fix

clang/test/CodeGenCUDA/global-initializers.cu
11–12

will do

yaxunl updated this revision to Diff 551705.Aug 18 2023, 9:39 PM
yaxunl marked 3 inline comments as done.

revised by comments

tra accepted this revision.Aug 28 2023, 10:22 AM
This revision is now accepted and ready to land.Aug 28 2023, 10:22 AM
This revision was landed with ongoing or failed builds.Aug 29 2023, 7:17 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptAug 29 2023, 7:17 AM
yaxunl reopened this revision.Sep 7 2023, 4:48 AM

The patch was reverted since it caused regressions on Windows for HIP. A reduced test case is:

typedef void (__stdcall* funcTy)();
void invoke(funcTy f);

static void __stdcall callee() noexcept {
}

void foo() {
   invoke(callee);
}

It is due to clang missed handling host/device attributes for calling convention at a few places.

Reopen to fix it.

This revision is now accepted and ready to land.Sep 7 2023, 4:48 AM
yaxunl closed this revision.Sep 7 2023, 6:29 AM

Phabricator no longer allows me to update the patch. Created PR in github https://github.com/llvm/llvm-project/pull/65606

clang/test/SemaCUDA/function-overload.cu