Page MenuHomePhabricator

[CUDA] Allow function overloads based on host/device attributes.

Authored by tra on Aug 28 2015, 3:52 PM.



The patch makes it possible to parse CUDA files that contain host/device functions with identical signatures, but different attributes without having to physically split source into host-only and device-only parts.

This change is needed in order to parse CUDA header files that have a lot of name clashes with standard include files.

Gory details are in design doc here:
Feel free to leave comments there or in this review thread.

This feature is controlled with CC1 option -fcuda-target-overloads and is disabled by default.

Diff Detail


Event Timeline

tra updated this revision to Diff 33483.Aug 28 2015, 3:52 PM
tra retitled this revision from to [CUDA] Allow function overloads based on host/device attributes..
tra updated this object.
tra added a subscriber: cfe-commits.
eliben added inline comments.Aug 31 2015, 3:59 PM
8599 ↗(On Diff #33483)

The comment should explain exactly how it depends on the option

504 ↗(On Diff #33483)

Why device before host?

98 ↗(On Diff #33483)

Why allow Caller to be nullptr but not Callee? Also, != nullptr?

2277 ↗(On Diff #33483)

Nit: wrap the ++I in braces too, otherwise this looks really weird

tra updated this revision to Diff 33654.Aug 31 2015, 5:12 PM
tra marked 2 inline comments as done.

addressed eliben@'s comments.

8599 ↗(On Diff #33483)


504 ↗(On Diff #33483)

Name mangling chapter in Itanium CXX ABI says:

In cases where multiple order-insensitive qualifiers are present, they should be ordered [...] with the 'U' qualifiers in alphabetical order by the vendor name

So I've ordered attributes alphabetically.

98 ↗(On Diff #33483)

Caller may be null in global context (e.g. we're assigning function address to a global variable). Callee can't be. I'll add an assert().

Also, != nullptr?

I'm not quite sure what you mean here. Are you suggesting replacing "Caller ?" with "Caller != nullptr ?"

eliben accepted this revision.Sep 1 2015, 12:26 PM
eliben edited edge metadata.

The CUDA parts look very good. Someone else should approve the overloading-related logic

8605 ↗(On Diff #33654)

Document this function

[I realize the other CUDA functions are not documented here :-(, but hey this is new code so it should follow the rules]

98 ↗(On Diff #33654)

Yes, I believe this is the accepted style when comparing pointers for null-ness.

106 ↗(On Diff #33654)

Not just global from global. global from device too, right? As for global from HD, the CUDA guide forbids it

529 ↗(On Diff #33654)

Is this part related to this patch?

This revision is now accepted and ready to land.Sep 1 2015, 12:26 PM
tra updated this revision to Diff 33741.Sep 1 2015, 3:08 PM
tra updated this object.
tra edited edge metadata.
tra marked an inline comment as done.

Removed builtin-related changes(D12122). Will commit them separately.
Added more test cases.
Addressed eliben@'s comments.

tra marked 2 inline comments as done.Sep 1 2015, 3:10 PM
tra added inline comments.
106 ↗(On Diff #33654)

I'll update the comment.

As for HD->G, nvcc happily compiles following code:

__global__ void kernel() {}
__host__ __device__ void foo() {
#if !defined(__CUDA_ARCH__)

Nvcc does produce an error for HD->G call during device compilation (the error actually complains about D->G or G->G calling). This patch matches nvcc behavior.

529 ↗(On Diff #33654)

It's part of D12122 which broke some of your team's tests and got rolled back.
It's a prerequisite for overloads to work (otherwise anything that uses a builtin would violate calling convention either during device or during host compilation) and it also needs to be hidden behind some option so it does not break your tests again.

I think I can commit it separately after the overload patch. Overloading will not work with builtins until then, but I don't think it's a big deal as there are no users yet.

rsmith added inline comments.Sep 1 2015, 4:22 PM
85 ↗(On Diff #33654)

I would prefer the more verbose isTargetBuiltin or isTargetSpecificBuiltin -- I don't think it will be obvious at call sites what this does if we use this abbreviation in the public interface.

2259 ↗(On Diff #33654)

I don't see any test coverage for this; please add some tests that declare usual deallocation functions with CUDA host/device attributes and check that they behave as expected.

2276 ↗(On Diff #33654)

Use resize, not set_size.

10119–10143 ↗(On Diff #33654)

Please factor out the common code shared by this and FindUsualDeallocationFunction.

tra updated this revision to Diff 34059.EditedSep 4 2015, 12:46 PM
tra marked 2 inline comments as done.

Added more test cases to cover constructor/destructor/new/delete.
Refactored code to remove unwanted CUDA functions from Lookup match results.
Disabled HD overloading of H or D destructors to guarantee single destructor for codegen. (I'm still investigating whether it makes sense to enforce the same restriction on all functions as pcc@ suggested in design doc).

rsmith accepted this revision.Sep 4 2015, 4:53 PM
rsmith edited edge metadata.
rsmith added inline comments.
504 ↗(On Diff #34059)

If we have enable_if on a __host__ __device__ function, I think we should mangle as Ua6deviceUa9enable_ifI...EUa4host. (The relative order of enable_if attributes matters, but the order of them relative to host and device does not, so retaining alphabetical order seems best.)

216 ↗(On Diff #34059)

We don't need the overhead of std::function here. Use this instead:

template<typename T, typename FetchDeclFn>
static void Erase...(
  FetchDeclFn FetchDecl) {
tra updated this revision to Diff 34230.EditedSep 8 2015, 10:39 AM
tra edited edge metadata.
tra marked 2 inline comments as done.

Implemented Richard Smiths' suggestions:
Fixed attribute order in name mangling.
Replaced std::function with template argument.

tra marked 2 inline comments as done.Sep 8 2015, 10:40 AM
tra updated this revision to Diff 34368.Sep 9 2015, 2:24 PM

Implemented pcc@'s suggestion to disallow mixing HD and H/D functions.
Removed name mangling as it's no longer needed. Restriction on HD overloading guarantees that we'll emit only one viable function during particular side of compilation (H or HD on the host side, D or HD on device side).
Updated test cases to reflect reduced number of attribute combinations we need to deal with now.

This revision was automatically updated to reflect the committed changes.