This is an archive of the discontinued LLVM Phabricator instance.

[CUDA] Tweak attribute-based overload resolution to match nvcc behavior.
ClosedPublic

Authored by tra on Feb 3 2016, 4:25 PM.

Details

Summary

This is an artefact of split-mode CUDA compilation that we need to
mimic. HD functions are sometimes allowed to call H or D functions. Due
to split compilation mode device-side compilation will not see host-only
function and thus they will not be considered at all. For clang both H
and D variants will become function overloads visible to
compiler. Normally target attribute is considered only if C++ rules can
not determine which function is better. However in this case we need to
ignore functions that would not be present during current compilation
phase before we apply normal overload resolution rules.

Changes:

  • introduced another level of call preference to better describe possible call combinations.
  • removed WrongSide functions from consideration if the set contains SameSide function.
  • disabled H->D and D->H and G->H calls. These combinations are not allowed by CUDA and we were reluctantly allowing them to work around device-side calls to math functions in std namespace. We no longer need it after r258880.

Diff Detail

Repository
rL LLVM

Event Timeline

tra updated this revision to Diff 46849.Feb 3 2016, 4:25 PM
tra retitled this revision from to [CUDA] Tweak attribute-based overload resolution to match nvcc behavior. .
tra updated this object.
tra added reviewers: jlebar, jingyue, jpienaar, eliben.
tra added a subscriber: cfe-commits.
jingyue added inline comments.Feb 3 2016, 5:43 PM
include/clang/Sema/Sema.h
8801 ↗(On Diff #46849)

Why not merging CFP_SameSide and CFP_Native? You can conceptually treat HD as H in host mode or D in device mode. Then, it seems that we can just reuse CFP_Native to express the same meaning.

tra added a comment.Feb 3 2016, 6:16 PM

When overload set contains h and HD functions that are otherwise equal for
overload resolution, you want to be able to tell which one is better.

jingyue added inline comments.Feb 3 2016, 9:40 PM
lib/Sema/SemaCUDA.cpp
132–141 ↗(On Diff #46849)

Maybe a clearer way is to replace QuestionableResult with CFP_WrongSide in this if block, and after this if block say

if (DisabletargetCallChecks && CFP_WrongSide)
  return CFP_Never;
lib/Sema/SemaOverload.cpp
8544 ↗(On Diff #46849)

Why can't we just return CFP1>CFP2? What's a counter example for that?

8548 ↗(On Diff #46849)

You assume some numeric orders between these CFPs. May be worth adding a comment in CFP's definition.

tra updated this revision to Diff 46927.Feb 4 2016, 10:36 AM
tra marked an inline comment as done.

Addressed Jingyue's comments.
Fixed function-overload.cu tests to reflect stricter call target checks.

tra marked an inline comment as done.Feb 4 2016, 10:37 AM
tra added inline comments.
lib/Sema/SemaCUDA.cpp
132–141 ↗(On Diff #46849)

I've rearranged the code to make it easier to follow. I hope.

lib/Sema/SemaOverload.cpp
8544 ↗(On Diff #46849)

You can look at it this way -- if both callees are viable during compilation (i.e. we can generate code for both) we want to use C++ overload resolution rules to pick best one (and can't return here). If one is viable and another is not, then the viable one always wins and that's what the check is for.

See template_vs_hd_function() in the test for example when we should not return here.

jlebar accepted this revision.Feb 4 2016, 11:06 AM
jlebar edited edge metadata.

Looks sane to me. Just some suggestions on the comments.

lib/Sema/SemaCUDA.cpp
71 ↗(On Diff #46927)

If we're going to use symbols rather than letters, could we use 4, 3, 2, 1, 0? I think that would be easier to follow.

127 ↗(On Diff #46927)

Nit: "It's OK to call a mode-matching function from an HD function."

lib/Sema/SemaOverload.cpp
8536 ↗(On Diff #46927)

Since we have language lawyers on the team, suggest adding articles to comment:

If an HD function calls a function which has host-only and device-only overloads, nvcc sees only the host-side function during host compilation and only the device function during device-side compilation. (This appears to be a side-effect of its splitting of host and device code into separate TUs.) Alas we need to be compatible with existing code that relies on this, so if we see such a case, return the better variant right away.

I actually might suggest rephrasing this a bit more, to something like:

When performing host-side compilation, nvcc doesn't see device functions, and similarly when performing device-side compilation, nvcc doesn't see host functions. (This is a consequence of the fact that it splits host and device code into separate TUs.) We see all functions in both compilation modes, so to match nvcc's behavior, we need to exclude some overload candidates from consideration based only on their host/device attributes. Specifically, if one candidate call is WrongSide and the other is Native or SameSide, we ignore the WrongSide candidate. If we don't return early here, we'll consider the CUDA target attributes again later in this function, as a tiebreaker between calls with otherwise identical priority according to the regular C++ overloading rules.

test/CodeGenCUDA/function-overload.cu
96 ↗(On Diff #46927)

call

This revision is now accepted and ready to land.Feb 4 2016, 11:06 AM
jingyue accepted this revision.Feb 4 2016, 11:17 AM
jingyue edited edge metadata.
jingyue added inline comments.Feb 5 2016, 11:00 AM
test/SemaCUDA/function-overload.cu
138 ↗(On Diff #46927)

This line seems to have a trailing space.

tra updated this revision to Diff 47335.Feb 9 2016, 10:14 AM
tra updated this object.
tra edited edge metadata.
tra marked 3 inline comments as done.

Updated the way WrongSide functions are removed from consideration during overload resolution.
Previous version could provide inconsistent results depending on the order of candidates in the overload set.
This version removes WrongSide functions from the set before we perform pair-wise comparison of candidates.

Added another test case to make sure HD->{H,H} or HD->{D,D} overloads are resolved consistently during host and device compilation.

tra planned changes to this revision.Feb 9 2016, 10:18 AM

Previously accepted version was dependent on order of functions in overload set.
In order to make ordering consistent, WrongSide functions are now removed from the
set before pair-wise comparison of candidates. Please take a look.

lib/Sema/SemaCUDA.cpp
71 ↗(On Diff #46927)

I considered that but found that numbers require more mental effort to map them back to what's supposed to happen for particular call combination. Letters were not sufficiently distinct visually (hard to tell l/f h/n at a glance). Current scheme works best for me. '+*o' == good. '-.' == bad.

tra requested a review of this revision.Feb 11 2016, 1:46 PM

@jingyue, @jlebar: can you take a look at the updated version?

This revision is now accepted and ready to land.Feb 11 2016, 1:46 PM

Mostly comments on comments.

lib/Sema/SemaCUDA.cpp
71 ↗(On Diff #47335)

What if we used the following mapping?

N = native
HD = host+device
SS = same-side
WS = wrong-side
- = never

This mimics how we were writing on the whiteboard.

115 ↗(On Diff #47335)

I'm not sure "fallback" is the right word to use here anymore, as HD --> HD gets priority HD.

127 ↗(On Diff #47335)

Suggest "compilation-mode matching"

lib/Sema/SemaOverload.cpp
8731 ↗(On Diff #47335)

We should be consistent wrt whether we call it a compilation pass, or (as above) compilation mode, or whatever. (I think "mode" may be right.)

Also please add an article: "During a particular compilation mode".

8738 ↗(On Diff #47335)

Maybe name this something more closely tied to what it is -- e.g. ContainsSameSideCall -- and then add a comment in the if statement saying "Remove wrong-side calls from consideration."

8752 ↗(On Diff #47335)

This is an indentation mouthful -- can we pull the lambda out, maybe?

8757 ↗(On Diff #47335)

Suggest auto*, so it's clear we're not copying things.

8771 ↗(On Diff #47335)

auto*

test/CodeGenCUDA/function-overload.cu
81 ↗(On Diff #47335)

Nit: Suggest American spelling, "artifact", which is much more common in llvm codebase.

Maybe also remove this sentence, or move it down somewhere later -- this feels like a bad "topic sentence" for the paragraph. e.g.

HD functions are sometimes allowed to call H or D functions -- this is an artifact of the source-to-source splitting performed by nvcc that we need to mimic.

86 ↗(On Diff #47335)

This is setting up a contrast between nvcc and clang, so suggest connecting the phrases with "but" or "in contrast". Also suggest being specific that we're talking about nvcc -- since split-mode compilation isn't a thing in clang, if we just talk about it generally, it's not clear what we're referring to. e.g.

During device mode compilation in nvcc, host functions aren't present at all, so don't participate in overloading. But in clang, H and D functions are present in both compilation modes. Clang normally uses the target attribute as a tiebreaker between overloads with otherwise identical priority, but in order to match nvcc's behavior, we sometimes need to wholly discard overloads that would not be present during compilation under nvcc.

98 ↗(On Diff #47335)

This was a bit hard to follow, suggest rewording slightly:

Here we expect to call the templated function during host compilation, even if -fcuda-disable-target-call-checks is passed, and even though C++ overload rules prefer the non-templated function.

110 ↗(On Diff #47335)

call the overloaded function

128 ↗(On Diff #47335)

If we have a mix

in the overload set, normal

162 ↗(On Diff #47335)

when the overload set

I'll defer to Justin's approval.

tra updated this revision to Diff 47753.Feb 11 2016, 4:42 PM
tra marked 14 inline comments as done.

Addressed @jlebar's comments.

This revision was automatically updated to reflect the committed changes.