This is an archive of the discontinued LLVM Phabricator instance.

[CUDA] Make unattributed constexpr functions (usually) implicitly host+device.
ClosedPublic

Authored by jlebar on Mar 22 2016, 3:08 PM.

Details

Summary

[CUDA] Make unattributed constexpr functions implicitly host+device.

With this patch, by a constexpr function is implicitly host+device
unless:

a) it's a variadic function (variadic functions are not allowed on the device side), or
b) it's preceeded by a device overload in a system header.

The restriction on overloading host device functions on the
basis of their CUDA attributes remains in place, but we use (b) to allow
us to define device overloads for constexpr functions in cmath,
which would otherwise be host device and thus not overloadable.

You can disable this behavior with -fno-cuda-host-device-constexpr.

Diff Detail

Event Timeline

jlebar updated this revision to Diff 51352.Mar 22 2016, 3:08 PM
jlebar retitled this revision from to [CUDA] Implement -fcuda-relaxed-constexpr, and enable it by default..
jlebar updated this object.
jlebar added a reviewer: tra.
jlebar added subscribers: rsmith, rnk, cfe-commits.
jlebar updated this revision to Diff 51357.Mar 22 2016, 3:48 PM

Actually run the tests, and fix the CUDA overloading test.

tra edited edge metadata.Mar 22 2016, 4:56 PM

Now that H/D and HD cal all be in the same overload set, we'll also need additional tests in CodeGenCUDA/function-overload.cu for cases that now became legal.

In D18380#381025, @tra wrote:

Now that H/D and HD cal all be in the same overload set, we'll also need additional tests in CodeGenCUDA/function-overload.cu for cases that now became legal.

There are lots of tests that used to be compile errors and now aren't -- what do you think we're missing?

tra added a subscriber: tra.Mar 22 2016, 5:07 PM

We need tests to demonstrate that we pick correct function when we have mix
of HD+H/D in the overload set.
Existing tests only cover resolution of {HD,HD}, {H,H} {D,D} {H,D} sets

jlebar updated this revision to Diff 51366.Mar 22 2016, 5:14 PM
jlebar edited edge metadata.

Add tests checking host+device overloading.

In D18380#381031, @tra wrote:

We need tests to demonstrate that we pick correct function when we have mix
of HD+H/D in the overload set.
Existing tests only cover resolution of {HD,HD}, {H,H} {D,D} {H,D} sets

Aha, got it. I think adding this is simple given the existing framework -- lmk what you think.

jlebar updated this revision to Diff 51384.Mar 22 2016, 8:54 PM

Update test as discussed -- now we check that we're invoking the correct overloads.

tra accepted this revision.Mar 23 2016, 10:37 AM
tra edited edge metadata.
This revision is now accepted and ready to land.Mar 23 2016, 10:37 AM
rsmith added inline comments.Mar 23 2016, 10:58 AM
include/clang/Driver/CC1Options.td
694–695

Is there a better name we can use for this? I don't think this is "relaxed" in any obvious sense. -fcuda-host-device-constexpr or -fcuda-constexpr-on-device might be clearer?

lib/Driver/Tools.cpp
3597 ↗(On Diff #51384)

For flags that are enabled by default, we usually have the -cc1 flag be a -fno-* flag. This allows people to use (for instance) clang blah.cu -Xclang -fno-cuda-relaxed-constexpr if necessary.

lib/Sema/SemaOverload.cpp
1132

No parens around == comparisons.

jlebar added inline comments.Mar 23 2016, 11:30 AM
include/clang/Driver/CC1Options.td
694–695

"relaxed constexpr" is nvidia's term -- do you think it might be helpful to use the same terminology? I understand there's some prior art here, with respect to clang accepting gcc's flags, although the situation here is of course different.

lib/Driver/Tools.cpp
3597 ↗(On Diff #51384)

Yeah, Artem and I had a discussion about this yesterday. As you can see, there are two other flags above which are turned on by default -- these also lack -fno variants.

I think it would be good to be consistent here. I'm tempted to add another patch below this one which makes the other two -fno, then we can make this one -fno as well. It seems that convention is to just get rid of the existing non-fno flags, rather than leave both positive and negative versions.

Does that sound OK to you?

rsmith added inline comments.Mar 23 2016, 11:35 AM
include/clang/Driver/CC1Options.td
694–695

I think it's problematic to use that terminology, as "relaxed constexpr" is also used to describe the C++14 constexpr rules (see n3652).

lib/Driver/Tools.cpp
3597 ↗(On Diff #51384)

Yes, that sounds fine.

jlebar added inline comments.Mar 23 2016, 1:24 PM
include/clang/Driver/CC1Options.td
694–695

Heh, I can't argue with that.

lib/Driver/Tools.cpp
3597 ↗(On Diff #51384)

Okay, thank you. After talking to Artem, we're just going to remove those two flags entirely. So after we convert relaxed-constexpr to an fno flag, there should be no changes to this file in this patch.

jlebar updated this revision to Diff 51479.Mar 23 2016, 3:18 PM
jlebar edited edge metadata.

Switch to -fno-cuda-host-device-constexpr. Only implicitly add the attributes
on functions which themselves lack host/device attributes. Add more tests.

Changed as discussed. Please have another look. Thank you for your continued patience here.

tra added inline comments.Mar 23 2016, 3:38 PM
lib/Sema/SemaDecl.cpp
8015–8017

Can we have constexpr __global__ ?

jlebar updated this revision to Diff 51495.Mar 23 2016, 5:12 PM

Add check for global constexpr functions.

lib/Sema/SemaDecl.cpp
8015–8017

Yikes. We're saved (unless Richard has a tricky counterexample) because kernels must be void and constexpr must not be void. But I'll add a check here anyway.

Richard, are you happy here?

rsmith accepted this revision.Mar 24 2016, 11:10 AM
rsmith added a reviewer: rsmith.

The change to allow __host__ __device__ functions to be overloaded with other combinations of target attributes appears to be separable from the constexpr change; please split it out and commit it first.

include/clang/Basic/LangOptions.def
175

This should be a noun phrase -- this string appears in contexts like "support for %0 is enabled" -- so this should be "treating unattributed [...]".

lib/Sema/SemaDecl.cpp
8015–8017

constexpr functions can return void in a couple of different ways (in C++11, if they're template specializations with dependent return types that instantiate to void, and in C++14 there is no restriction on constexpr functions returning void).

jlebar marked 2 inline comments as done.Mar 24 2016, 1:04 PM
jlebar added inline comments.
include/clang/Basic/LangOptions.def
175

Thanks. This is fixed in my patch queue, and I will push a change for the other ones as part of this patch queue.

jlebar marked an inline comment as done.Mar 24 2016, 1:06 PM

Okay, just one more patch, D18458, then I think we're good here. (This is split up into two patches in my queue.)

Thanks for your help, Richard.

jlebar updated this object.
jlebar edited edge metadata.
jlebar updated this object.
jlebar added a reviewer: rnk.
jlebar updated this object.
jlebar updated this object.
jlebar retitled this revision from [CUDA] Implement -fcuda-relaxed-constexpr, and enable it by default. to [CUDA] Make unattributed constexpr functions (usually) implicitly host+device..Mar 28 2016, 6:52 PM
jlebar updated this revision to Diff 51868.Mar 28 2016, 6:57 PM

Update per changes to patch description. Now a constexpr becomes implicitly HD
unless there's a preceeding device overload.

Updated as discussed -- please have a look.

tra added a comment.Mar 28 2016, 7:39 PM

I wonder if we can find a way to decide whether particular constexpr function should be treated as HD or not without relying on particular order the functions are seen by compiler (or whether they come from system headers).

Right now we're relying on checking overloads of constexpr's function decl once and applying HD attributes based on state of overload set at the point in TU. We then use those attributes during overload resolution.

What if instead of permanently sticking HD attributes on the constexpr function, we instead postpone decision to the point of overload resolution and figure out effective attributes or call preference based on contents of the whole overload set regardless of the order the decls were added to the set.

test/SemaCUDA/host-device-constexpr.cu
31–32

"should prevent this"

jlebar added a comment.EditedMar 28 2016, 9:10 PM
In D18380#385240, @tra wrote:

What if instead of permanently sticking HD attributes on the constexpr function, we instead postpone decision to the point of overload resolution and figure out effective attributes or call preference based on contents of the whole overload set regardless of the order the decls were added to the set.

The problem we were trying to prevent by requiring that the __device__ overload come first is:

constexpr int foo();
__device__ void bar() { foo(); }
__device__ int foo();
__device__ void baz() { foo(); }

In this example, we're forced to instantiate both versions of foo() on the device. Being lazy about making the first foo HD doesn't help, because at the time we see bar, it's the only option available.

(Instantiating both foos is a problem if they have the same mangling. And we want them to have the same mangling so we maintain ABI compatibility with nvcc.)

(Just to be clear, I'm waiting on Richard's review here, even though he lg'ed an version of this patch.)

jlebar marked an inline comment as done.Mar 30 2016, 4:21 PM

Thank you all your time here, Art, Reid, and Richard. Fingers crossed we don't have to worry about this again for a while...

This revision was automatically updated to reflect the committed changes.