This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] Overload `std::isnan` and friends multiple times for the GPU
ClosedPublic

Authored by jdoerfert on Aug 12 2020, 11:31 PM.

Details

Summary

std::isnan and friends can be found in two variants in the wild, one
returns bool, as the standard defines it, one returns int, as the C
macros do. So far we kinda hoped the system versions of these functions
will work for people, e.g. they are definitions that can be compiled for
the target. We know that is not the case always so we leverage the
disable_implicit_base OpenMP context extension to specialize both
versions of these functions without causing an invalid redeclaration.

Diff Detail

Event Timeline

jdoerfert created this revision.Aug 12 2020, 11:31 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 12 2020, 11:31 PM
jdoerfert requested review of this revision.Aug 12 2020, 11:31 PM
jdoerfert updated this revision to Diff 285385.Aug 13 2020, 8:34 AM

Fix problem, add test

tra added a subscriber: tra.Aug 13 2020, 9:31 AM
tra added inline comments.
clang/lib/Headers/__clang_cuda_cmath.h
85

If you just want to disable some existing declarations that get in the way, one way to do it would be to redeclare them with an __arrtibute__((enable_if(false)))

Having overloads with different return types will be observable.

jdoerfert added inline comments.Aug 13 2020, 12:10 PM
clang/lib/Headers/__clang_cuda_cmath.h
85

We need both overloads as we don't know what return type the system uses. I modeled the test below this way, that is we don't know if isnan has a bool or int return type.

Having overloads with different return types will be observable.

Unsure what observable effect you expect, the variants are there, yes, but they have different names (wrt the base function and the other variant function). The variant without a base function is simply an unused internal function. Could you elaborate what problem you expect?

(I think the tests are failing due to missing parent revisions in the build, these have to go in in-order.)

tra added inline comments.Aug 13 2020, 1:31 PM
clang/lib/Headers/__clang_cuda_cmath.h
85

What will be the result of sizeof(isinf(1.0f)) ? I would expect it to be the same on host and on the device.
I'm not quite sure what the pragma would do, so it's possible I'm barking at the wrong tree here.

jdoerfert added inline comments.Aug 13 2020, 4:01 PM
clang/lib/Headers/__clang_cuda_cmath.h
85

So, I actually had to run this to verify what I suspected would happen:

sizeof(isinf(1.0f)) is in the AST usually:

|         `-UnaryExprOrTypeTraitExpr 0x5586a27bd590 <col:14, col:37> 'unsigned long' sizeof                                                                                                                                                                  
|           `-ParenExpr 0x5586a27bd570 <col:20, col:37> 'bool'
|             `-CallExpr 0x5586a27bd548 <col:21, col:36> 'bool'
|               |-ImplicitCastExpr 0x5586a27bd530 <col:21, col:26> 'bool (*)(float)' <FunctionToPointerDecay>
|               | `-DeclRefExpr 0x5586a27bd500 <col:21, col:26> 'bool (float)' lvalue Function 0x5586a276f9f0 'isinf' 'bool (float)' non_odr_use_unevaluated
|               `-FloatingLiteral 0x5586a27bd290 <col:32> 'float' 1.000000e+00

If isinf has an applicable variant, it will be picked up:

|         `-UnaryExprOrTypeTraitExpr 0x55f9ac949a20 <col:14, col:37> 'unsigned long' sizeof                                                                                                                                                                  
|           `-ParenExpr 0x55f9ac949a00 <col:20, col:37> 'bool'
|             `-PseudoObjectExpr 0x55f9ac9499e0 <col:21, col:36> 'bool'
|               |-CallExpr 0x55f9ac949978 <col:21, col:36> 'bool'
|               | |-ImplicitCastExpr 0x55f9ac949960 <col:21, col:26> 'bool (*)(float)' <FunctionToPointerDecay>
|               | | `-DeclRefExpr 0x55f9ac949930 <col:21, col:26> 'bool (float)' lvalue Function 0x55f9ac7cd1d0 'isinf' 'bool (float)' non_odr_use_unevaluated
|               | `-FloatingLiteral 0x55f9ac9496c0 <col:32> 'float' 1.000000e+00
|               `-CallExpr 0x55f9ac9499b8 </data/build/llvm-project/lib/clang/12.0.0/include/__clang_cuda_cmath.h:36:20, //data/src/llvm-project/clang/test/Headers/openmp_device_math_isnan.cpp:21:36> 'bool'
|                 |-ImplicitCastExpr 0x55f9ac9499a0 </data/build/llvm-project/lib/clang/12.0.0/include/__clang_cuda_cmath.h:36:20> 'bool (*)(float) __attribute__((nothrow))' <FunctionToPointerDecay>
|                 | `-DeclRefExpr 0x55f9ac939790 <col:20> 'bool (float) __attribute__((nothrow))' Function 0x55f9ac939690 'isinf[implementation={extension(disable_implicit_base, match_any, allow_templates)}, device={arch(nvptx, nvptx64)}]' 'bool (float@
|                 `-FloatingLiteral 0x55f9ac9496c0 <//data/src/llvm-project/clang/test/Headers/openmp_device_math_isnan.cpp:21:32> 'float' 1.000000e+00

That is the behavior I expected, as it happens for any base function call with an applicable variant.

This patch doesn't change any of this. We have two specialization that do only differ in their return type but each will only be a variant of a base function with that return type. In any context, when we have a call to the original base function, then we try to specialize. Since only the bool return *or* the int return specializations are variants of the base function, we might replace the base call with a call, but consistent on host and device. I hope this makes some sense, I don't think I did a good job explaining.

tra added inline comments.Aug 13 2020, 4:33 PM
clang/lib/Headers/__clang_cuda_cmath.h
85

It sounds like openmp's 'variant' is more of an 'overlay' rather than a CUDA-style target overload that I was thinking of (and overloads don't allow different return types at all). If I understand you correctly, the code below allows (literally?) matching host-side function signatures. Because the functions returning bool and functions returning int can't coexist on the host, there will be no conflicts on device side either.

Is that in the ballpark of what's happening? If I'm still off, could you point me to more info about how "pragma omp declare variant" works?

jdoerfert added inline comments.Aug 13 2020, 6:03 PM
clang/lib/Headers/__clang_cuda_cmath.h
85

It sounds like openmp's 'variant' is more of an 'overlay' rather than a CUDA-style target overload that I was thinking of (and overloads don't allow different return types at all).
[...]
Is that in the ballpark of what's happening?

Yep. Basically, you can provide N specialization for a function and calls to that function are replaced by calls to a matching specialization. We also only do this for direct calls, that is &foo will always give you the address of the base version, which may or may not be desirable but is certainly different from overloading. I also had to completely give up on my overloading based implementation of declare variant :(, but the new one works really well ;)

If I understand you correctly, the code below allows (literally?) matching host-side function signatures. Because the functions returning bool and functions returning int can't coexist on the host, there will be no conflicts on device side either.

Exactly, with the caveat mentioned here in the TODO: We mangle the variants to avoid conflicts with the base function. Since this mangling is only based on the context selector and the function name, two variants that only differ in their return type would clash. To avoid this I added a "no-op" context selector trait here that will ensure the names are different in the "overlay/variant" space.

how "pragma omp declare variant" works?

So, this is an extension to the context selector as allowed by the standard. The latest public version is https://www.openmp.org/wp-content/uploads/openmp-TR8.pdf, declare variant is on page 56, Section 2.3.5. OpenMP 5.1 (Nov 2020) will have various clarifications but the principles are the same. Note that there is declare variant and the begin/end version which behave slightly different. I implemented all of math and complex support with the begin/end version and I believe it to be far superior anyway ;)

JonChesterfield added a comment.EditedAug 13 2020, 8:25 PM

I think this is reasonable. It's unfortunate to have isnan return bool or int depending on the system headers, but considering we have that in a language that doesn't mangle the return type into the name the workaround seems OK.

edit: removed concerns about macro implementations of isnan as this is cmath, not math

JonChesterfield accepted this revision.Aug 13 2020, 8:31 PM
This revision is now accepted and ready to land.Aug 13 2020, 8:31 PM
tra accepted this revision.Aug 14 2020, 10:38 AM
tra added inline comments.
clang/lib/Headers/__clang_cuda_cmath.h
85

Thank you for the details.

This revision was landed with ongoing or failed builds.Sep 16 2020, 11:40 AM
This revision was automatically updated to reflect the committed changes.