This is an archive of the discontinued LLVM Phabricator instance.

[CUDA][HIP] Fix constexpr variables for C++17
ClosedPublic

Authored by yaxunl on May 1 2020, 9:08 AM.

Details

Summary

constexpr variables are compile time constants and implicitly const, therefore
they are safe to emit on both device and host side. Besides, in many cases
they are intended for both device and host, therefore it makes sense
to emit them on both device and host sides if necessary.

In most cases constexpr variables are used as rvalue and the variables
themselves do not need to be emitted. However if their address is taken,
then they need to be emitted.

The following example shows constexpr is available on device side without
__device__ or __constant__ attribute

https://godbolt.org/z/Uf7CgK

Which indicates that we need to emit constexpr variables on device side
even without __device__ or __constant__ attribute when necessary.
This should be OK since the initializer is compile time constant and the
variable itself is constant. They can just be emitted in the same way
as the host side.

For C++14, clang is able to handle that since clang emits them with
available_externally linkage together with the initializer.

However for C++17, the constexpr static data member of a class or template class
become inline variables implicitly. Therefore they become definitions with
linkonce_odr or weak_odr linkages. As such, they can not have available_externally
linkage.

This patch fixes that by adding implicit constant attribute to
file scope constexpr variables and constexpr static data members
in device compilation.

Diff Detail

Event Timeline

yaxunl created this revision.May 1 2020, 9:08 AM
tra added a comment.EditedMay 12 2020, 10:12 AM

constexpr variables are compile time constants and implicitly const, therefore
they are safe to emit on both device and host side. Besides, in many cases
they are intended for both device and host, therefore it makes sense
to emit them on both device and host sides if necessary.

In most cases constexpr variables are used as rvalue and the variables
themselves do not need to be emitted.

Agreed so far. constexpr values are usable on both sides.

However if their address is taken, then they need to be emitted.

Agreed.

That's where the fun starts. Will this assertion trigger or not?

constexpr int ce = 42;
__global__ void kernel() {
  assert(p == &ce); 
}
void f() {
  kernel<<<1,1>>>(&ce);
}

NVCC does not allow accessing ce, so this does not compile.
Clang allows taking a reference, but the variable is not emitted, so the failure will happen in ptxas.
If compiler does emit ce on device side, the addresses will be different, unless we rely on
CUDA runtime magic to translate addresses of shadows into real device-side address.
If we do that, then we should probably add an implicit constant on the constexpr vars.

The following example shows constexpr is available on device side without
device or constant attribute

https://godbolt.org/z/Uf7CgK

If you clear output filters, you will see that the reference to constexpr is emitted as .extern .global .align 4 .u32 _ZN1B1bE; and the varible's address is not actually available. If you were to actually compile the example, ptxas would complain about ti.
NVCC, predictably, allows using constexpr values, but not the variables themselves:
https://godbolt.org/z/55dEx-

In D79237#2031870, @tra wrote:

constexpr variables are compile time constants and implicitly const, therefore
they are safe to emit on both device and host side. Besides, in many cases
they are intended for both device and host, therefore it makes sense
to emit them on both device and host sides if necessary.

In most cases constexpr variables are used as rvalue and the variables
themselves do not need to be emitted.

Agreed so far. constexpr values are usable on both sides.

However if their address is taken, then they need to be emitted.

Agreed.

That's where the fun starts. Will this assertion trigger or not?

constexpr int ce = 42;
__global__ void kernel() {
  assert(p == &ce); 
}
void f() {
  kernel<<<1,1>>>(&ce);
}

NVCC does not allow accessing ce, so this does not compile.
Clang allows taking a reference, but the variable is not emitted, so the failure will happen in ptxas.
If compiler does emit ce on device side, the addresses will be different, unless we rely on
CUDA runtime magic to translate addresses of shadows into real device-side address.
If we do that, then we should probably add an implicit constant on the constexpr vars.

The following example shows constexpr is available on device side without
device or constant attribute

https://godbolt.org/z/Uf7CgK

If you clear output filters, you will see that the reference to constexpr is emitted as .extern .global .align 4 .u32 _ZN1B1bE; and the varible's address is not actually available. If you were to actually compile the example, ptxas would complain about ti.
NVCC, predictably, allows using constexpr values, but not the variables themselves:
https://godbolt.org/z/55dEx-

It seems adding an implicit __constant__ attribute to constexpr variables on device side may be a possible solution. I will give it a try.

yaxunl updated this revision to Diff 264329.May 15 2020, 1:32 PM
yaxunl edited the summary of this revision. (Show Details)

add implicit constant attribute to constexpr file scope variables and constexpr static data members in device compilation.

tra added a comment.May 15 2020, 2:07 PM

LGTM in general. Let me check the patch on our tensorflow build.

clang/test/SemaCUDA/constexpr-variables.cu
30–31 ↗(On Diff #264329)

Can we verify the diags for bad cases, too?

yaxunl marked 2 inline comments as done.May 15 2020, 2:20 PM
yaxunl added inline comments.
clang/test/SemaCUDA/constexpr-variables.cu
30–31 ↗(On Diff #264329)

By bad cases you mean the constexpr var is not compile time constant?

tra added a comment.May 15 2020, 2:53 PM
In D79237#2039417, @tra wrote:

LGTM in general. Let me check the patch on our tensorflow build.

Bad news -- it breaks the standard C++ library.

Reproducer:

$ bin/clang++ -x cuda /dev/null -fsyntax-only -include algorithm --cuda-path=$HOME/local/cuda-10.1 --cuda-device-only --cuda-gpu-arch=sm_60 -std=c++17 -stdlib=libc++                                              tra@art3:~/work/llvm/build/release+assert+zapcc

In file included from <built-in>:2:
In file included from build/release+assert+zapcc/lib/clang/11.0.0/include/cuda_wrappers/algorithm:55:
In file included from build/release+assert+zapcc/bin/../include/c++/v1/algorithm:642:
build/release+assert+zapcc/bin/../include/c++/v1/utility:937:51: error: dynamic initialization is not supported for __device__, __constant__, and __shared__ variables.
_LIBCPP_INLINE_VAR constexpr in_place_type_t<_Tp> in_place_type{};
                                                  ^            ~~
build/release+assert+zapcc/bin/../include/c++/v1/utility:944:53: error: dynamic initialization is not supported for __device__, __constant__, and __shared__ variables.
_LIBCPP_INLINE_VAR constexpr in_place_index_t<_Idx> in_place_index{};
                                                    ^             ~~
2 errors generated when compiling for sm_60.
tra added inline comments.May 15 2020, 3:33 PM
clang/test/SemaCUDA/constexpr-variables.cu
30–31 ↗(On Diff #264329)

It's a general suggestion. It's good to have some negative testing thrown in, too to make sure we didn't do too much by mistake. E.g. allowing all variables to become __constant__ would be bad. :-)

I'm not sure what exactly would be a good addition here. If you can't think of something, I guess we can leave it as is. C++ is largely oblivoius of __constant__, so we may not have much to work with. It's probably observable, but may be an overkill for the test.

rsmith added a subscriber: rsmith.May 15 2020, 4:31 PM
In D79237#2039559, @tra wrote:
In D79237#2039417, @tra wrote:

Bad news -- it breaks the standard C++ library.

[...]

build/release+assert+zapcc/bin/../include/c++/v1/utility:937:51: error: dynamic initialization is not supported for __device__, __constant__, and __shared__ variables.
_LIBCPP_INLINE_VAR constexpr in_place_type_t<_Tp> in_place_type{};
                                                  ^            ~~

This looks like a bug in that diagnostic: the instantiations of this variable template certainly do not have dynamic initialization. The diagnostic is missing an "in instantiation of" note, so I think the bug is that we're performing this check on a dependent variable prior to instantiation. Presumably we should delay the check to instantiation time if either the type of the variable is dependent or the initializer is value-dependent.

yaxunl updated this revision to Diff 264394.May 15 2020, 5:16 PM
yaxunl marked an inline comment as done.

fix constexpr var in templates

yaxunl marked 2 inline comments as done.May 15 2020, 5:18 PM
yaxunl added inline comments.
clang/test/SemaCUDA/constexpr-variables.cu
30–31 ↗(On Diff #264329)

added tests where initializer is not compile time constant

yaxunl marked an inline comment as done.May 15 2020, 5:19 PM
In D79237#2039559, @tra wrote:
In D79237#2039417, @tra wrote:

Bad news -- it breaks the standard C++ library.

[...]

build/release+assert+zapcc/bin/../include/c++/v1/utility:937:51: error: dynamic initialization is not supported for __device__, __constant__, and __shared__ variables.
_LIBCPP_INLINE_VAR constexpr in_place_type_t<_Tp> in_place_type{};
                                                  ^            ~~

This looks like a bug in that diagnostic: the instantiations of this variable template certainly do not have dynamic initialization. The diagnostic is missing an "in instantiation of" note, so I think the bug is that we're performing this check on a dependent variable prior to instantiation. Presumably we should delay the check to instantiation time if either the type of the variable is dependent or the initializer is value-dependent.

right. fixed

tra accepted this revision.Jun 3 2020, 10:22 AM

Tested with tensorflow build. The patch- does not seem to break anything now.

This revision is now accepted and ready to land.Jun 3 2020, 10:22 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptJun 3 2020, 7:17 PM