Page MenuHomePhabricator

[CUDA] Allow local `static const {__constant__, __device__}` variables.
AcceptedPublic

Authored by tra on Fri, Sep 25, 4:31 PM.

Details

Summary

While CUDA documentation claims that such variables are not allowed[1], NVCC has
been accepting them since CUDA-10.0 [2] and some headers in CUDA-11 rely on this
working.

  1. https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#static-variables-function
  2. https://godbolt.org/z/zsodzc

Diff Detail

Unit TestsFailed

TimeTest
500 mslinux > Clang.CodeGenCUDA::static-device-var-no-rdc.cu
Script: -- : 'RUN: at line 4'; /mnt/disks/ssd0/agent/llvm-project/build/bin/clang -cc1 -internal-isystem /mnt/disks/ssd0/agent/llvm-project/build/lib/clang/12.0.0/include -nostdsysteminc -triple amdgcn-amd-amdhsa -fcuda-is-device -std=c++11 -emit-llvm -o - -x hip /mnt/disks/ssd0/agent/llvm-project/clang/test/CodeGenCUDA/static-device-var-no-rdc.cu | /mnt/disks/ssd0/agent/llvm-project/build/bin/FileCheck -check-prefixes=DEV /mnt/disks/ssd0/agent/llvm-project/clang/test/CodeGenCUDA/static-device-var-no-rdc.cu
480 mslinux > Clang.SemaCUDA::device-var-init.cu
Script: -- : 'RUN: at line 6'; /mnt/disks/ssd0/agent/llvm-project/build/bin/clang -cc1 -internal-isystem /mnt/disks/ssd0/agent/llvm-project/build/lib/clang/12.0.0/include -nostdsysteminc -verify /mnt/disks/ssd0/agent/llvm-project/clang/test/SemaCUDA/device-var-init.cu -triple nvptx64-nvidia-cuda -fcuda-is-device -std=c++11 /mnt/disks/ssd0/agent/llvm-project/clang/test/SemaCUDA/device-var-init.cu

Event Timeline

tra created this revision.Fri, Sep 25, 4:31 PM
Herald added a project: Restricted Project. · View Herald TranscriptFri, Sep 25, 4:31 PM
tra requested review of this revision.Fri, Sep 25, 4:31 PM
tra retitled this revision from [CUDA] Allow `static const {__constant__, __device__}` variables. to [CUDA] Allow local `static const {__constant__, __device__}` variables..

wha... As you know, const doesn't mean anything, that can be const-casted away. And then you'll be able to observe that this nominally-static variable is just a normal variable.

Since this doesn't make sense and contradicts their documentation, I'm tempted to say this should only apply to the nvidia headers. Is that technically possible? And then we file a bug against nvidia and/or ask Bryce?

yaxunl added inline comments.Sat, Sep 26, 4:38 AM
clang/test/CodeGenCUDA/static-device-var-no-rdc.cu
84

what happens to a const static device or constant var with non-trivial constructor? can we have a test for that?

tra added a comment.Mon, Sep 28, 9:48 AM

wha... As you know, const doesn't mean anything, that can be const-casted away. And then you'll be able to observe that this nominally-static variable is just a normal variable.

Yes, I'm aware of that. It is irrelevant in this case.
The point is that __constant__ and __device__ are not allowed for the local static variables. I believe originally only __shared__ was allowed for local static vars is because __shared__ is already implicitly static, while __constant__ and __device__ are not.

On the other hand, nothing stops us giving static __constant__ of __device__ variables static storage class and that's what NVIDIA has apparently done, though without updating the docs.
The checks on the constructor still apply -- trivialness is enforced, const must have an initializer, etc.

Looks like the const-ness check should not be there, either. NVCC allows non-const statics. We may get rid of this check altogether and allow everything.

Since this doesn't make sense and contradicts their documentation, I'm tempted to say this should only apply to the nvidia headers. Is that technically possible? And then we file a bug against nvidia and/or ask Bryce?

IMO the relaxation is sensible. A static var is just a global var in an odd namespace and we do support that.
E.g. these produce the same PTX for c (modulo mangling of the names):

namespace a {
  __constant__ int c = 4;
}
__device__ void foo() {
  static __constant__ int c = 4;
}
clang/test/CodeGenCUDA/static-device-var-no-rdc.cu
84

I believe constructor trivialness check is orthogonal and will still be applied.
I'll add a test.

OK, backing up, what are the semantics of static on __constant__, __device__, and __shared__?

  • My understanding is that __shared__ behaves the same whether or not it's static. It's not equivalent to namespace a { __shared__ int c = 4; }, because that's illegal.
  • Does __constant__ behave the same whether or not it's static? A static __constant__ is equivalent to namespace a { __constant__ int c = 4; }, and a non-static __constant__ is *also* equivalent to that?
  • And __device__ does not behave the same whether or not it's static? In function scope __device__ int x; is a variable local to the thread. Whereas in global scope __device__ int x; is a global variable that lives in device memory (?). In function scope static __device__ int x; is equivalent to static int x; which is equivalent to int x; in namespace scope?

Should we mandate that you initialize static __constant__ variables in function scope? That is, if you write static __constant__ int x; in a function, then x is always uninitialized (right)? You should do static __constant__ int x = 42;?

tra added a comment.Mon, Sep 28, 11:31 AM

OK, backing up, what are the semantics of static on __constant__, __device__, and __shared__?

  • My understanding is that __shared__ behaves the same whether or not it's static. It's not equivalent to namespace a { __shared__ int c = 4; }, because that's illegal.

Yes. __shared__ is an odd duck. It is implicitly static, so whether we explicitly specify static makes no difference.
We're not changing anything about how it's implemented.

  • Does __constant__ behave the same whether or not it's static? A static __constant__ is equivalent to namespace a { __constant__ int c = 4; }, and a non-static __constant__ is *also* equivalent to that?

No. __constant__ is not allowed on non-static local variables as it can't be allocated on stack.

  • And __device__ does not behave the same whether or not it's static?

Correct.

In function scope __device__ int x; is a variable local to the thread.

Correct. __device__ in a device function is effectively a no-op and can be placed on stack as a regular local variable.

Whereas in global scope __device__ int x; is a global variable that lives in device memory (?).

Correct.

In function scope static __device__ int x; is equivalent to static int x; which is equivalent to int x; in namespace scope?

Yes, assuming you mean a __device__ function and __device__ int x; in the namespace scope.

Should we mandate that you initialize static __constant__ variables in function scope?
That is, if you write static __constant__ int x; in a function, then x is always uninitialized (right)? You should do static __constant__ int x = 42;?

No. Accoring to PTX spec: Variables in .const and .global state spaces are initialized to zero by default.
Those are the address spaces __constant__ and __device__ variables map to.
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#state-spaces

OK, now I'm starting to I understand this change..

Before, in function scope, we allow static const/non-const __shared__, and allow static const so long as it's not __device__ or __constant__.

  • static -> error? (I understood us saying above that it is, but now that I read the code, isn't it saying it's an error?)
  • static const -> allowed
  • static __device__ -> error
  • static const __device__ -> error
  • static __constant__ -> error
  • static const __constant__ -> error

After, in function scope, the rule is, allow static const/non-const __shared__ or anything that's static const.

  • static -> error, must be const
  • static const -> allowed
  • static __device__ -> error, must be const
  • static const __device__ -> allowed
  • static __constant__ -> error, must be const
  • static const __constant__ -> allowed

I guess my question when I write out this table is, why shouldn't it be like this?

  • static -> allowed
  • static const -> allowed
  • static __device__ -> allowed
  • const static __device__ -> allowed
  • static __constant__ -> error, must be const
  • const static __constant__ -> allowed

This makes some sense to me because we're saying, "__constant__ must be const", otherwise, anything goes.

Or here's another way of thinking about it. You're saying that static and static __device__ in function scope are the same as a __device__ variable in block scope. And a __device__ variable in block scope doesn't have to be const (right?). So why the extra restriction on function-scope static?

clang/lib/Sema/SemaDecl.cpp
13154–13155

Update comment?

tra added a comment.Mon, Sep 28, 4:34 PM

OK, now I'm starting to I understand this change..

Before, in function scope, we allow static const/non-const __shared__, and allow static const so long as it's not __device__ or __constant__.

  • static -> error? (I understood us saying above that it is, but now that I read the code, isn't it saying it's an error?)
  • static const -> allowed
  • static __device__ -> error
  • static const __device__ -> error
  • static __constant__ -> error
  • static const __constant__ -> error

After, in function scope, the rule is, allow static const/non-const __shared__ or anything that's static const.

  • static -> error, must be const
  • static const -> allowed
  • static __device__ -> error, must be const
  • static const __device__ -> allowed
  • static __constant__ -> error, must be const
  • static const __constant__ -> allowed

I guess my question when I write out this table is, why shouldn't it be like this?

  • static -> allowed
  • static const -> allowed
  • static __device__ -> allowed
  • const static __device__ -> allowed
  • static __constant__ -> error, must be const
  • const static __constant__ -> allowed

It should. I did mention in a previous comment that > Looks like the const-ness check should not be there, either.
I need to revise the patch.

This makes some sense to me because we're saying, "__constant__ must be const", otherwise, anything goes.

Except that NVCC allows non-const __constant__, too. Generally speaking, C++ does not care about the attributes. While technically __constant__ is not changeable from the device code, not specifying const is a missed optimization/diagnostic opportunity, but not an error per se. It does not affect how the variable is emitted, but rather what user can do with it and that's beyond the scope of this patch. I don't think it warrants a hard error. A warning, perhaps, that non-const __constant__ is probably an error?

Or here's another way of thinking about it. You're saying that static and static __device__ in function scope are the same as a __device__ variable in block scope. And a __device__ variable in block scope doesn't have to be const (right?). So why the extra restriction on function-scope static?

Something like that.

It should. I did mention in a previous comment that > Looks like the const-ness check should not be there, either. I need to revise the patch.

Heh, okay. Sorry I missed that, somehow this patch was confusing to me.

Except that NVCC allows non-const constant, too. Generally speaking, C++ does not care about the attributes. While technically constant is not changeable from the device code, not specifying const is a missed optimization/diagnostic opportunity, but not an error per se. It does not affect how the variable is emitted, but rather what user can do with it and that's beyond the scope of this patch. I don't think it warrants a hard error. A warning, perhaps, that non-const constant is probably an error?

Sure, that makes sense to me.

tra retitled this revision from [CUDA] Allow local `static const {__constant__, __device__}` variables. to [CUDA] Allow local `static {__constant__, __device__}` variables..Thu, Oct 1, 11:28 AM
tra updated this revision to Diff 295635.Thu, Oct 1, 11:28 AM
tra retitled this revision from [CUDA] Allow local `static {__constant__, __device__}` variables. to [CUDA] Allow local `static const {__constant__, __device__}` variables..
tra edited the summary of this revision. (Show Details)

Further relaxed application of attributes on local static variables.

tra updated this revision to Diff 295636.Thu, Oct 1, 11:36 AM

Fixed a test.

yaxunl added inline comments.Thu, Oct 1, 12:38 PM
clang/test/SemaCUDA/device-var-init.cu
404

how does this work in device compilation? Is this equivalent to static __device__ int x = 42?

tra added a comment.Thu, Oct 1, 1:53 PM

I've verified that clang with this patch can compile Tensorflow and that it can also compile cooperative_groups.h from CUDA-11.

clang/test/SemaCUDA/device-var-init.cu
404

Correct.

yaxunl added inline comments.Fri, Oct 2, 7:11 AM
clang/test/SemaCUDA/device-var-init.cu
404

so static variable without __device__/__constant__ attribute in host device function implies __device__ attribute in device compilation.

Is this also true in device function? We need Sema and CodeGen tests for these cases.

Also, can we document these changes? It is easily forgotten.

tra added inline comments.Fri, Oct 2, 11:30 AM
clang/test/SemaCUDA/device-var-init.cu
404

I think of it as a static variable in a __device__ function. There should be no host-side shadow for it, which would normally be created for a __device__ variable.

The tests at the beginning of df_sema() in SemaCUDA/device-var-init.cu already check that static w/o attributes is allowed in __device__.
I'll update CodeGen tests to verify that we generate correct code.

Hey, I'm leaving on a vacation tomorrow and didn't have a chance to get to
this review today.

Is that ok? I'm not bringing my work laptop, but I could look at it on my
personal laptop.

jlebar accepted this revision.Tue, Oct 13, 3:31 PM
jlebar added inline comments.
clang/include/clang/Basic/DiagnosticSemaKinds.td
8163

__device__ is not allowed on non-static function-scope variables?

This appears to be more restrictive than we were before. I want to check, are we OK with the possibility that this will break user code? https://gcc.godbolt.org/z/Y85GKe work with clang, though not with nvcc.

I notice that we even allow __device__ int x; in __host__ __device__ functions, which is...questionable. :) https://gcc.godbolt.org/z/GjjMGx

I'm OK matching the nvcc behavior here and accepting user breakage so long as we're intentional about it. Possibly should be called out in relnotes?

clang/lib/Sema/SemaDeclAttr.cpp
4353

So just to check, in our new world, __constant__ variables don't have to be const. That matches nvcc, fine.

This revision is now accepted and ready to land.Tue, Oct 13, 3:31 PM