While CUDA documentation claims that such variables are not allowed[1], NVCC has
been accepting them since CUDA-10.0 [2] and some headers in CUDA-11 rely on this
working.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
wha... As you know, const doesn't mean anything, that can be const-casted away. And then you'll be able to observe that this nominally-static variable is just a normal variable.
Since this doesn't make sense and contradicts their documentation, I'm tempted to say this should only apply to the nvidia headers. Is that technically possible? And then we file a bug against nvidia and/or ask Bryce?
clang/test/CodeGenCUDA/static-device-var-no-rdc.cu | ||
---|---|---|
84 | what happens to a const static device or constant var with non-trivial constructor? can we have a test for that? |
Yes, I'm aware of that. It is irrelevant in this case.
The point is that __constant__ and __device__ are not allowed for the local static variables. I believe originally only __shared__ was allowed for local static vars is because __shared__ is already implicitly static, while __constant__ and __device__ are not.
On the other hand, nothing stops us giving static __constant__ of __device__ variables static storage class and that's what NVIDIA has apparently done, though without updating the docs.
The checks on the constructor still apply -- trivialness is enforced, const must have an initializer, etc.
Looks like the const-ness check should not be there, either. NVCC allows non-const statics. We may get rid of this check altogether and allow everything.
Since this doesn't make sense and contradicts their documentation, I'm tempted to say this should only apply to the nvidia headers. Is that technically possible? And then we file a bug against nvidia and/or ask Bryce?
IMO the relaxation is sensible. A static var is just a global var in an odd namespace and we do support that.
E.g. these produce the same PTX for c (modulo mangling of the names):
namespace a { __constant__ int c = 4; } __device__ void foo() { static __constant__ int c = 4; }
clang/test/CodeGenCUDA/static-device-var-no-rdc.cu | ||
---|---|---|
84 | I believe constructor trivialness check is orthogonal and will still be applied. |
OK, backing up, what are the semantics of static on __constant__, __device__, and __shared__?
- My understanding is that __shared__ behaves the same whether or not it's static. It's not equivalent to namespace a { __shared__ int c = 4; }, because that's illegal.
- Does __constant__ behave the same whether or not it's static? A static __constant__ is equivalent to namespace a { __constant__ int c = 4; }, and a non-static __constant__ is *also* equivalent to that?
- And __device__ does not behave the same whether or not it's static? In function scope __device__ int x; is a variable local to the thread. Whereas in global scope __device__ int x; is a global variable that lives in device memory (?). In function scope static __device__ int x; is equivalent to static int x; which is equivalent to int x; in namespace scope?
Should we mandate that you initialize static __constant__ variables in function scope? That is, if you write static __constant__ int x; in a function, then x is always uninitialized (right)? You should do static __constant__ int x = 42;?
Yes. __shared__ is an odd duck. It is implicitly static, so whether we explicitly specify static makes no difference.
We're not changing anything about how it's implemented.
- Does __constant__ behave the same whether or not it's static? A static __constant__ is equivalent to namespace a { __constant__ int c = 4; }, and a non-static __constant__ is *also* equivalent to that?
No. __constant__ is not allowed on non-static local variables as it can't be allocated on stack.
- And __device__ does not behave the same whether or not it's static?
Correct.
In function scope __device__ int x; is a variable local to the thread.
Correct. __device__ in a device function is effectively a no-op and can be placed on stack as a regular local variable.
Whereas in global scope __device__ int x; is a global variable that lives in device memory (?).
Correct.
In function scope static __device__ int x; is equivalent to static int x; which is equivalent to int x; in namespace scope?
Yes, assuming you mean a __device__ function and __device__ int x; in the namespace scope.
Should we mandate that you initialize static __constant__ variables in function scope?
That is, if you write static __constant__ int x; in a function, then x is always uninitialized (right)? You should do static __constant__ int x = 42;?
No. Accoring to PTX spec: Variables in .const and .global state spaces are initialized to zero by default.
Those are the address spaces __constant__ and __device__ variables map to.
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#state-spaces
OK, now I'm starting to I understand this change..
Before, in function scope, we allow static const/non-const __shared__, and allow static const so long as it's not __device__ or __constant__.
- static -> error? (I understood us saying above that it is, but now that I read the code, isn't it saying it's an error?)
- static const -> allowed
- static __device__ -> error
- static const __device__ -> error
- static __constant__ -> error
- static const __constant__ -> error
After, in function scope, the rule is, allow static const/non-const __shared__ or anything that's static const.
- static -> error, must be const
- static const -> allowed
- static __device__ -> error, must be const
- static const __device__ -> allowed
- static __constant__ -> error, must be const
- static const __constant__ -> allowed
I guess my question when I write out this table is, why shouldn't it be like this?
- static -> allowed
- static const -> allowed
- static __device__ -> allowed
- const static __device__ -> allowed
- static __constant__ -> error, must be const
- const static __constant__ -> allowed
This makes some sense to me because we're saying, "__constant__ must be const", otherwise, anything goes.
Or here's another way of thinking about it. You're saying that static and static __device__ in function scope are the same as a __device__ variable in block scope. And a __device__ variable in block scope doesn't have to be const (right?). So why the extra restriction on function-scope static?
clang/lib/Sema/SemaDecl.cpp | ||
---|---|---|
13177–13178 | Update comment? |
It should. I did mention in a previous comment that > Looks like the const-ness check should not be there, either.
I need to revise the patch.
This makes some sense to me because we're saying, "__constant__ must be const", otherwise, anything goes.
Except that NVCC allows non-const __constant__, too. Generally speaking, C++ does not care about the attributes. While technically __constant__ is not changeable from the device code, not specifying const is a missed optimization/diagnostic opportunity, but not an error per se. It does not affect how the variable is emitted, but rather what user can do with it and that's beyond the scope of this patch. I don't think it warrants a hard error. A warning, perhaps, that non-const __constant__ is probably an error?
Or here's another way of thinking about it. You're saying that static and static __device__ in function scope are the same as a __device__ variable in block scope. And a __device__ variable in block scope doesn't have to be const (right?). So why the extra restriction on function-scope static?
Something like that.
It should. I did mention in a previous comment that > Looks like the const-ness check should not be there, either. I need to revise the patch.
Heh, okay. Sorry I missed that, somehow this patch was confusing to me.
Except that NVCC allows non-const constant, too. Generally speaking, C++ does not care about the attributes. While technically constant is not changeable from the device code, not specifying const is a missed optimization/diagnostic opportunity, but not an error per se. It does not affect how the variable is emitted, but rather what user can do with it and that's beyond the scope of this patch. I don't think it warrants a hard error. A warning, perhaps, that non-const constant is probably an error?
Sure, that makes sense to me.
clang/test/SemaCUDA/device-var-init.cu | ||
---|---|---|
416–417 | how does this work in device compilation? Is this equivalent to static __device__ int x = 42? |
I've verified that clang with this patch can compile Tensorflow and that it can also compile cooperative_groups.h from CUDA-11.
clang/test/SemaCUDA/device-var-init.cu | ||
---|---|---|
416–417 | Correct. |
clang/test/SemaCUDA/device-var-init.cu | ||
---|---|---|
416–417 | so static variable without __device__/__constant__ attribute in host device function implies __device__ attribute in device compilation. Is this also true in device function? We need Sema and CodeGen tests for these cases. Also, can we document these changes? It is easily forgotten. |
clang/test/SemaCUDA/device-var-init.cu | ||
---|---|---|
416–417 | I think of it as a static variable in a __device__ function. There should be no host-side shadow for it, which would normally be created for a __device__ variable. The tests at the beginning of df_sema() in SemaCUDA/device-var-init.cu already check that static w/o attributes is allowed in __device__. |
Hey, I'm leaving on a vacation tomorrow and didn't have a chance to get to
this review today.
Is that ok? I'm not bringing my work laptop, but I could look at it on my
personal laptop.
clang/include/clang/Basic/DiagnosticSemaKinds.td | ||
---|---|---|
8189 | __device__ is not allowed on non-static function-scope variables? This appears to be more restrictive than we were before. I want to check, are we OK with the possibility that this will break user code? https://gcc.godbolt.org/z/Y85GKe work with clang, though not with nvcc. I notice that we even allow __device__ int x; in __host__ __device__ functions, which is...questionable. :) https://gcc.godbolt.org/z/GjjMGx I'm OK matching the nvcc behavior here and accepting user breakage so long as we're intentional about it. Possibly should be called out in relnotes? | |
clang/lib/Sema/SemaDeclAttr.cpp | ||
4397–4398 | So just to check, in our new world, __constant__ variables don't have to be const. That matches nvcc, fine. |
clang/include/clang/Basic/DiagnosticSemaKinds.td | ||
---|---|---|
8189 | It appears to have been an oversight. AFAICT, we just ignored the __device__ attribute of the local vars that sneaked past the isStaticLocal() check. | |
clang/lib/Sema/SemaDeclAttr.cpp | ||
4397–4398 | No. IMO, it's similar to explicitly putting a non-const variable into a .rodata section -- inadvisable, probably not very useful, and possibly implementation-defined, but not illegal. | |
clang/test/CodeGenCUDA/static-device-var-no-rdc.cu | ||
84 | This is checked in SemaCUDA/device-var-init.cu. |
__device__ is not allowed on non-static function-scope variables?
This appears to be more restrictive than we were before. I want to check, are we OK with the possibility that this will break user code? https://gcc.godbolt.org/z/Y85GKe work with clang, though not with nvcc.
I notice that we even allow __device__ int x; in __host__ __device__ functions, which is...questionable. :) https://gcc.godbolt.org/z/GjjMGx
I'm OK matching the nvcc behavior here and accepting user breakage so long as we're intentional about it. Possibly should be called out in relnotes?