This is an archive of the discontinued LLVM Phabricator instance.

[CUDA] Allow "extern __shared__ Foo foo[]" within anon. namespaces.
ClosedPublic

Authored by jlebar on May 11 2018, 3:44 PM.

Details

Summary

Previously this triggered a -Wundefined-internal warning. But it's not
an undefined variable -- any variable of this form is a pointer to the
base of GPU core's shared memory.

Diff Detail

Repository
rC Clang

Event Timeline

jlebar created this revision.May 11 2018, 3:44 PM
tra accepted this revision.May 11 2018, 3:59 PM
This revision is now accepted and ready to land.May 11 2018, 3:59 PM

Rather than suppressing the warning, should we instead give such variables external linkage?

tra added a comment.May 11 2018, 4:03 PM

Rather than suppressing the warning, should we instead give such variables external linkage?

That's a good point. I believe we already give local shared vars external linkage in other circumstances.

Rather than suppressing the warning, should we instead give such variables external linkage?

I suppose the bigger question is: what's the intended semantic model here? Does this behave like a tentative definition? Or like a variable with external linkage (linking to another variable defined elsewhere, magically, by the implementation)? Or is this simply a definition despite being declared extern? Just suppressing this one warning doesn't seem like the right thing to do in any of those cases. (Also: can it alias other __shared__ globals? Clang will not model that correctly without other changes -- you'll probably at least want VarDecl::isWeak() to return true if that's the case.)

jlebar added a comment.EditedMay 11 2018, 4:10 PM

I am a total ignoramus when it comes to linkage, so maybe this is a dumb question, but what does it mean for a static variable inside of a function inside of an anon ns to have external linkage?

Specifically, imagine that I have two TUs that I want to link together:

a.cu

namespace { __global__ void foo() { extern __shared__ char arr[]; } }

b.cu

namespace { __global__ void foo() { extern __shared__ double arr[]; } }

If arr has external linkage, does this mean that it is effectively the same (from the perspective of the variable) as if the function it's in were not in an anon namespace? That seems bad because these two arrs will have the same mangled name but different alignment requirements.

Does this behave like a tentative definition?

Not sure what that is, sorry. :(

Or like a variable with external linkage (linking to another variable defined elsewhere, magically, by the implementation)?

That's...close, I guess. extern __shared__ Foo arr[] is another name for "a pointer to the shared memory accessible by this function."

Or is this simply a definition despite being declared extern?

I'd think it's not a definition because two of these can (and do) alias.

Also: can it alias other shared globals?

I'd have expected that the only __shared__ global that should be allowed is this extern __shared__ Type array_with_no_bounds[] thing. (All of them alias each other.)

Some experimentation shows that to be false, at least with clang as-is today.

__shared__ int a;  // no error, but what does this mean?
extern __shared__ int b;  // error, WAI
extern __shared__ int c[];  // ok, WAI
extern __shared__ int d[10];   // error, WAI

__global__ void foo() {
  __shared__ int z;  // ok, WAI
  static __shared__ int y;  // no error, but I think that's broken?
  extern __shared__ int x;  // error, wai
  extern __shared__ int w[];  // ok, wai
  extern __shared__ int v[10];  // error, wai
}

Unfortunately nvcc seems happy with a and y, so we may not be able to disallow those. I have no idea what the semantics are supposed to be, though.

I'll run some tests.

Clang will not model that correctly without other changes -- you'll probably at least want VarDecl::isWeak() to return true if that's the case.

Okay. What's an example of the problem this causes?

Just suppressing this one warning doesn't seem like the right thing to do in any of those cases.

sgtm, thanks for the help here.

__shared__ int a;
extern __shared__ int c[];

__global__ void foo() {
  __shared__ int z;
  static __shared__ int y;
  extern __shared__ int w[];

  printf("&a=%p\n&c=%p\n&z=%p\n&y=%p\n&w=%p\n", &a, &c, &z, &y, &w);
}

int main() {
  foo<<<1, 1>>>();
  cudaDeviceSynchronize();
}

clang:

&a=0x7f3410000000
&c=0x7f3410000010
&z=0x7f3410000004
&y=0x7f3410000008
&w=0x7f3410000010

nvcc is the same:

&a=0x7f4ea1000000
&c=0x7f4ea1000010
&z=0x7f4ea1000004
&y=0x7f4ea1000008
&w=0x7f4ea1000010

Looks like I was wrong to characterize c and w as pointers to the base of the shared memory in the GPU core. They alias each other, but don't alias the other shared variables, global or otherwise.

The plot thickens.

tra pointed me to the CUDA spec, which says that all shared variables have implied static storage. Which is another way of saying that all shared variables are global variables.

https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#device-memory-specifiers

But. The lifetime of these global variables is not the lifetime of the host program. Rather, their lifetime is tied to a single kernel launch. So when the threads for a particular block start running, the state of shared memory is undefined.

Does this clarify anything for you about how we should treat this extern __shared__ int arr[], Richard? I admit I'm more confused than when I started.

Justin and I looked at a few more examples, and I think we have a somewhat better understanding now. The general behavior is:

  • extern __shared__ is effectively a distinct storage class from __shared__ and from extern; such a variable cannot be defined via a later __shared__ definition because that would be a different storage class (though nvcc gives a somewhat confusing error here, talking about it being "static" and perhaps revealing something about how they implement it)
  • the extern __shared__ storage class can only be used in declarations of type T []
  • all such declarations are names for the same object

So, we have multiple declarations that are ways of naming some object that is provided by magic living elsewhere. This is pretty similar to what __attribute__((alias(X))) does: we're declaring an entity and saying "this declaration is just a name for something else that someone else owns". However, we can't quite model this the same way we model the alias attribute: we model the alias attribute as turning the declaration containing it into a definition. extern __shared__ doesn't work like that: an extern __shared__ variable *can* be redeclared.

In principle, it seems like the right model would be that an extern __shared__ variable *is* defined, but none of its declarations is actually "the definition". We could pretend that, say, the first declaration is "the definition", but that's somewhat arbitrary and would likely cause additional problems if the first declaration is function-local (I expect some parts of Clang to be surprised by a function-local extern variable declaration turning out to be a definition). Or we could model it as a VarDecl for which hasDefinition returns true but getDefinition returns nullptr. I expect that to break a lot more parts of Clang, even though it's in some sense an accurate description of the situation. If we wanted to model this with linkage, I think the right linkage would be VisibleInternalLinkage (which would be new). And even then that's not an entirely accurate model.

All of these options seem likely to add more complexity than just treating this as a super-special case in the (hopefully) very few places where it matters. So I think the best way forward is to stick with the model we have for now (that these are declarations of an internal linkage variable with no definition), and just fix up the observable symptoms of our not modeling that the variable does in fact have a definition. That is: go with something like this patch for now and apply similar fixes to address any other problems we find later.

A separate question is: is it OK that we can have multiple such extern __shared__ variables that all alias each other? Will LLVM or Clang decide that two such variables cannot possibly have the same address? Eg:

extern __shared__ int arr1[];
extern __shared__ int arr2[];
bool b = &arr1 == &arr2;

As it happens, Clang doesn't fold the comparison to false because it believes the arrays might turn out to have size zero, in which case they could have the same address, and it looks like LLVM makes the same deduction. I don't think that's entirely robust (for example, in an expression like &arr1[1] == &arr2[1], we could deduce that both arrays have size at least one, and so the comparison must yield false, but neither Clang nor LLVM happens to do that today).

clang/lib/Sema/Sema.cpp
661 ↗(On Diff #146432)

Should this apply to internal-linkage extern __shared__ globals too?

namespace {
extern __shared__ int arr[];
__global__ void use() { arr[0] = 0; }
}

In SemaDeclAttr.cpp we check VD->hasExternalStorage() for this purpose (which is basically checking for the extern keyword). Also getLangOpts().CUDARelocatableDeviceCode appears to have some bearing on this? (It looks like it might turn off the special handling of extern __shared__ and just treat it like a normal external-linkage variable?)

661–663 ↗(On Diff #146432)

As a way to make this slightly cleaner, you could move this check to a new function such as VarDecl::isKnownToBeDefined() ("Is this variable known to have a definition somewhere in the complete program? This may be true even if the declaration has internal linkage and has no declaration within this source file.") There are in theory other ways we could get into that state (such as a variable defined in inline asm).

But if this is the only place we'd use that, I'm fine with keeping this here for now.

jlebar updated this revision to Diff 147216.May 16 2018, 4:56 PM
jlebar marked an inline comment as done.

Address Richard's review comments.

Thank you for the careful review, Richard.

I went ahead and added isKnownToBeDefined, seems reasonable to me.

I also added some checks for extern __shared__ Foo foo[] with relocatable device code enabled.

rsmith accepted this revision.May 16 2018, 6:30 PM

Thanks, looks great.

clang/include/clang/AST/Decl.h
1461 ↗(On Diff #147216)

declaration -> definition

clang/lib/AST/Decl.cpp
2443 ↗(On Diff #147216)

Missing end of sentence.

jlebar marked 2 inline comments as done.May 17 2018, 9:18 AM

Thank you for the review, Richard. Submitting this with these changes...

This revision was automatically updated to reflect the committed changes.