This is an archive of the discontinued LLVM Phabricator instance.

[CUDA] Restrict init of local __shared__ variables to empty constructors only.
ClosedPublic

Authored by tra on May 6 2016, 3:49 PM.

Details

Summary

While shared variables look like any other variable with a static storage class to compiler, they behave differently on device side.

  • one instance is created per block of GPUS, so standard "initialize once using guard variable" model does not quite work.
  • lifetime of the variables ends when the global function exits. Again, it does not fit current assumption about static local vars as we will need to init them again if that function is called again.
  • with that in mind, deinitialization on app exit does not work either as the variable no longer exists past its kernel's exit.

nvcc takes a rather dangerous shortcut and allows non-empty constructors for local static variables. It calls initializer on every entry into the scope and produces a warning that there's going to be a data race as there will be many kernels doing init on many instances of that shared variable. It also calls destructors on exit from the scope. Now, imagine recursive call of a function with a local static variable...

Until we figure out better way to deal with this, clang will only allow empty constructors for local shared variables in a way identical to restrictions imposed on dynamic initializers for global variables.

Diff Detail

Repository
rL LLVM

Event Timeline

tra updated this revision to Diff 56471.May 6 2016, 3:49 PM
tra retitled this revision from to [CUDA] Restrict init of local __shared__ variables to empty constructors only..
tra updated this object.
tra added reviewers: jingyue, jlebar, rnk.
tra added a subscriber: cfe-commits.
jlebar edited edge metadata.May 6 2016, 5:58 PM

While I think this is 100% the right thing to do, I am worried about breaking existing targets. Maybe we need an escape valve, at least until we get that sorted out? Unless you're pretty confident this isn't happening / will be easy enough to fix.

lib/CodeGen/CGDecl.cpp
376 ↗(On Diff #56471)

Please set off "which is ensured by Sema" somehow. I'd probably say

initializers. (This is ensured by Sema.)

lib/Sema/SemaDecl.cpp
10416 ↗(On Diff #56471)

s/have the same checks applied/apply the same checks

tra updated this revision to Diff 56619.May 9 2016, 1:32 PM
tra edited edge metadata.

Reworded comments.

Removed tests that no longer apply as we don't generate constructors for static local variables on device side.
Empty constructor cases are already covered by test/CodeGenCUDA/device-var-init.cu.

jlebar accepted this revision.May 9 2016, 2:57 PM
jlebar edited edge metadata.

Art makes the good point that any code which is broken by this change is itself currently racy at the very best. And we have a way to fix any code which hits this new error (just manually initialize your shared variable using placement new from within thread 0). So, lgtm.

This revision is now accepted and ready to land.May 9 2016, 2:57 PM
tra added a comment.May 9 2016, 2:58 PM

While I think this is 100% the right thing to do, I am worried about breaking existing targets. Maybe we need an escape valve, at least until we get that sorted out? Unless you're pretty confident this isn't happening / will be easy enough to fix.

While empty constructors for shared variables are not unusual, I haven't seen any non-empty constructors in practice.
Considering that such constructor would crash clang until this patch, it would be hard to miss such cases.

Escape hatch would require to make clang to do something reasonable.
Currently llvm crashes because we attempt to generate a load from a guard variable using atomic load instruction that's not supported by NVPTX backend. Even if it was supported, we'd also need to implement _cxa_guard_acquire/release.

We can disable thread-safe guard variants. This will be broken, too, because there will be many per-block instances of the variable, but only one global guard.

We can spend an effort to do the same thing as nvcc -- no guard of any kind, ctor/dtor on function entry/exit and compiler warning about guaranteed data race, but I don't see much point doing *that*.

IMO fixing the source code to initialize static variable explicitly would be the right thing to do.

This revision was automatically updated to reflect the committed changes.