While shared variables look to the compiler like any other variable with static storage class, they behave differently on the device side.
- One instance is created per thread block on the GPU, so the standard "initialize once using a guard variable" model does not quite work.
- The lifetime of the variables ends when the global function (the kernel) exits. Again, this does not fit the current assumptions about static local variables, because we would need to initialize them again every time that function is called.
- With that in mind, deinitialization on application exit does not work either, because the variable no longer exists once its kernel has exited.
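To see why a single guard variable breaks down, here is a minimal host-side model. It is a hypothetical C++ simulation, not real CUDA and not clang's implementation. It assumes each launch creates one fresh instance per block and applies the usual "check guard, initialize once" scheme to every block. The function name `initialized_instances` and the flag `g_guard` are illustrative only.

```cpp
// Hypothetical host-side model, not real CUDA: each launch creates one fresh
// instance of the shared variable per thread block, and each instance
// disappears when its block finishes.

// Guard flag as used for ordinary function-local statics: one per program,
// set by the first initialization and never reset.
static bool g_guard = false;

// Applies the "check guard, initialize once" scheme to every block of one
// launch and returns how many per-block instances actually got initialized.
int initialized_instances(int num_blocks) {
  int count = 0;
  for (int block = 0; block < num_blocks; ++block) {
    // A new, uninitialized instance exists for this block at this point.
    if (!g_guard) {
      g_guard = true;  // only the very first block ever passes this check
      ++count;         // so only that block's instance is initialized
    }
    // Every other block skips the initializer and uses uninitialized storage.
  }
  return count;
}
```

In this model, calling `initialized_instances(4)` twice returns 1 and then 0. Only one of the first launch's four instances is initialized, and none of the second launch's instances are, because the guard stays set across launches.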
nvcc takes a rather dangerous shortcut and allows non-empty constructors for local static variables. It calls the initializer on every entry into the scope and warns that there will be a data race, since many threads will run the initializer on each instance of that shared variable. It also calls destructors on exit from the scope. Now imagine a recursive call of a function with such a local static variable...
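The recursion problem can be sketched with another hypothetical host-side model. It is plain C++ and is not how nvcc actually generates code. It assumes one storage slot per block, an initializer that re-runs on every scope entry, and a destructor that runs on every scope exit. It models a single thread only; with many threads per block, each thread would also run the initializer on the same slot. `SharedSlot` and `observed_after_recursion` are illustrative names.

```cpp
// Hypothetical model of "initialize on every scope entry, destroy on every
// scope exit" applied to one per-block storage slot (single thread only).
struct SharedSlot {
  int value = 0;
  int ctor_calls = 0;  // how many times the initializer ran
  int dtor_calls = 0;  // how many times the destructor ran
};

// Each frame declares the local shared variable, stores its own state in it,
// recurses, and then reads the state back.
int observed_after_recursion(SharedSlot& slot, int depth) {
  slot.value = 0;      // initializer re-runs on every entry into the scope
  ++slot.ctor_calls;
  slot.value = depth;  // this frame's own state
  if (depth > 0) {
    // The callee re-initializes the very same storage, clobbering our state.
    observed_after_recursion(slot, depth - 1);
  }
  int seen = slot.value;  // no longer equals `depth` if we recursed
  ++slot.dtor_calls;      // destructor runs on every exit from the scope
  return seen;
}
```

In this model, with `depth == 2` the outermost frame reads back 0 instead of 2. Both the initializer and the destructor run three times on a single storage slot, so the inner frames destroy an object the outer frame is still using.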
Until we figure out a better way to deal with this, clang will only allow empty constructors for local shared variables, mirroring the restrictions already imposed on dynamic initializers for global variables.
Please set off "which is ensured by Sema" somehow. I'd probably say