CUDA shared variable should be initialized with undef.
Patch by Greg Rodgers.
Revised and lit test added by Yaxun Liu.
Differential D44985
[CUDA] Let device-side shared variables be initialized with undef yaxunl on Mar 28 2018, 9:58 AM. Authored by
Details CUDA shared variable should be initialized with undef. Patch by Greg Rodgers.
Diff Detail Event Timeline
Comment Actions What exactly are you trying to express here? Are you just trying to make these external declarations when compiling for the device because __shared__ variables are actually defined on the host? That should be handled by the frontend by setting up the AST so that these declarations are not definitions. Comment Actions shared vars (at least in CUDA) are weird. Local-scoped ones are implicitly static (which compiler will attempt to zero-init) but in CUDA shared variables can't have static initializers and we don't know the value of such vars when we launch the kernel. Comment Actions No. These variables are not like external symbols defined on the host. They behave like global variables in the kernel code but never initialized. Currently no targets are able to initialize them and it is users' responsibility to initialize them explicitly. Giving them an initial value will cause error in some backends since they cannot handle them, therefore put undef as initializer. Comment Actions So undef is being used as a special marker to the backends that it's okay not to try to initialize these variables? Comment Actions I think undef as the initializer tells the llvm passes and backend that this global variable contains undefined value. I am not sure if this is better than without an initializer. I saw code in CodeGenModule::getOrCreateStaticVarDecl // Local address space cannot have an initializer. llvm::Constant *Init = nullptr; if (Ty.getAddressSpace() != LangAS::opencl_local) Init = EmitNullConstant(Ty); else Init = llvm::UndefValue::get(LTy); which means OpenCL static variable in local address space (equivalent to CUDA shared address space) gets an undef initializer. For CUDA shared variable, in CodeGenFunction::EmitStaticVarDecl, it first goes through call of CodeGenModule::getOrCreateStaticVarDecl and gets a zeroinitializer, then it reaches line 400 // Whatever initializer such variable may have when it gets here is // a no-op and should not be emitted. bool isCudaSharedVar = getLangOpts().CUDA && getLangOpts().CUDAIsDevice && D.hasAttr<CUDASharedAttr>(); // If this value has an initializer, emit it. if (D.getInit() && !isCudaSharedVar) var = AddInitializerToStaticVarDecl(D, var); Although this disables adding initializer from D, var already has a zeroinitializer from CodeGenModule::getOrCreateStaticVarDecl, therefore its initializer needs to be overwritten by undef. Probably a better solution would be do it in CodeGenModule::getOrCreateStaticVarDecl, side by side by the OpenCL code. Comment Actions Yes, I agree, just updating the condition to trigger if either language mode is set is the right fix.
Comment Actions Only check attribute.
|
This is too hard to read. Inverting it makes it somewhat easier to understand -- either opencl_local or device-side CUDA shared are undef.