This is an archive of the discontinued LLVM Phabricator instance.

[CUDA] Let device-side shared variables be initialized with undef
ClosedPublic

Authored by yaxunl on Mar 28 2018, 9:58 AM.

Diff Detail

Repository
rL LLVM

Event Timeline

yaxunl created this revision.Mar 28 2018, 9:58 AM
yaxunl edited the summary of this revision. (Show Details)Mar 28 2018, 10:01 AM
tra added a subscriber: tra.Mar 28 2018, 10:48 AM
tra added inline comments.
test/CodeGenCUDA/device-var-init.cu
112 ↗(On Diff #140095)

Hmm. shared should not be initialized in NVPTX either. This looks like a bug in NVPTX.

For now you should make shared uninitialized regardless of whether we're compiling for AMDGCN or NVPTX.

yaxunl updated this revision to Diff 140110.Mar 28 2018, 11:22 AM
yaxunl retitled this revision from Disable zeroinitializer for CUDA shared varirable for amdgcn target to Remove initializer for CUDA shared varirable.
yaxunl edited the summary of this revision. (Show Details)
yaxunl added a reviewer: tra.

Revised by Artem's comments.

tra accepted this revision.Mar 28 2018, 11:28 AM
This revision is now accepted and ready to land.Mar 28 2018, 11:28 AM
tra added inline comments.Mar 28 2018, 11:30 AM
test/CodeGenCUDA/device-var-init.cu
121 ↗(On Diff #140110)

Perhaps it would make sense to capture there names for NVPTX as well and avoid duplicating all the checks below.

yaxunl added inline comments.Mar 28 2018, 11:41 AM
test/CodeGenCUDA/device-var-init.cu
121 ↗(On Diff #140110)

will do when committing.

What exactly are you trying to express here? Are you just trying to make these external declarations when compiling for the device because __shared__ variables are actually defined on the host? That should be handled by the frontend by setting up the AST so that these declarations are not definitions.

tra added a comment.Mar 28 2018, 12:03 PM

What exactly are you trying to express here? Are you just trying to make these external declarations when compiling for the device because __shared__ variables are actually defined on the host? That should be handled by the frontend by setting up the AST so that these declarations are not definitions.

shared vars (at least in CUDA) are weird. Local-scoped ones are implicitly static (which compiler will attempt to zero-init) but in CUDA shared variables can't have static initializers and we don't know the value of such vars when we launch the kernel.

What exactly are you trying to express here? Are you just trying to make these external declarations when compiling for the device because __shared__ variables are actually defined on the host? That should be handled by the frontend by setting up the AST so that these declarations are not definitions.

No. These variables are not like external symbols defined on the host. They behave like global variables in the kernel code but never initialized. Currently no targets are able to initialize them and it is users' responsibility to initialize them explicitly.

Giving them an initial value will cause error in some backends since they cannot handle them, therefore put undef as initializer.

What exactly are you trying to express here? Are you just trying to make these external declarations when compiling for the device because __shared__ variables are actually defined on the host? That should be handled by the frontend by setting up the AST so that these declarations are not definitions.

No. These variables are not like external symbols defined on the host. They behave like global variables in the kernel code but never initialized. Currently no targets are able to initialize them and it is users' responsibility to initialize them explicitly.

Giving them an initial value will cause error in some backends since they cannot handle them, therefore put undef as initializer.

So undef is being used as a special marker to the backends that it's okay not to try to initialize these variables?

What exactly are you trying to express here? Are you just trying to make these external declarations when compiling for the device because __shared__ variables are actually defined on the host? That should be handled by the frontend by setting up the AST so that these declarations are not definitions.

No. These variables are not like external symbols defined on the host. They behave like global variables in the kernel code but never initialized. Currently no targets are able to initialize them and it is users' responsibility to initialize them explicitly.

Giving them an initial value will cause error in some backends since they cannot handle them, therefore put undef as initializer.

So undef is being used as a special marker to the backends that it's okay not to try to initialize these variables?

I think undef as the initializer tells the llvm passes and backend that this global variable contains undefined value. I am not sure if this is better than without an initializer. I saw code in CodeGenModule::getOrCreateStaticVarDecl

// Local address space cannot have an initializer.
llvm::Constant *Init = nullptr;
if (Ty.getAddressSpace() != LangAS::opencl_local)
  Init = EmitNullConstant(Ty);
else
  Init = llvm::UndefValue::get(LTy);

which means OpenCL static variable in local address space (equivalent to CUDA shared address space) gets an undef initializer.

For CUDA shared variable, in CodeGenFunction::EmitStaticVarDecl, it first goes through call of CodeGenModule::getOrCreateStaticVarDecl and gets a zeroinitializer, then it reaches line 400

// Whatever initializer such variable may have when it gets here is
  // a no-op and should not be emitted.
  bool isCudaSharedVar = getLangOpts().CUDA && getLangOpts().CUDAIsDevice &&
                         D.hasAttr<CUDASharedAttr>();
  // If this value has an initializer, emit it.
  if (D.getInit() && !isCudaSharedVar)
    var = AddInitializerToStaticVarDecl(D, var);

Although this disables adding initializer from D, var already has a zeroinitializer from CodeGenModule::getOrCreateStaticVarDecl, therefore its initializer needs to be overwritten by undef.

Probably a better solution would be do it in CodeGenModule::getOrCreateStaticVarDecl, side by side by the OpenCL code.

What exactly are you trying to express here? Are you just trying to make these external declarations when compiling for the device because __shared__ variables are actually defined on the host? That should be handled by the frontend by setting up the AST so that these declarations are not definitions.

No. These variables are not like external symbols defined on the host. They behave like global variables in the kernel code but never initialized. Currently no targets are able to initialize them and it is users' responsibility to initialize them explicitly.

Giving them an initial value will cause error in some backends since they cannot handle them, therefore put undef as initializer.

So undef is being used as a special marker to the backends that it's okay not to try to initialize these variables?

I think undef as the initializer tells the llvm passes and backend that this global variable contains undefined value. I am not sure if this is better than without an initializer. I saw code in CodeGenModule::getOrCreateStaticVarDecl

// Local address space cannot have an initializer.
llvm::Constant *Init = nullptr;
if (Ty.getAddressSpace() != LangAS::opencl_local)
  Init = EmitNullConstant(Ty);
else
  Init = llvm::UndefValue::get(LTy);

which means OpenCL static variable in local address space (equivalent to CUDA shared address space) gets an undef initializer.

For CUDA shared variable, in CodeGenFunction::EmitStaticVarDecl, it first goes through call of CodeGenModule::getOrCreateStaticVarDecl and gets a zeroinitializer, then it reaches line 400

// Whatever initializer such variable may have when it gets here is
  // a no-op and should not be emitted.
  bool isCudaSharedVar = getLangOpts().CUDA && getLangOpts().CUDAIsDevice &&
                         D.hasAttr<CUDASharedAttr>();
  // If this value has an initializer, emit it.
  if (D.getInit() && !isCudaSharedVar)
    var = AddInitializerToStaticVarDecl(D, var);

Although this disables adding initializer from D, var already has a zeroinitializer from CodeGenModule::getOrCreateStaticVarDecl, therefore its initializer needs to be overwritten by undef.

Probably a better solution would be do it in CodeGenModule::getOrCreateStaticVarDecl, side by side by the OpenCL code.

Yes, I agree, just updating the condition to trigger if either language mode is set is the right fix.

yaxunl updated this revision to Diff 140310.Mar 29 2018, 12:18 PM

Revised by John's comments. Also simplified the test by Artem's comments.

tra added inline comments.Mar 29 2018, 1:54 PM
lib/CodeGen/CGDecl.cpp
235–240 ↗(On Diff #140310)

This is too hard to read. Inverting it makes it somewhat easier to understand -- either opencl_local or device-side CUDA shared are undef.

if (Ty.getAddressSpace() == LangAS::opencl_local 
    || (getLangOpts().CUDA && getLangOpts().CUDAIsDevice &&
        D.hasAttr<CUDASharedAttr>()))
  Init = llvm::UndefValue::get(LTy);
else
  Init = EmitNullConstant(Ty);
yaxunl updated this revision to Diff 140329.Mar 29 2018, 2:06 PM
yaxunl retitled this revision from Remove initializer for CUDA shared varirable to [CUDA] Let device-side shared variables be initialized with undef.
yaxunl edited the summary of this revision. (Show Details)

Revised by Artem's comment. Improve readability of condition.

tra added a comment.Mar 29 2018, 2:17 PM

Still LGTM.

rjmccall added inline comments.Mar 29 2018, 2:45 PM
lib/CodeGen/CGDecl.cpp
235–240 ↗(On Diff #140310)

I assume getLangOpts().CUDAIsDevice implies getLangOpts().CUDA, so you really only need to check CUDAIsDevice. But it might be faster still to just check for the attribute.

yaxunl updated this revision to Diff 140640.Apr 2 2018, 9:51 AM

Only check attribute.

lib/CodeGen/CGDecl.cpp
235–240 ↗(On Diff #140310)

Right. I think only attribute needs to be checked.

rjmccall accepted this revision.Apr 2 2018, 10:12 AM

LGTM, thanks.

This revision was automatically updated to reflect the committed changes.