This is an archive of the discontinued LLVM Phabricator instance.

[CUDA] Support for built-in cuda variables (threadIdx and its friends).
ClosedPublic

Authored by tra on Apr 16 2015, 4:51 PM.

Details

Diff Detail

Repository
rL LLVM

Event Timeline

tra updated this revision to Diff 23887.Apr 16 2015, 4:51 PM
tra retitled this revision from to [CUDA] Support for built-in cuda variables (threadIdx and its friends)..
tra updated this object.
tra edited the test plan for this revision. (Show Details)
tra added reviewers: eliben, echristo, rnk.
tra added a subscriber: Unknown Object (MLST).
eliben added inline comments.Apr 17 2015, 4:08 PM
lib/Headers/cuda/cuda_builtin_vars.h
3 ↗(On Diff #23887)

A top-level comment on the file that explains what this does with a usage example?

12 ↗(On Diff #23887)

Arent blockIdx and threadIdx unsigned?

tra updated this revision to Diff 23975.Apr 17 2015, 4:42 PM

Made fields unsigned. Added copyright header and comments.

tra added inline comments.Apr 17 2015, 4:46 PM
lib/Headers/cuda/cuda_builtin_vars.h
3 ↗(On Diff #23887)

Done.

12 ↗(On Diff #23887)

Yes. So are blockDim and gridDim as dim3 is based on uint3. Fixed.

tra updated this revision to Diff 23976.Apr 17 2015, 4:49 PM

Really changed the fields to be unsigned.

rsmith added a subscriber: rsmith.Apr 17 2015, 6:24 PM
rsmith added inline comments.
lib/Headers/cuda/cuda_builtin_vars.h
89 ↗(On Diff #23976)

This is a (strong) definition due to the extern in the __CUDA_BUILTIN_VAR macro, and will cause a link error if this file is #included into multiple objects in the same binary. You could solve this by removing the extern (that'd give the variable internal linkage) or by using enum { warpSize = 32 }; or similar. I'm not sure exactly what the CUDA spec requires here.

11 ↗(On Diff #23887)

Should these types also have deleted copy constructors and copy assignment operators? I'd imagine these should both be ill-formed:

auto x = threadIdx;
threadIdx = threadIdx;
tra updated this revision to Diff 24055.Apr 20 2015, 1:25 PM
  • Restricted access to built-in variables.
  • Implemented warpSize as class subject to the same restrictions as other built-in variables.
  • Made all built-in variables extern weak in case we ever emit any references to them.
  • Added a test case to check diagnostics on attempt to construct or improperly access a built-in variable.
lib/Headers/cuda/cuda_builtin_vars.h
89 ↗(On Diff #23976)

Those have to be extern because no instances of those special variables can ever exist, yet they are referred to as if they do. 'extern' matches this functionality best. I've added a weak attribute to make linking work. That is, if/when we'll get to do any linking on the GPU code. Currently we produce PTX assembly and we don't ever link one PTX file with another.

I've reworked warpSize into a class with the same restrictions on construction and access as the other built-in variables.

As for CUDA spec, it does not say much -- 'They [built-in variables] are only valid within functions that are executed on the device. [warpSize] is of type int and contains the warp size in threads.'

11 ↗(On Diff #23887)

I've disabled constructors for the special variables and also disabled address-of operator as nvcc does not allow taking their address.

rsmith added inline comments.Apr 20 2015, 2:37 PM
lib/Headers/cuda/cuda_builtin_vars.h
89 ↗(On Diff #23976)

I was only talking about warpSize, not the others. The others seem OK as (non-weak) extern declarations, on the assumption that programs aren't actually allowed to use the address of these.

Your new approach doesn't satisfy the requirement that "warpSize is of type int": this is detectable through template argument deduction, for instance:

template<typename T> T max(T, T)
max(warpSize, 10); // error, T is ambiguous, could be int or __cuda_builtin_warpSize_t

Using

__attribute__((device)) const int warpSize = 32;

would probably work (that is, drop the extern from what you had before).

tra updated this revision to Diff 24073.Apr 20 2015, 4:09 PM

Reverted warpSize back to plain old int. It's 'static const int' now.

lib/Headers/cuda/cuda_builtin_vars.h
89 ↗(On Diff #23976)

Another downside of warpSize-in-a-class was that a reference to an instance of that class was required for the type conversion operator as it can't be a static member function. Scratch that.

Dropping extern is not sufficient. 'const int' suffers from the same problem as 'extern' -- we end up with warpSize visible externally which causes linking error.

I could use 'static const int' -- it keeps the symbol local and does not create linking issues. Unfortunately it allows one to take address of the variable. It's not perfect, but we can live with that for now.

rsmith accepted this revision.Apr 20 2015, 5:52 PM
rsmith added a reviewer: rsmith.

LGTM

lib/Headers/cuda/cuda_builtin_vars.h
89 ↗(On Diff #23976)

In C++, const on a global variable definition (that is not explicitly marked extern) implies internal linkage just like static does -- the static is harmless but redundant.

This revision is now accepted and ready to land.Apr 20 2015, 5:52 PM
tra added inline comments.Apr 21 2015, 9:55 AM
lib/Headers/cuda/cuda_builtin_vars.h
89 ↗(On Diff #23976)

Thanks. I was unaware of that. Will fix.

This revision was automatically updated to reflect the committed changes.