This is an archive of the discontinued LLVM Phabricator instance.

[CUDA] Add conversion operators for threadIdx, blockIdx, gridDim, and blockDim to uint3 and dim3.
Closed, Public

Authored by jlebar on Feb 23 2016, 10:50 PM.

Details

Summary

This lets you write, e.g.

uint3 a = threadIdx;
uint3 b = blockIdx;
dim3 c = gridDim;
dim3 d = blockDim;

which is legal in nvcc, but was not legal in clang.

The fact that, e.g., the type of threadIdx is not actually uint3 is still
observable, but now you have to go out of your way to observe it.
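
As a rough host-side illustration of the idea (not the actual header code), the sketch below models a built-in variable as a small wrapper type with a conversion operator. The names builtin_idx_t, fake_threadIdx, snapshot, and deduces_uint3 are hypothetical, the uint3 struct is a stand-in for the CUDA type, and the constants replace the real register reads. It shows that copy-initialization into uint3 works, while template deduction still sees the wrapper type.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for CUDA's uint3 (the real one comes from the CUDA headers).
struct uint3 { unsigned x, y, z; };

// Hypothetical model of the special type behind a built-in such as threadIdx.
// The constants stand in for reads of the hardware index registers.
struct builtin_idx_t {
  static unsigned fetch_x() { return 4; }
  static unsigned fetch_y() { return 5; }
  static unsigned fetch_z() { return 6; }

  // The kind of conversion operator this revision adds.
  operator uint3() const { return uint3{fetch_x(), fetch_y(), fetch_z()}; }
};

// Stand-in for the built-in variable itself.
static const builtin_idx_t fake_threadIdx{};

// Copy-initialization succeeds by going through the conversion operator.
inline uint3 snapshot() {
  uint3 a = fake_threadIdx;
  return a;
}

// Template deduction sees the argument's real type, so the wrapper is still observable.
template <typename T>
bool deduces_uint3(const T&) { return std::is_same<T, uint3>::value; }

// Small self-check of the model.
inline void check_model() {
  assert(snapshot().y == 5);
  assert(!deduces_uint3(fake_threadIdx));
  assert(deduces_uint3(snapshot()));
}
```

In this model, code that only assigns the variable to a uint3 never notices the wrapper; only code that inspects the deduced type does.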

Event Timeline

jlebar updated this revision to Diff 48884. Feb 23 2016, 10:50 PM
jlebar retitled this revision to [CUDA] Add conversion operators for threadIdx, blockIdx, gridDim, and blockDim to uint3 and dim3.
jlebar updated this object.
jlebar added a reviewer: tra.
jlebar added subscribers: cfe-commits, echristo.
tra added inline comments. Feb 24 2016, 10:17 AM
lib/Headers/cuda_builtin_vars.h, line 72

Considering that built-in variables are never instantiated, I wonder how it's going to work as the operator will presumably need 'this' pointing *somewhere*, even if we don't use it. Unused 'this' would probably get optimized away with optimizations on, but -O0 may cause problems.

jlebar added inline comments. Feb 24 2016, 11:36 AM
lib/Headers/cuda_builtin_vars.h, line 72

This is interesting. In the PTX, threadIdx actually gets instantiated as a non-weak global:

.global .align 1 .b8 threadIdx[1];

Then we take the address of this thing.

At -O2, we don't emit a threadIdx global at all.

I think this is basically fine. It's actually not right to change extern to static in the decl, because then we try to construct a __cuda_builtin_threadIdx_t, and the default constructor is deleted. :)
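
A minimal host-side sketch of the extern-versus-static point, under stated assumptions: builtin_dim_t, fake_blockDim, and the helper functions are hypothetical names, dim3 is a stand-in struct, and the constants replace real register reads. It only inspects types, because on the host actually calling the operator on an undefined extern object would require a definition, unlike the device behavior described above.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for CUDA's dim3.
struct dim3 { unsigned x, y, z; };

// Hypothetical model of a built-in variable type whose default constructor is deleted.
struct builtin_dim_t {
  builtin_dim_t() = delete;
  // The operator never reads a member through `this`; the constants stand in for register reads.
  operator dim3() const { return dim3{32u, 1u, 1u}; }
};

// A declaration only: no object is constructed, so the deleted constructor is never needed.
extern const builtin_dim_t fake_blockDim;

// static const builtin_dim_t fake_blockDim_static;  // would not compile: needs the deleted constructor

constexpr bool can_default_construct() {
  return std::is_default_constructible<builtin_dim_t>::value;
}

// decltype((...)) is an unevaluated use, so no definition of fake_blockDim is required.
constexpr bool converts_to_dim3() {
  return std::is_convertible<decltype((fake_blockDim)), dim3>::value;
}

// Small self-check of the model.
inline void check_extern_model() {
  assert(!can_default_construct());
  assert(converts_to_dim3());
}
```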

tra accepted this revision. Feb 24 2016, 1:47 PM
tra edited edge metadata.

OK.

This revision is now accepted and ready to land. Feb 24 2016, 1:47 PM
This revision was automatically updated to reflect the committed changes.