This is an archive of the discontinued LLVM Phabricator instance.

[CUDA] Provide CUDA's vector types implemented using clang's vector extension.
AbandonedPublic

Authored by tra on Mar 10 2016, 10:42 AM.

Details

Reviewers
jlebar
jingyue
Summary

This provides substantial performance boost on some benchmarks
(~25% on SHOC's FFT) due to vectorized loads/stores.

Unfortunately existing CUDA headers and user code occasionally
take pointer to vector fields which clang does not allow, so
we can't use vector types by default.

While vectorized types help in some cases, they may lower
performance in cases when user reads/writes only part of the vector as
Clang currently generates code to always load/store complete vector.
It may also create data races if user code assumed that parts of the
same vector can be safely changed from different threads.

For now control this feature via -DCUDA_VECTOR_TYPES and let user
choose whether to use Clang's vectorized types or CUDA's
non-vectorized ones.

Diff Detail

Event Timeline

tra updated this revision to Diff 50301.Mar 10 2016, 10:42 AM
tra retitled this revision from to [CUDA] Provide CUDA's vector types implemented using clang's vector extension..
tra updated this object.
tra added reviewers: jlebar, jingyue.
tra added a subscriber: cfe-commits.
jlebar added inline comments.Mar 10 2016, 1:04 PM
lib/Headers/__clang_cuda_runtime_wrapper.h
72

Hm, this is a surprising (to me) way of controlling this feature. Can we use a -f flag instead? Even if all that -f flag does is define something (although in this case I'd suggest giving it a longer name so it's harder to collide with it).

-fsomething would be more discoverable and canonical, I think, and would be easier to document.

lib/Headers/__clang_cuda_vector_types.h
76

I thought host/device attributes weren't needed on classes, only functions?

80

Nit: double underscore is a little weird here, and sort of needlessly competes with the language-reserved __ identifier namespace. Could we just use one underscore?

82

nvidia's version of this function is not explicit -- is this difference intentional?

84

This requires C++11 -- is that intentional?

tra updated this revision to Diff 50341.Mar 10 2016, 1:29 PM
tra marked an inline comment as done.

Removed unneeded struct attributes.

tra added inline comments.Mar 10 2016, 1:30 PM
lib/Headers/__clang_cuda_runtime_wrapper.h
72

I want to tweak end-user's view of CUDA headers which has nothing to do with compiler, IMO.

lib/Headers/__clang_cuda_vector_types.h
76

Ugh. Removed.

80

I'm following the change Eric made to other headers in r260647 so that all arguments use __.

82

That's due to the way vector types are their base types with attribute. Without explicit dim3 variant, compiler can't disambiguate between dim3(int=1,int=1,int=1) and dim3(uint3 which is int w/ attribute).

84

It looks that way, but it does not need c++11. Vector literals allow brace initializers w/o requiring c++11.

jlebar added inline comments.Mar 10 2016, 1:44 PM
lib/Headers/__clang_cuda_runtime_wrapper.h
72

The compiler driver is responsible for enabling/disabling language extensions, and for choosing exactly which dialect we accept. It's also responsible for deciding which optimizations to use. This fits in all of those ways.

Moreover, again, -Dfoo won't appear in --help, so, from a user's perspective, is undiscoverable. In the event that they do discover it somehow, there's no documentation attached to the flag.

I am not aware of any switches built into clang that rely on -D. If you really want to do it this way, can you point me to prior art?

lib/Headers/__clang_cuda_vector_types.h
82

Huh, apparently we do want to use the reserved namespace?

If so, this logic applies very strongly to a -D, which is going to be far more user-visible than the arg names here.

84

If I'm understanding correctly, you're saying that if we have

struct dim3 {
  dim3(unsigned, unsigned, unsigned);
  dim3(uint3);
};

void foo(dim3);

that the call

uint3 x;
foo(x);

is ambiguous, because it could call either dim3 constructor overload?

That is bizarre, but if so, do we need the dim3(uint3) constructor at all?

tra abandoned this revision.Mar 10 2016, 2:51 PM

Ugh. Found more problems with using vector types in C++. Abandoning the idea.

In D18051#372490, @tra wrote:

Ugh. Found more problems with using vector types in C++. Abandoning the idea.

I'm curious, what problems?

tra added a subscriber: tra.Mar 10 2016, 4:38 PM

There were ambiguities in overload resolution between vector types and
their base types. I.e. if I had

void foo(int);
void foo(int3);

then call foo(3) was ambiguous.
It wasn't clear whether this extension is supposed to work in C++ at all.