
[NVPTX] Add intrinsics for shfl instructions.

Authored by jlebar on Jun 8 2016, 5:21 PM.



Currently clang emits these instructions via inline (volatile) asm in
the CUDA headers. Switching to intrinsics will let the optimizer reason
across calls to these intrinsics.

Event Timeline

jlebar updated this revision to Diff 60123.Jun 8 2016, 5:21 PM
jlebar retitled this revision to [NVPTX] Add intrinsics for shfl instructions.
jlebar updated this object.
jlebar added a reviewer: tra.
jlebar added subscribers: jholewinski, llvm-commits.

Looks good to me!

tra accepted this revision.Jun 9 2016, 11:02 AM
tra edited edge metadata.


19 ↗(On Diff #60123)

I'm curious why {{.}}32 here? Do you expect the return type to change?

This revision is now accepted and ready to land.Jun 9 2016, 11:02 AM
jlebar marked an inline comment as done.Jun 9 2016, 12:50 PM
jlebar added inline comments.
19 ↗(On Diff #60123)

It's currently a b32, but there's no reason (afaict) that it couldn't be a u32 (or i32). I didn't want to tie this test to the current behavior, since I don't think it matters.
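
To illustrate the point above: FileCheck treats text inside {{ }} as a regular expression, so {{.}}32 accepts any single character followed by 32. The sketch below mimics that with Python's re module. The helper name and the sample suffixes are hypothetical and are not taken from the test file under review.

```python
import re

# Rough equivalent of the FileCheck fragment `{{.}}32`:
# any one character, then the literal "32".
TYPE_SUFFIX = re.compile(r".32")

def matches_type(suffix: str) -> bool:
    """Return True if the whole suffix is one character followed by '32'."""
    return TYPE_SUFFIX.fullmatch(suffix) is not None

# Hypothetical type suffixes, chosen only for illustration.
for suffix in ["b32", "u32", "i32", "b64"]:
    print(suffix, matches_type(suffix))
```

With this pattern the check keeps passing if the emitted type changes from b32 to u32 or i32, but still fails on a different width such as b64.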

This revision was automatically updated to reflect the committed changes.
jlebar marked an inline comment as done.