This is an archive of the discontinued LLVM Phabricator instance.

[NVPTX] Add intrinsics for shfl instructions.
ClosedPublic

Authored by jlebar on Jun 8 2016, 5:21 PM.

Details

Summary

Currently clang emits these instructions via inline (volatile) asm in
the CUDA headers. Switching to intrinsics will let the optimizer reason
across calls to these intrinsics.
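
For illustration, the two forms can be contrasted in LLVM IR. This is a minimal sketch, not code from the patch: the intrinsic name llvm.nvvm.shfl.down.i32 and its three-i32 signature are assumptions (the summary does not spell them out), and both function names are hypothetical.

```llvm
; Assumed declaration; the name and signature are not given in this review.
declare i32 @llvm.nvvm.shfl.down.i32(i32, i32, i32)

; Roughly what a header-based wrapper lowers to: a volatile ("sideeffect")
; inline asm call whose asm string the optimizer does not look inside.
define i32 @shfl_down_asm(i32 %val, i32 %delta, i32 %clamp) {
  %r = call i32 asm sideeffect "shfl.down.b32 $0, $1, $2, $3;", "=r,r,r,r"(i32 %val, i32 %delta, i32 %clamp)
  ret i32 %r
}

; The same operation written as an intrinsic call, whose declared
; attributes are visible to the optimizer.
define i32 @shfl_down_intrinsic(i32 %val, i32 %delta, i32 %clamp) {
  %r = call i32 @llvm.nvvm.shfl.down.i32(i32 %val, i32 %delta, i32 %clamp)
  ret i32 %r
}
```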

Diff Detail

Repository
rL LLVM

Event Timeline

jlebar updated this revision to Diff 60123. Jun 8 2016, 5:21 PM
jlebar retitled this revision to [NVPTX] Add intrinsics for shfl instructions.
jlebar updated this object.
jlebar added a reviewer: tra.
jlebar added subscribers: jholewinski, llvm-commits.

Looks good to me!

tra accepted this revision. Jun 9 2016, 11:02 AM
tra edited edge metadata.

LGTM

test/CodeGen/NVPTX/shfl.ll
19 (On Diff #60123)

I'm curious why {{.}}32 here? Do you expect the return type to change?

This revision is now accepted and ready to land. Jun 9 2016, 11:02 AM
jlebar marked an inline comment as done. Jun 9 2016, 12:50 PM
jlebar added inline comments.
test/CodeGen/NVPTX/shfl.ll
19 (On Diff #60123)

It's currently a b32, but there's no reason (afaict) that it couldn't be a u32 (or i32). I didn't want to tie this test to the current behavior, since I don't think it matters.
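
The wildcard under discussion can be sketched as a FileCheck fragment. This is an illustrative reconstruction, not the test as committed: the RUN line, the function name shfl_sketch, and the intrinsic declaration are all assumptions.

```llvm
; RUN: llc < %s -march=nvptx64 -mcpu=sm_30 | FileCheck %s

; Assumed declaration; not quoted from the patch.
declare i32 @llvm.nvvm.shfl.down.i32(i32, i32, i32)

; CHECK-LABEL: .func{{.*}}shfl_sketch
define i32 @shfl_sketch(i32 %a, i32 %b, i32 %c) {
  ; Capture the register that receives the shuffle result.
  ; CHECK: shfl.down.b32 [[OUT:%r[0-9]+]],
  ; {{.}} is a regex matching any single character, so the next line
  ; accepts st.param.b32, st.param.u32, or st.param.s32 for the return value.
  ; CHECK: st.param.{{.}}32 {{.*}}, [[OUT]]
  %val = call i32 @llvm.nvvm.shfl.down.i32(i32 %a, i32 %b, i32 %c)
  ret i32 %val
}
```

Pinning the line to st.param.b32 instead would make the test fail if the backend later chose a different 32-bit type suffix for the return store, even though the shuffle itself was unchanged.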

This revision was automatically updated to reflect the committed changes.
jlebar marked an inline comment as done.