This is an archive of the discontinued LLVM Phabricator instance.

[CUDA] fixes for __shfl_* intrinsics.
ClosedPublic

Authored by tra on Dec 21 2017, 2:37 PM.

Details

Summary
  • __shfl_{up,down}* now use unsigned int for the third parameter.
  • Added [unsigned] long overloads for non-sync shuffles. This augments r319908, which added the long overload for sync shuffles.

Event Timeline

tra created this revision. Dec 21 2017, 2:37 PM
jlebar accepted this revision. Dec 21 2017, 3:13 PM

Since this is tricky and we've seen it affecting user code, do you think it's a bad idea to add tests to the test-suite?

This revision is now accepted and ready to land. Dec 21 2017, 3:13 PM
tra added a comment. Dec 21 2017, 3:29 PM

Added to my todo list. There are a few more gaps that I want to test to make sure we don't regress on compatibility with older CUDA versions while changing these wrappers.

This revision was automatically updated to reflect the committed changes.