
[CUDA] Implement __shfl* intrinsics in clang headers.

Authored by jlebar on Jun 8 2016, 5:23 PM.

Event Timeline

jlebar updated this revision to Diff 60125 (Jun 8 2016, 5:23 PM).
jlebar retitled this revision to "[CUDA] Implement __shfl* intrinsics in clang headers."
jlebar updated this object.
jlebar added a reviewer: tra.
jlebar added subscribers: cfe-commits, jholewinski.

Looks reasonable to me.

(Art, I would appreciate a second set of eyes on this one, as the last time I did this -- with ldg -- I messed up pretty badly.)

Thank you for the reviews, Justin!

tra added inline comments (Jun 9 2016, 10:58 AM).
Lines 77–80 (on Diff #60125)

Could we use a union instead?

Line 87 (on Diff #60125)

Ugh. It took me a while to figure out why 0 is used here.
Unlike the other variants, shfl.up apparently applies to lanes >= maxLane. Who would have thought?
It might be worth adding a comment here so it isn't mistaken for a typo.
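To see why 0 is the right value for shfl.up, here is a host-side C++ model of the lane-selection pseudocode given for `shfl` in the PTX ISA documentation. It is a sketch written from that pseudocode, not code from this patch. `ShflMode`, `shfl_src_lane` and `make_c` are hypothetical names. The assumption is that the `c` operand carries the clamp value in bits [4:0] and the segment mask in bits [12:8], encoded as `32 - width` for a power-of-two `width`. With a clamp of 0x1f, `up` never finds a source lane in range, so every lane keeps its own value. With 0, only lanes whose source would fall below the start of their segment keep their own value.

```cpp
#include <cassert>

// Host-side model of the PTX shfl lane-selection rules (no GPU needed).
enum class ShflMode { Up, Down, Bfly, Idx };

// Returns the lane whose value `lane` receives. `b` is the lane/offset
// operand; `c` packs the clamp value in bits [4:0] and the segment mask
// in bits [12:8].
int shfl_src_lane(ShflMode mode, int lane, int b, int c) {
  const int bval = b & 0x1f;
  const int cval = c & 0x1f;
  const int segmask = (c >> 8) & 0x1f;
  const int minLane = lane & segmask;               // first lane of the segment
  const int maxLane = minLane | (cval & ~segmask);  // clamp within the segment
  int j = lane;
  bool pval = false;
  switch (mode) {
    case ShflMode::Up:   j = lane - bval; pval = (j >= maxLane); break;  // note >=
    case ShflMode::Down: j = lane + bval; pval = (j <= maxLane); break;
    case ShflMode::Bfly: j = lane ^ bval; pval = (j <= maxLane); break;
    case ShflMode::Idx:  j = minLane | (bval & ~segmask); pval = (j <= maxLane); break;
  }
  return pval ? j : lane;  // source out of range: keep own value
}

// Hypothetical helper: packs `c` for a power-of-two segment width and a clamp.
int make_c(int width, int clamp) {
  assert(width > 0 && width <= 32 && (width & (width - 1)) == 0);
  return ((32 - width) << 8) | clamp;
}
```

For example, with a full-warp width, `up` by 1 with clamp 0 maps lane 5 to lane 4 while lane 0 keeps its own value. The same call with clamp 0x1f leaves lane 5 reading itself.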

jlebar updated this revision to Diff 60223 (Jun 9 2016, 12:48 PM).
jlebar marked 2 inline comments as done.

Update after tra's review.

Lines 77–80 (on Diff #60125)

I'm pretty sure using a union for this purpose is UB in C++: "[9.5.1] In a union, at most one of the non-static data members can be active at any time, that is, the value of at most one of the non-static data members can be stored in a union at any time." Apparently it's fine in C11, though.
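The well-defined C++ alternative to union-based type punning is to copy the bytes with `std::memcpy`, which is valid for trivially copyable types. Below is a minimal host-side sketch of the split, shuffle each half, rejoin pattern. It assumes `int` is 32 bits and `long long` is 64 bits. `shuffle_64_via_32` and the `shfl32` callable are hypothetical stand-ins for the 32-bit device shuffle, not the code in this patch.

```cpp
#include <cstring>

// Shuffles a 64-bit value by splitting it into two 32-bit halves.
// `shfl32` stands in for a 32-bit shuffle (on the device, something like the
// int overload of __shfl); here it is any callable int -> int so this
// compiles and runs on the host.
template <typename Shuffle32>
long long shuffle_64_via_32(long long in, Shuffle32 shfl32) {
  struct Halves {
    int a, b;
  };
  static_assert(sizeof(Halves) == sizeof(long long),
                "expected exactly two 32-bit halves");
  Halves tmp;
  std::memcpy(&tmp, &in, sizeof(in));    // memcpy(dst, src, n): in -> tmp
  tmp.a = shfl32(tmp.a);
  tmp.b = shfl32(tmp.b);
  long long out;
  std::memcpy(&out, &tmp, sizeof(out));  // tmp -> out
  return out;
}
```

Compilers typically lower fixed-size `memcpy` calls like these to plain register moves, so the copies should cost nothing in practice. Note that `memcpy` takes the destination first, which makes its two pointer arguments easy to swap by mistake.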

Line 87 (on Diff #60125)

Done, thanks.

This revision was automatically updated to reflect the committed changes.