This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Implement llvm.amdgcn.set.inactive intrinsic
Closed, Public

Authored by cwabbott on Jun 27 2017, 3:50 PM.

Details

Summary

This intrinsic lets us set inactive lanes to an identity value when
implementing wavefront reductions. In combination with Whole Wavefront
Mode, it lets inactive lanes be skipped over as required by GLSL/Vulkan.
Lowering the intrinsic needs to happen post-RA so that the register
allocator knows the destination isn't completely overwritten (the
lowering writes it under a modified EXEC mask), so we need another
pseudo-instruction to represent the un-lowered intrinsic.
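To make the EXEC point concrete, here is a minimal Python model of one way the pseudo-instruction could be lowered. It assumes a 64-lane wavefront and a destination tied to the source; the names `masked_mov` and `set_inactive` are hypothetical, and this is not the actual LLVM code. The move that writes the inactive value only touches lanes whose inverted EXEC bit is set, so the active lanes of the destination must already hold the source value. That is why the destination cannot be treated as fully overwritten.

```python
WAVE_SIZE = 64
FULL_MASK = (1 << WAVE_SIZE) - 1


def masked_mov(dst, value, exec_mask):
    """Model of a VALU move: only lanes whose EXEC bit is set are written."""
    for lane in range(WAVE_SIZE):
        if (exec_mask >> lane) & 1:
            dst[lane] = value


def set_inactive(src, inactive_value, exec_mask):
    """Hypothetical model of lowering set.inactive with dst tied to src.

    Returns the destination register contents and the (restored) EXEC mask.
    """
    dst = list(src)             # tied operand: dst already holds src in every lane
    exec_mask ^= FULL_MASK      # invert EXEC: now only the inactive lanes are enabled
    masked_mov(dst, inactive_value, exec_mask)  # writes inactive lanes only
    exec_mask ^= FULL_MASK      # restore the original EXEC
    return dst, exec_mask
```

With lanes 0 to 3 active and an identity of 0, a later whole-wave add sees 0 + 1 + 2 + 3 = 6, exactly the sum over the active lanes.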

Event Timeline

cwabbott created this revision. Jun 27 2017, 3:50 PM
cwabbott added a comment (edited). Jun 27 2017, 4:04 PM

I've just pushed a WIP Mesa branch which demonstrates how llvm.amdgcn.update.dpp and llvm.amdgcn.set.inactive are intended to be used. In particular, the main kernel which implements the inclusive scan operation is here. I haven't fully gotten it to work for my test, though, so there's probably still something wrong.
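The Mesa branch itself isn't reproduced here. As a rough illustration of why the identity value matters for a scan, the following Python model runs a generic log-step (Hillis-Steele) inclusive prefix sum over all 64 lanes, as it would under WWM, after inactive lanes have been set to the identity. It deliberately ignores the row and bank restrictions of real DPP shifts, and the function name `inclusive_scan_add` is hypothetical.

```python
WAVE_SIZE = 64


def inclusive_scan_add(values, exec_mask, identity=0):
    """Hypothetical model of a whole-wave inclusive prefix sum.

    Step 1 mimics set.inactive: inactive lanes get the identity value.
    Step 2 mimics running in WWM: every lane takes part in the log-step
    scan, so inactive lanes must contribute the identity, not garbage.
    """
    lanes = [values[i] if (exec_mask >> i) & 1 else identity
             for i in range(WAVE_SIZE)]
    shift = 1
    while shift < WAVE_SIZE:
        # Each lane adds the value from `shift` lanes below it (or the
        # identity), reading from the previous step's snapshot.
        lanes = [lanes[i] + (lanes[i - shift] if i >= shift else identity)
                 for i in range(WAVE_SIZE)]
        shift *= 2
    return lanes
```

For example, with every input equal to 1 and only lanes 0, 1 and 3 active, lanes 0 to 3 end up holding 1, 2, 2, 3 and every higher lane holds 3, so the total in lane 63 counts only the active lanes.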

cwabbott updated this revision to Diff 106994. Jul 17 2017, 5:51 PM

Rebase on latest WWM implementation; tweak semantics and implementation
to force WQM whenever WQM is used anywhere in the program.

cwabbott updated this revision to Diff 108737. Jul 28 2017, 3:52 PM

Rebase on using s_or_saveexec_b64 for WWM (fix test).
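For reference, a Python sketch of the `s_or_saveexec_b64` semantics as described in the GCN ISA documentation (save EXEC into the destination, OR the source into EXEC, set SCC if the result is nonzero), and of how an all-ones source enables every lane for a WWM section before the saved mask is restored. Both functions are models only; `run_in_wwm` is a hypothetical helper, not LLVM code.

```python
FULL_MASK = (1 << 64) - 1


def s_or_saveexec_b64(exec_mask, src):
    """Model of s_or_saveexec_b64: D = EXEC; EXEC = S0 | EXEC; SCC = EXEC != 0.

    Returns (saved_exec, new_exec, scc).
    """
    saved = exec_mask
    new_exec = (src | exec_mask) & FULL_MASK
    scc = int(new_exec != 0)
    return saved, new_exec, scc


def run_in_wwm(exec_mask, body):
    """Hypothetical helper: enable all lanes, run body, then restore EXEC."""
    saved, exec_mask, _ = s_or_saveexec_b64(exec_mask, -1)  # -1 = all ones
    body(exec_mask)      # every lane is enabled inside the WWM section
    exec_mask = saved    # restore the saved mask (e.g. via a move into EXEC)
    return exec_mask
```

Using OR rather than a plain move means the previous EXEC is captured and the new mask is set in a single scalar instruction.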

cwabbott updated this revision to Diff 108959. Jul 31 2017, 11:49 AM

Fix up lowerCopyInstrs() after using setDesc() by removing extra arguments.

nhaehnle accepted this revision. Aug 2 2017, 2:32 AM

One comment, apart from that LGTM.

lib/Target/AMDGPU/SIWholeQuadMode.cpp, lines 401–404

Hmm, so automatic propagation of the WQM bit doesn't cover this? It would be nicer if it did, but I don't think it's a big deal in practice. Could you please add an explanatory comment in the code?

This revision is now accepted and ready to land. Aug 2 2017, 2:32 AM
cwabbott added inline comments. Aug 2 2017, 7:15 PM
lib/Target/AMDGPU/SIWholeQuadMode.cpp, lines 401–404

No, it doesn't, since this is doing something different. It's implementing the semantics we talked about, that if *anything* in the program needs WQM then the instruction should be in WQM and the source should be in WQM, to make sure that helper lanes participate in reductions. I don't think that can be handled by any kind of propagation. It's also described in the definition of llvm.amdgcn.set.inactive and tested by test_set_inactive2. I can add a comment here to explain that, though.
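A minimal Python model of the rule described above, assuming a 64-lane wave: `wqm_mask` mimics `s_wqm_b64` (a quad is enabled if any of its four lanes is), and the hypothetical `set_inactive_exec` picks the mask that set.inactive treats as "active" depending on whether anything in the program needs WQM. It only illustrates the semantics; it is not how SIWholeQuadMode.cpp computes them.

```python
WAVE_SIZE = 64


def wqm_mask(exec_mask):
    """Model of s_wqm_b64: enable a whole quad if any lane of it is enabled."""
    result = 0
    for quad in range(0, WAVE_SIZE, 4):
        if (exec_mask >> quad) & 0xF:
            result |= 0xF << quad
    return result


def set_inactive_exec(exec_mask, program_needs_wqm):
    """Hypothetical helper: the lanes set.inactive leaves untouched.

    If anything in the program needs WQM, the instruction and its source
    run in WQM, so helper lanes keep their values and join the reduction.
    """
    return wqm_mask(exec_mask) if program_needs_wqm else exec_mask
```

With only lane 0 live, the helper lanes 1 to 3 of its quad count as active once WQM is required, so their values take part in the reduction instead of being replaced by the identity.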

cwabbott updated this revision to Diff 109479. Aug 2 2017, 7:27 PM

Clarify what the code that implements the WQM semantics is doing.

nhaehnle added inline comments. Aug 3 2017, 5:13 AM
lib/Target/AMDGPU/SIWholeQuadMode.cpp, lines 401–404

Ok, thanks.

cwabbott updated this revision to Diff 109627. Aug 3 2017, 1:50 PM

Rebase on master.

This revision was automatically updated to reflect the committed changes.