This intrinsic lets us set inactive lanes to an identity value when
implementing wavefront reductions. In combination with Whole Wavefront
Mode, it lets inactive lanes be skipped over as required by GLSL/Vulkan.
Lowering the intrinsic needs to happen post-RA so that RA knows that the
destination isn't completely overwritten due to the EXEC shenanigans, so
we need another pseudo-instruction to represent the un-lowered
intrinsic.
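
As a rough illustration of the semantics described above, here is a minimal scalar model in plain C++ (not LLVM IR, and not code from this patch). The 8-lane wavefront, the `Wave` type, and the names `set_inactive` and `wwm_reduce_add` are all assumptions made for the sketch; the EXEC mask is modelled as a plain bitmask with one bit per lane.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical 8-lane wavefront; real hardware wavefronts are wider.
constexpr std::size_t kLanes = 8;
using Wave = std::array<std::uint32_t, kLanes>;

// Model of the intrinsic: lanes whose EXEC bit is set keep their source
// value, all other lanes receive the supplied `inactive` value.
Wave set_inactive(const Wave& src, std::uint32_t inactive, std::uint32_t exec) {
    Wave out{};
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
        const bool active = ((exec >> lane) & 1u) != 0;
        out[lane] = active ? src[lane] : inactive;
    }
    return out;
}

// Model of a reduction run in Whole Wavefront Mode: it reads every lane,
// whether or not the lane was active in the original EXEC mask.
std::uint32_t wwm_reduce_add(const Wave& v) {
    std::uint32_t sum = 0;
    for (std::uint32_t x : v) {
        sum += x;
    }
    return sum;
}
```

With an add reduction the identity is 0, so filling the inactive lanes with 0 before the whole-wavefront reduction gives the same result as summing only the active lanes. The model ignores the register-allocation concern mentioned above, which only arises in the real lowering.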
Repository: rL LLVM
Event Timeline
I've just pushed a WIP Mesa branch which demonstrates how llvm.amdgcn.update.dpp and llvm.amdgcn.set.inactive are intended to be used. In particular, the main kernel which implements the inclusive scan operation is here. I haven't fully gotten it to work for my test, though, so there's probably still something wrong.
Rebase on the latest WWM implementation; tweak the semantics and implementation
to force WQM for the intrinsic whenever WQM is used anywhere in the program.
One comment, apart from that LGTM.
lib/Target/AMDGPU/SIWholeQuadMode.cpp | ||
---|---|---|
401–404 ↗ | (On Diff #108959) | Hmm. So automatic propagation of the WQM bit doesn't cover this? It would be nicer if it did, but I don't think it's a big deal in practice. Could you please add an explanatory comment in the code? |
401–404 ↗ | (On Diff #108959) | No, it doesn't, since this is doing something different. It's implementing the semantics we talked about, that if *anything* in the program needs WQM then the instruction should be in WQM and the source should be in WQM, to make sure that helper lanes participate in reductions. I don't think that can be handled by any kind of propagation. It's also described in the definition of llvm.amdgcn.set.inactive and tested by test_set_inactive2. I can add a comment here to explain that, though. |
401–404 ↗ | (On Diff #108959) | Ok, thanks. |