This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Add an llvm.amdgcn.wqm intrinsic for WQM
ClosedPublic

Authored by cwabbott on Jul 7 2017, 6:43 PM.

Details

Summary

Previously, we assumed that certain types of instructions needed WQM in
pixel shaders, particularly DS instructions and image sampling
instructions. This was ok because with OpenGL, the assumption was
correct. But we want to start using DPP instructions for derivatives as
well as other things, so the assumption that we can infer whether to use
WQM based on the instruction won't continue to hold. This intrinsic lets
frontends like Mesa indicate what things need WQM based on their
knowledge of the API, rather than second-guessing them in the backend.
We need to keep around the old method of enabling WQM, but eventually we
should remove it once Mesa catches up. For now, this will let us use DPP
instructions for computing derivatives correctly.

Event Timeline

cwabbott created this revision.Jul 7 2017, 6:43 PM
cwabbott updated this revision to Diff 106945.Jul 17 2017, 1:59 PM

Avoid illegal SGPR<->VGPR copies by treating WQM more like COPY, add test that
exposes the problem.

tpr accepted this revision.Jul 19 2017, 9:22 AM
This revision is now accepted and ready to land.Jul 19 2017, 9:22 AM
nhaehnle requested changes to this revision.Jul 20 2017, 8:45 AM

Some minor comments. In addition, I think it can be simplified, and we probably want the intrinsic to be convergent, because sinking WQM computations into a non-uniform branch could mean that the computation becomes non-WQM for practical purposes.

include/llvm/IR/IntrinsicsAMDGPU.td
744–748

I believe this should be convergent, due to the way neighboring lanes may disappear due to control flow.

lib/Target/AMDGPU/SIInstructions.td
121

I believe this should have a let isConvergent = 1, due to the way neighboring lanes could "disappear" with additional control flow.

lib/Target/AMDGPU/SIWholeQuadMode.cpp
300

Capitalize the comment.

676–688

You can probably use MI->setDesc for this.

This revision now requires changes to proceed.Jul 20 2017, 8:45 AM
cwabbott added inline comments.Jul 20 2017, 1:14 PM
include/llvm/IR/IntrinsicsAMDGPU.td
744–748

Hmm, I'm not really convinced. All this intrinsic does is to guarantee something about how its source value is computed, which obviously won't change if the instruction itself is moved around. The operation itself is a simple move operation, which normally isn't convergent. Can you give me an example where adding a control dependency to the WQM intrinsic causes problems?

cwabbott added inline comments.Jul 26 2017, 3:28 PM
lib/Target/AMDGPU/SIWholeQuadMode.cpp
676–688

It's not quite that simple, since I'm also using this code to optimize llvm.amdgcn.set.inactive with an undef second argument, in which case we need to get rid of the second (undef) argument. But I think the end-result is still a little shorter and otherwise equivalent, so I'll change it.

cwabbott updated this revision to Diff 108372.Jul 26 2017, 3:31 PM
cwabbott edited edge metadata.

Minor comment style fixes, simplify lowerCopyInstrs by using setDesc()

cwabbott marked 2 inline comments as done.Jul 26 2017, 3:33 PM
nhaehnle accepted this revision.Aug 2 2017, 2:15 AM

Thanks. Assuming we agree on the derivative calculations logic I added in a comment, this LGTM.

include/llvm/IR/IntrinsicsAMDGPU.td
744–748

How about derivative calculations where the result is only used in a subsequent if-block? Basically, we need to prevent

%deriv.0 = derivative calculations
%deriv = llvm.amdgcn.wqm(%deriv.0)

if (cc) {
   only_use_of(%deriv)
}

being sunk into

if (cc) {
  %deriv.0 = derivative calculations
  %deriv = llvm.amdgcn.wqm(%deriv.0)
  only_use_of(%deriv)
}

Although, on second thought, I guess all the cross-lane operations involved in computing %deriv.0 are already convergent? So I guess it's fine in the end...

This revision is now accepted and ready to land.Aug 2 2017, 2:15 AM
cwabbott added inline comments.Aug 2 2017, 12:16 PM
include/llvm/IR/IntrinsicsAMDGPU.td
744–748

Yes, in this case we should still be fine. The way think about it is that llvm.amdgcn.wqm merely guarantees that its source have their helper lanes computed correctly; making sure the correct helper lanes are enabled when computing the source is up to the source computations themselves. So, it seems like it should be up to the uses of llvm.amdgcn.wqm to be marked as convergent if necessary.

This revision was automatically updated to reflect the committed changes.