This is an archive of the discontinued LLVM Phabricator instance.

[WIP][AMDGPU] Divergence-driven instruction selection for fshr
Needs Review · Public

Authored by foad on Jul 12 2023, 2:37 AM.

Details

Reviewers
None
Group Reviewers
Restricted Project
Summary

Make divergent fshr legal since it is selected to v_alignbit, but
expand uniform fshr since there is no s_alignbit instruction.
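
To make "expand" concrete, here is a minimal sketch of what a 32-bit fshr computes and of one equivalent form built only from shifts and an OR. The function names `fshrReference` and `fshrExpanded` are hypothetical, and the expansion shown is one common form, not necessarily the exact sequence the legalizer emits.

```cpp
#include <cstdint>

// Reference semantics of a 32-bit fshr: concatenate hi:lo into 64 bits,
// shift right by (amt mod 32), and keep the low 32 bits.
std::uint32_t fshrReference(std::uint32_t hi, std::uint32_t lo, std::uint32_t amt) {
  const std::uint64_t wide = (static_cast<std::uint64_t>(hi) << 32) | lo;
  return static_cast<std::uint32_t>(wide >> (amt & 31u));
}

// The same value using only 32-bit shifts and an OR.
// The left shift is split into "<< 1" and "<< (31 - s)" so that no single
// shift count reaches 32, which would be undefined when s == 0.
std::uint32_t fshrExpanded(std::uint32_t hi, std::uint32_t lo, std::uint32_t amt) {
  const std::uint32_t s = amt & 31u;
  return ((hi << 1) << (31u - s)) | (lo >> s);
}
```

The expanded form needs several scalar operations where the vector side needs a single instruction, which is the trade-off the summary describes.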

Event Timeline

foad created this revision. Jul 12 2023, 2:37 AM
Herald added a project: Restricted Project. Jul 12 2023, 2:37 AM
foad requested review of this revision. Jul 12 2023, 2:37 AM
foad added inline comments. Jul 12 2023, 2:49 AM
llvm/test/CodeGen/AMDGPU/bf16.ll
631

There are lots of minor regressions like this. I will investigate.

llvm/test/CodeGen/AMDGPU/build_vector.ll
74

This is the intended change. In general we should select SALU instructions for uniform calculations, even if it's more instructions. In this case it's a shame that the result gets copied to a VGPR anyway...

foad added inline comments. Jul 16 2023, 6:23 AM
llvm/test/CodeGen/AMDGPU/bf16.ll
631

The problem here is that we really need to combine shifts and ORs into fshr post-legalization. This no longer happens automatically because we have marked fshr as Custom instead of Legal. I could do it with a target-specific OR combine, but I can't find any way to call back into helper code like MatchRotate in the generic DAGCombiner from a target-specific combine.

arsenm added inline comments. Aug 18 2023, 4:23 AM
llvm/test/CodeGen/AMDGPU/bf16.ll
631

is there just an isLegal call that needs to be isLegalOrCustom?

foad added inline comments. Sep 27 2023, 9:22 AM
llvm/test/CodeGen/AMDGPU/bf16.ll
631

> is there just an isLegal call that needs to be isLegalOrCustom?

I can make that change in DAGCombiner::MatchRotate but it does not work:

--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -8339,10 +8339,10 @@ SDValue DAGCombiner::MatchRotate(SDValue LHS, SDValue RHS, const SDLoc &DL) {
   // The target must have at least one rotate/funnel flavor.
   // We still try to match rotate by constant pre-legalization.
   // TODO: Support pre-legalization funnel-shift by constant.
-  bool HasROTL = hasOperation(ISD::ROTL, VT);
-  bool HasROTR = hasOperation(ISD::ROTR, VT);
-  bool HasFSHL = hasOperation(ISD::FSHL, VT);
-  bool HasFSHR = hasOperation(ISD::FSHR, VT);
+  bool HasROTL = TLI.isOperationLegalOrCustom(ISD::ROTL, VT);
+  bool HasROTR = TLI.isOperationLegalOrCustom(ISD::ROTR, VT);
+  bool HasFSHL = TLI.isOperationLegalOrCustom(ISD::FSHL, VT);
+  bool HasFSHR = TLI.isOperationLegalOrCustom(ISD::FSHR, VT);
 
   // If the type is going to be promoted and the target has enabled custom
   // lowering for rotate, allow matching rotate by non-constants. Only allow

The problem is that legalization lowers a uniform fshr to shifts and ORs, but this combine then immediately kicks in and folds them back into an fshr. The two steps keep undoing each other, which causes an infinite loop.
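
The ping-pong can be illustrated with a toy model. This is only a sketch: `Form`, `legalizeStep`, `combineStep` and `rewritesUntilStable` are hypothetical names, not LLVM types or APIs, and the model ignores everything about the real DAG except which of the two forms the node is in.

```cpp
// Toy model only: these are not LLVM types or APIs.
enum class Form { Fshr, ShiftsAndOrs };

// Stand-in for custom lowering: a uniform fshr is expanded,
// a divergent one is kept as it is.
Form legalizeStep(Form f, bool divergent) {
  return (f == Form::Fshr && !divergent) ? Form::ShiftsAndOrs : f;
}

// Stand-in for the combine once it accepts Custom operations:
// shifts and ORs are always folded back into fshr.
Form combineStep(Form f) {
  return f == Form::ShiftsAndOrs ? Form::Fshr : f;
}

// Alternate the two steps. Returns the number of rewrites performed before
// a full round changes nothing, or -1 if maxRounds is exhausted first.
int rewritesUntilStable(Form start, bool divergent, int maxRounds = 16) {
  Form cur = start;
  int rewrites = 0;
  for (int round = 0; round < maxRounds; ++round) {
    const Form legalized = legalizeStep(cur, divergent);
    const Form combined = combineStep(legalized);
    const int changes = (legalized != cur) + (combined != legalized);
    if (changes == 0)
      return rewrites;
    rewrites += changes;
    cur = combined;
  }
  return -1;
}
```

In this model a divergent node settles after at most one rewrite, while a uniform node never reaches a fixed point because each step reverses the other.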

Maybe the whole premise of this patch is flawed? Is it OK to say that fshr is only legal if it is divergent? Or do I have to say fshr is always legal, and then lower uniform fshr back into shifts and ORs at some later stage?