This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Check for unneeded shift mask in shift PatFrags.
ClosedPublic

Authored by abinavpp on Nov 8 2021, 6:20 PM.

Details

Summary

The existing constrained shift PatFrags only dealt with masked shift
from OpenCL front-ends. This change copies the
X86DAGToDAGISel::isUnneededShiftMask() function to AMDGPU and uses it in
the shift PatFrag predicates.

Diff Detail

Event Timeline

abinavpp created this revision.Nov 8 2021, 6:20 PM
abinavpp requested review of this revision.Nov 8 2021, 6:20 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 8 2021, 6:20 PM
abinavpp added inline comments.Nov 8 2021, 6:26 PM
llvm/test/CodeGen/AMDGPU/constrained-shift.ll
19

Do we see anything obvious in this change that's not allowing us to eliminate the and in global-isel for the divergent cases?

foad added inline comments.Nov 9 2021, 2:50 AM
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
3887

Could use getIConstantVRegVal here to get the APInt value directly?

3887

Opnd1 is a bit confusing because it's MI.getOperand(2). Maybe call it something vague like MaskVal?

llvm/lib/Target/AMDGPU/AMDGPUInstructions.td
261

Maybe change this to foreach logwidth = [4, 5, 6] so you can put the definition of csh_mask_#logwidth inside the loop? Or maybe that's impossible because you need to refer to logwidth inside a C++ code fragment?

foad added inline comments.Nov 9 2021, 3:05 AM
llvm/test/CodeGen/AMDGPU/constrained-shift.ll
19

I think a cross-regbank copy is getting in the way of matching the constant value 15. Maybe use getIConstantVRegValWithLookThrough to look through the copy?

arsenm added inline comments.Nov 9 2021, 5:47 AM
llvm/test/CodeGen/AMDGPU/constrained-shift.ll
19

This is another case where regbankselect or a the post regbank combiner should have materialized the constant in VGPR to begin with

abinavpp updated this revision to Diff 388404.Nov 19 2021, 12:17 AM

Partially addressed review comments.

abinavpp marked an inline comment as done.Nov 19 2021, 12:43 AM
abinavpp added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
3887

For some reason, naming the operands as Val and MaskVal is confusing me, how about LHS and RHS?

llvm/lib/Target/AMDGPU/AMDGPUInstructions.td
261

Right, I'm not able to refer them in the code fragments.

llvm/test/CodeGen/AMDGPU/constrained-shift.ll
19

I think a cross-regbank copy is getting in the way of matching the constant value 15. Maybe use getIConstantVRegValWithLookThrough to look through the copy?

getIConstantVRegValWithLookThrough() in the predicate alone won't help here
since we're not able to match the pattern in the first place.

19

This is another case where regbankselect or a the post regbank combiner should have materialized the constant in VGPR to begin with

Doing something like:

--- a/llvm/lib/CodeGen/GlobalISel/RegBankSelect.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/RegBankSelect.cpp
@@ -472,6 +472,10 @@ RegBankSelect::MappingCost RegBankSelect::computeMapping(
     Register Reg = MO.getReg();
     if (!Reg)
       continue;
+
+    if (MO.isUse() && isConstantOrConstantVector(*(MRI->getVRegDef(Reg)), *MRI))
+      continue;
+
     LLVM_DEBUG(dbgs() << "Opd" << OpIdx << '\n');

or forcing SGPRRegBank for constant operands in
AMDGPURegisterBankInfo::getDefaultMappingVOP() fixes this problem buts ends up
violating the constant bus restriction for a lot of AMDGPU tests.

I'm not sure how the original PatFrags (i.e. the ones with the masks as literal
constants without predicates) are working correctly in global-isel for *some*
(vector cases and scalar 64-bit cases are not working) of the divergent cases.

Is there a way to write a constant operand in a tblgen DAG that peeks through
trivial cross regbank copies? Or, is there a better way to fix this?

foad added inline comments.Nov 19 2021, 6:52 AM
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
3887

Sure.

llvm/test/CodeGen/AMDGPU/constrained-shift.ll
19

Does D113784 help? Anyway see the discussion in that review about how to pick better banks for constants.

abinavpp added inline comments.Nov 21 2021, 5:32 PM
llvm/test/CodeGen/AMDGPU/constrained-shift.ll
19

Yes, D113784 will fix this. We can wait till that gets merged.

arsenm accepted this revision.Nov 23 2021, 3:12 PM

LGTM. Solving constant regbankselect is not really related and shouldn't hold this up

llvm/test/CodeGen/AMDGPU/constrained-shift.ll
19

The solution I decided on for the constant bus problem is we should just not handle it during globalisel at all. VALU mapped instructions should get all VGPR operands. We should have a new and improved SIFoldOperands which would fold SGPRs into instruction operands. The current scheme was built around the assumption that there were attempts to fold before

This revision is now accepted and ready to land.Nov 23 2021, 3:12 PM
abinavpp added inline comments.Nov 23 2021, 9:38 PM
llvm/test/CodeGen/AMDGPU/constrained-shift.ll
19

Sounds good to me.

foad added inline comments.Nov 29 2021, 2:54 AM
llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll
2927

@abinavpp Something has gone wrong here. These ANDs were removed by D110231, but now they have come back.

abinavpp added inline comments.Nov 30 2021, 1:32 AM
llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll
2927

I was aware of this, but I don't think I made it clear in my previous comment:

I'm not sure how the original PatFrags (i.e. the ones with the masks as literal
constants without predicates) are working correctly in global-isel for *some*
(vector cases and scalar 64-bit cases are not working) of the divergent cases.

Sorry about that. The right way to fix this is to fix the cross regbank constant match problem in global-isel. A temporary workaround for this is to keep both the predicated and the literal match versions in PatFrags, i.e., to do:

--- a/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td
@@ -263,21 +263,24 @@ defvar mask = !sub(width, 1);
 defvar csh_mask = !cast<SDPatternOperator>("csh_mask_"#width);
 
 def cshl_#width : PatFrags<(ops node:$src0, node:$src1),
-  [(shl node:$src0, node:$src1), (shl node:$src0, (csh_mask node:$src1))]>;
+  [(shl node:$src0, node:$src1), (shl node:$src0, (csh_mask node:$src1)),
+    (shl node:$src0, (and node:$src1, mask))]>;
 defvar cshl = !cast<SDPatternOperator>("cshl_"#width);
 def cshl_#width#_oneuse : HasOneUseBinOp<cshl>;
 def clshl_rev_#width : PatFrag <(ops node:$src0, node:$src1),
   (cshl $src1, $src0)>;
 
 def csrl_#width : PatFrags<(ops node:$src0, node:$src1),
-  [(srl node:$src0, node:$src1), (srl node:$src0, (csh_mask node:$src1))]>;
+  [(srl node:$src0, node:$src1), (srl node:$src0, (csh_mask node:$src1)),
+    (srl node:$src0, (and node:$src1, mask))]>;
 defvar csrl = !cast<SDPatternOperator>("csrl_"#width);
 def csrl_#width#_oneuse : HasOneUseBinOp<csrl>;
 def clshr_rev_#width : PatFrag <(ops node:$src0, node:$src1),
   (csrl $src1, $src0)>;
 
 def csra_#width : PatFrags<(ops node:$src0, node:$src1),
-  [(sra node:$src0, node:$src1), (sra node:$src0, (csh_mask node:$src1))]>;
+  [(sra node:$src0, node:$src1), (sra node:$src0, (csh_mask node:$src1)),
+    (sra node:$src0, (and node:$src1, mask))]>;

Should I create a revision for the above change (for now) and then revert it after we fix the constant match problem in global-isel?

foad added inline comments.Nov 30 2021, 1:42 AM
llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll
2927

Oh, my bad, I had forgotten about the known problem with cross regbank copies. No need to do anything now, let's just wait for a proper fix for that problem.