The existing constrained shift PatFrags only dealt with masked shift
from OpenCL front-ends. This change copies the
X86DAGToDAGISel::isUnneededShiftMask() function to AMDGPU and uses it in
the shift PatFrag predicates.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/test/CodeGen/AMDGPU/constrained-shift.ll | ||
---|---|---|
19 | Do we see anything obvious in this change that's not allowing us to eliminate the and in global-isel for the divergent cases? |
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp | ||
---|---|---|
3887 | Could use getIConstantVRegVal here to get the APInt value directly? | |
3887 | Opnd1 is a bit confusing because it's MI.getOperand(2). Maybe call it something vague like MaskVal? | |
llvm/lib/Target/AMDGPU/AMDGPUInstructions.td | ||
261 | Maybe change this to foreach logwidth = [4, 5, 6] so you can put the definition of csh_mask_#logwidth inside the loop? Or maybe that's impossible because you need to refer to logwidth inside a C++ code fragment? |
llvm/test/CodeGen/AMDGPU/constrained-shift.ll | ||
---|---|---|
19 | I think a cross-regbank copy is getting in the way of matching the constant value 15. Maybe use getIConstantVRegValWithLookThrough to look through the copy? |
llvm/test/CodeGen/AMDGPU/constrained-shift.ll | ||
---|---|---|
19 | This is another case where regbankselect or a the post regbank combiner should have materialized the constant in VGPR to begin with |
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp | ||
---|---|---|
3887 | For some reason, naming the operands as Val and MaskVal is confusing me, how about LHS and RHS? | |
llvm/lib/Target/AMDGPU/AMDGPUInstructions.td | ||
261 | Right, I'm not able to refer them in the code fragments. | |
llvm/test/CodeGen/AMDGPU/constrained-shift.ll | ||
19 |
getIConstantVRegValWithLookThrough() in the predicate alone won't help here | |
19 |
Doing something like: --- a/llvm/lib/CodeGen/GlobalISel/RegBankSelect.cpp +++ b/llvm/lib/CodeGen/GlobalISel/RegBankSelect.cpp @@ -472,6 +472,10 @@ RegBankSelect::MappingCost RegBankSelect::computeMapping( Register Reg = MO.getReg(); if (!Reg) continue; + + if (MO.isUse() && isConstantOrConstantVector(*(MRI->getVRegDef(Reg)), *MRI)) + continue; + LLVM_DEBUG(dbgs() << "Opd" << OpIdx << '\n'); or forcing SGPRRegBank for constant operands in I'm not sure how the original PatFrags (i.e. the ones with the masks as literal Is there a way to write a constant operand in a tblgen DAG that peeks through |
LGTM. Solving constant regbankselect is not really related and shouldn't hold this up
llvm/test/CodeGen/AMDGPU/constrained-shift.ll | ||
---|---|---|
19 | The solution I decided on for the constant bus problem is we should just not handle it during globalisel at all. VALU mapped instructions should get all VGPR operands. We should have a new and improved SIFoldOperands which would fold SGPRs into instruction operands. The current scheme was built around the assumption that there were attempts to fold before |
llvm/test/CodeGen/AMDGPU/constrained-shift.ll | ||
---|---|---|
19 | Sounds good to me. |
llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll | ||
---|---|---|
2927 | I was aware of this, but I don't think I made it clear in my previous comment:
Sorry about that. The right way to fix this is to fix the cross regbank constant match problem in global-isel. A temporary workaround for this is to keep both the predicated and the literal match versions in PatFrags, i.e., to do: --- a/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td +++ b/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td @@ -263,21 +263,24 @@ defvar mask = !sub(width, 1); defvar csh_mask = !cast<SDPatternOperator>("csh_mask_"#width); def cshl_#width : PatFrags<(ops node:$src0, node:$src1), - [(shl node:$src0, node:$src1), (shl node:$src0, (csh_mask node:$src1))]>; + [(shl node:$src0, node:$src1), (shl node:$src0, (csh_mask node:$src1)), + (shl node:$src0, (and node:$src1, mask))]>; defvar cshl = !cast<SDPatternOperator>("cshl_"#width); def cshl_#width#_oneuse : HasOneUseBinOp<cshl>; def clshl_rev_#width : PatFrag <(ops node:$src0, node:$src1), (cshl $src1, $src0)>; def csrl_#width : PatFrags<(ops node:$src0, node:$src1), - [(srl node:$src0, node:$src1), (srl node:$src0, (csh_mask node:$src1))]>; + [(srl node:$src0, node:$src1), (srl node:$src0, (csh_mask node:$src1)), + (srl node:$src0, (and node:$src1, mask))]>; defvar csrl = !cast<SDPatternOperator>("csrl_"#width); def csrl_#width#_oneuse : HasOneUseBinOp<csrl>; def clshr_rev_#width : PatFrag <(ops node:$src0, node:$src1), (csrl $src1, $src0)>; def csra_#width : PatFrags<(ops node:$src0, node:$src1), - [(sra node:$src0, node:$src1), (sra node:$src0, (csh_mask node:$src1))]>; + [(sra node:$src0, node:$src1), (sra node:$src0, (csh_mask node:$src1)), + (sra node:$src0, (and node:$src1, mask))]>; Should I create a revision for the above change (for now) and then revert it after we fix the constant match problem in global-isel? |
llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll | ||
---|---|---|
2927 | Oh, my bad, I had forgotten about the known problem with cross regbank copies. No need to do anything now, let's just wait for a proper fix for that problem. |
Could use getIConstantVRegVal here to get the APInt value directly?