Add implementation for the legalization of G_ROTL and G_ROTR machine instructions.
They are very similar to funnel shift instructions, the only difference is funnel shifts have 3 operands,
whereas rotate instructions have two operands, the first being the register that is being rotated and
the second being the number of shifts. The legalization of G_ROTL/G_ROTR is just lowering them into
funnel shift instructions if they are legal.
Details
Diff Detail
Event Timeline
llvm/lib/Target/AMDGPU/VOP3Instructions.td | ||
---|---|---|
346 ↗ | (On Diff #356157) | Isn't there a lowering to turn this into fshr? Why do we need to directly select this? |
llvm/test/CodeGen/AMDGPU/GlobalISel/rotate_pre_instselect.mir | ||
6 ↗ | (On Diff #356157) | Don't need the IR section, registers section or most of the MIR extras here |
llvm/test/CodeGen/AMDGPU/GlobalISel/rotate_pre_regbankselect.mir | ||
713 ↗ | (On Diff #356157) | You can simplify out stuff like this, for the purpose of this test it doesn't need the readlanes |
llvm/lib/Target/AMDGPU/VOP3Instructions.td | ||
---|---|---|
346 ↗ | (On Diff #356157) | I have previously discussed this with Mateja. There's a combine that turns funnel shifts into rotates, so I wasn't sure if that meant that the rotates were canonical and we had to handle them. Thinking about it now, I guess we can say that rotates are illegal, and the combine should bail out in that case? |
llvm/lib/Target/AMDGPU/VOP3Instructions.td | ||
---|---|---|
346 ↗ | (On Diff #356157) | If we disabled the "funnel shift to rotate" combiner with a check if the rotate is legal or not, we would also be disabling that combiner for AArch64, specifically for aarch64-prelegalizer-combiner pass. |
llvm/lib/Target/AMDGPU/VOP3Instructions.td | ||
---|---|---|
346 ↗ | (On Diff #356157) | That combine already checks "is legal or before legalizer", so it should be OK. (It is a bit inefficient, because the pre-legalizer combiner will convert funnel shift -> rotate and then the legalizer will convert it back again, but I think that is OK.) |
When rotating left, use the right negated rotation (negate the number of shifts) only in cases when the destination type size is a power of 2.
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | ||
---|---|---|
2763 | buildFNeg is only for floating point. You need buildNeg, which does not exist yet, but you could implement it. |
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | ||
---|---|---|
2752 | There's nothing target specific here, the generic LegalizerHelper can handle this | |
llvm/lib/Target/AMDGPU/VOP3Instructions.td | ||
346 ↗ | (On Diff #356157) | Saying anything is canonical in the MIR is a bit fuzzy. I do think the is legal or before legalizer checks aren't ideal now. I think the combiner needs to distinguish between being legal and not having legalization info |
llvm/test/CodeGen/AMDGPU/GlobalISel/rotate_pre_regbankselect.mir | ||
516–613 ↗ | (On Diff #356157) | Don't need all this meta stuff |
713 ↗ | (On Diff #356157) | You should still remove these intrinsic copies |
Move the legalization of G_ROTL and G_ROTR from AMDGPULegalizerInfo to LegalizerHelper.
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp | ||
---|---|---|
5976 | Thanks! :) |
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp | ||
---|---|---|
5978 | What if it's not a power of two? In that case, should we fall through to the G_SHL/G_LSHR expansion below? |
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp | ||
---|---|---|
5978 | Yes, we should... we go into the "if (!IsFShLegal) " statement only if the instruction with the opcode FShOpc isn't legal and the other is. The problem is if the RevFsh instruction is legal and the FShOpc instruction isn't and the type isn't a power of 2, this code will try to build an instruction which ins't legal, instead of falling through to the G_SHL/G_LSHR part. |
In case the funnel shift instruction is not legal and the type is not a power of 2, use the old way of lowering (G_ROTL/G_ROTR -> G_SHL/G_LSHR).
LGTM, thanks!
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp | ||
---|---|---|
5982 | Nit: this would be slightly neater: if (IsFshLegal) { return ... } else if (isPowerOf2_32(EltSizeInBits)) { return ... } |
This doesn't do what you think it does :) It sets IsFShLegal to the whole expression LI.isLegalOrCustom({FShOpc, {DstTy, AmtTy}}) || LI.isLegalOrCustom({RevFsh, {DstTy, AmtTy}})