Adds legalizer, register bank select, and instruction
select support for G_SBFX and G_UBFX. These opcodes generate
the scalar or vector ALU bitfield extract instructions. The
instructions allow both constant or register values for the
offset and width operands.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp | ||
---|---|---|
2034 | Extra space before ) | |
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp | ||
771 | Typo scslar | |
781–801 | It would be better to handle this operand packing in RegBankSelect applyMappingImpl. We can get simplifications after the fact if you emit the generic bit operations. If the offset/width are really constants, this won't end up constant folding them (actually today SIFoldOperands should constant fold them, but the only reason it tries to do this is a hack for SelectionDAG) |
Updates based upon initial review. The code that expands G_SBFX/G_UBFX
instructions is moved to regbankselect. This version also expands the 64-bit
vector version of the bitfield extract opcodes. For this, the expansion uses
shifts and masks to perform the 64-bit vector bitfield extract.
Should have some end to end IR tests for all of these. Also would like to see if the constant cases get appropriately constant folded without special casing it. Another case to look out for is when the offset is known if we can reduce the 64-bit shift to 32-bits (it won't happen now, but theoretically it should trigger in the post-regbank combiner)
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | ||
---|---|---|
1661 | Also needs to scalarize vectors | |
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp | ||
1560 | I don't like creating single use MachineIRBuilders, but you have to do this right now | |
1621 | This dropped the result register | |
llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-sbfx.mir | ||
66 | Doesn't cover the scalar cases | |
llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-ubfx.mir | ||
4 | There's no real reason to check both wave sizes here | |
66 | Doesn't cover the scalar cases | |
llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ubfx.mir | ||
70 | Should also test s16 which is more interesting. Also some vectors, in particular <2 x s32>, <2 x s16>, <3 x s16> and <4 x s16> | |
llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-ubfx.mir | ||
27 | You can technically get away with this now, but I would prefer to have a use for all of the results. The other passes all tend to DCE instructions as they go, and someday regbankselect may start doing this |
Added ability to use G_SBFX/G_UBFX in end-to-end cases.
This enables additional test cases. To do this, this patch moves
the AArch64 code that creates G_SBFX for shift+sext_inreg to
CombinerHelper.cpp so that it can be used by AMDGPU (and
other targets). Also, added the ability to create G_UBFX from
shift+and in CombinerHelper.cpp. This works on cases when
the offset and width are constants, since the general case is
not supported on AArch64.
The ability to generate G_SBFX/G_UBFX causes multiple existing
global-isel test cases to change. These test cases have been updated.
llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-sbfx.mir | ||
---|---|---|
66 | Scalar cases are expanded in the RegBankSelect pass. | |
llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-ubfx.mir | ||
66 | Scalar cases are expanded during the RegBankSelect pass. | |
llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ubfx.mir | ||
70 | Vector cases are disallowed for G_SBFX/G_UBFX by an explicit check in the MachineVerifier. |
LGTM with some nits. There are a few codesize regressions where a shift would work just as well but I'm not sure if there's anything you should be doing about it here
llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp | ||
---|---|---|
3973–3975 | Don't need to check this. It would be illegal mir to have a non-immediate here. All G_* opcode operands are always required to be the same operand kind | |
3990 | Isn't there a buildSBfx? If not there should be | |
4024 | Isn't there a buildUbfx? If not there should be | |
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp | ||
1550 | Spaces around + | |
1551 | Spaces around + | |
llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll | ||
554 | This is worse for codesize | |
676 | This is worse for codesize. Is this just missing another simplify bits combine somewhere? | |
llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ubfx.mir | ||
70 | That doesn't make much sense but OK |
Added cases for G_SBFX and G_UBFX to GISelKnownBits, which
helps improve code generation.
llvm/lib/CodeGen/GlobalISel/GISelKnownBits.cpp | ||
---|---|---|
498–499 ↗ | (On Diff #346833) | This should really go in as a separate patch with unit tests in unittests/CodeGen/GlobalISel/KnownBitsTest.cpp. |
518 ↗ | (On Diff #346833) | This looks technically correct but I can't help feeling you'd get more precise results (when Width isn't exactly known) from a specialized KnownBits::sext function. On the other hand, maybe it's not worth optimizing for the case where Width isn't exactly known. |
llvm/lib/CodeGen/GlobalISel/GISelKnownBits.cpp | ||
---|---|---|
498–499 ↗ | (On Diff #346833) | Good idea - added https://reviews.llvm.org/D102969 |
518 ↗ | (On Diff #346833) | My initial feeling was that it wasn't worth adding this to KnownBits, but I'm open to it. |
I assume we don't have any execution tests for this, but could really use some
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp | ||
---|---|---|
1612 | Yes, this looks suspicious |
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp | ||
---|---|---|
1612 | I would suggest that if width is not a constant, you instead expand to Src >> Offset << (64 - Width) >> (64 - Width) where the last shift is signed for G_SBFX, and the first shift might disappear if Offset happens to be 0. I'm assuming the result is undefined if Width is not in the range 1..BitWidth inclusive but the definition of these opcodes does not make that clear. |
Rebased due to recent related commits and updated with review comments.
Fixed the 64-bit non-constant code generation for bitfield extract.
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp | ||
---|---|---|
1612 | Thanks for pointing this out and for the suggestion. I've implemented your suggestion. |
llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp | ||
---|---|---|
4045 | You've now got different doxygen comments on the declaration and definition of this function. | |
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp | ||
2034 | Should this be TypeIdx == 0 || TypeIdx == 1? | |
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp | ||
1577 | I think this needs to be <= 32. At least, the else case won't correctly handle a signed extract with width == 32. | |
1602 | The fact that it's undefined if Width>64 is not really surprising. The things that might be surprising are (a) it's defined if Width==64 and (b) it's undefined if Width==0. Especially the latter, since some people might expect to be able to do an unsigned zero-width extract and rely on the result being zero. TODO: make these rules clear in the definition of the G_*BFX opcodes. |
Addressed review comments. Fixed doxygen comment, and a check for width <= 32. Also, when converting the G_SBFX/G_UBFX to a shift sequence for the sign extend, clarify that the expectation is that the width range is between 1 and 64-Offset.
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp | ||
---|---|---|
2034 | I believe that TypeIdx 0 is for the destination and source operands. And, TypeIdx 1 is for the offset and width operands, which should be zero extended. | |
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp | ||
1577 | Thanks for point this out! | |
1602 | Ah right, good point. I've updated the comment and added the TODO. Let me know if you'd prefer more details in the comment. I can follow up with another PR to update the G_SBFX/G_UBFX opcodes. |
LGTM for AMDGPU, and I guess you haven't changed the behaviour for AArch64. What about other GlobalISel targets like MIPS and X86? Will they suddenly start seeing G_SBFX/G_UBSX where they weren't expecting them? Anyway I've added a couple more globalisel reviewers.
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp | ||
---|---|---|
2034 | Oh yes, you're right. I had not realized that these type indices refer back to "type0" and "type1" in the definition of G_SBFX/G_UBFX. | |
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp | ||
1602 |
Yes please! |
Just to follow up on the question - Neither MIPS nor X86 have added support for G_SBFX and G_UBFX, so they shouldn't see them. The target independent parts that convert to G_SBFX/G_UBFX are guarded by isLegalOrCustom() checks.
Don't need to check this. It would be illegal mir to have a non-immediate here. All G_* opcode operands are always required to be the same operand kind