This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU][GlobalISel] Legalize and select G_SBFX and G_UBFX
ClosedPublic

Authored by bcahoon on Apr 8 2021, 3:50 PM.

Details

Summary

Adds legalizer, register bank select, and instruction
select support for G_SBFX and G_UBFX. These opcodes generate
the scalar or vector ALU bitfield extract instructions. The
instructions allow both constant or register values for the
offset and width operands.

Diff Detail

Event Timeline

bcahoon created this revision.Apr 8 2021, 3:50 PM
bcahoon requested review of this revision.Apr 8 2021, 3:50 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 8 2021, 3:50 PM
arsenm added inline comments.Apr 8 2021, 4:06 PM
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
2034

Extra space before )

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
771

Typo scslar

781–801

It would be better to handle this operand packing in RegBankSelect applyMappingImpl. We can get simplifications after the fact if you emit the generic bit operations. If the offset/width are really constants, this won't end up constant folding them (actually today SIFoldOperands should constant fold them, but the only reason it tries to do this is a hack for SelectionDAG)

bcahoon updated this revision to Diff 337189.Apr 13 2021, 9:44 AM

Updates based upon initial review. The code that expands G_SBFX/G_UBFX
instructions is moved to regbankselect. This version also expands the 64-bit
vector version of the bitfield extract opcodes. For this, the expansion uses
shifts and masks to perform the 64-bit vector bitfield extract.

Should have some end to end IR tests for all of these. Also would like to see if the constant cases get appropriately constant folded without special casing it. Another case to look out for is when the offset is known if we can reduce the 64-bit shift to 32-bits (it won't happen now, but theoretically it should trigger in the post-regbank combiner)

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
1661

Also needs to scalarize vectors

llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
1560

I don't like creating single use MachineIRBuilders, but you have to do this right now

1621

This dropped the result register

llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-sbfx.mir
66

Doesn't cover the scalar cases

llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-ubfx.mir
4

There's no real reason to check both wave sizes here

66

Doesn't cover the scalar cases

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ubfx.mir
70

Should also test s16 which is more interesting. Also some vectors, in particular <2 x s32>, <2 x s16>, <3 x s16> and <4 x s16>

llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-ubfx.mir
27

You can technically get away with this now, but I would prefer to have a use for all of the results. The other passes all tend to DCE instructions as they go, and someday regbankselect may start doing this

bcahoon updated this revision to Diff 342165.May 1 2021, 11:15 AM

Added ability to use G_SBFX/G_UBFX in end-to-end cases.
This enables additional test cases. To do this, this patch moves
the AArch64 code that creates G_SBFX for shift+sext_inreg to
CombinerHelper.cpp so that it can be used by AMDGPU (and
other targets). Also, added the ability to create G_UBFX from
shift+and in CombinerHelper.cpp. This works on cases when
the offset and width are constants, since the general case is
not supported on AArch64.

The ability to generate G_SBFX/G_UBFX causes multiple existing
global-isel test cases to change. These test cases have been updated.

bcahoon marked 4 inline comments as done.May 1 2021, 11:23 AM
bcahoon added inline comments.
llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-sbfx.mir
66

Scalar cases are expanded in the RegBankSelect pass.

llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-ubfx.mir
66

Scalar cases are expanded during the RegBankSelect pass.

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ubfx.mir
70

Vector cases are disallowed for G_SBFX/G_UBFX by an explicit check in the MachineVerifier.

LGTM with some nits. There are a few codesize regressions where a shift would work just as well but I'm not sure if there's anything you should be doing about it here

llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
3973–3975

Don't need to check this. It would be illegal mir to have a non-immediate here. All G_* opcode operands are always required to be the same operand kind

3990

Isn't there a buildSBfx? If not there should be

4024

Isn't there a buildUbfx? If not there should be

llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
1550

Spaces around +

1551

Spaces around +

llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll
554

This is worse for codesize

676

This is worse for codesize. Is this just missing another simplify bits combine somewhere?

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ubfx.mir
70

That doesn't make much sense but OK

bcahoon updated this revision to Diff 346833.May 20 2021, 1:09 PM

Added cases for G_SBFX and G_UBFX to GISelKnownBits, which
helps improve code generation.

foad added a subscriber: foad.May 21 2021, 3:18 AM
foad added inline comments.
llvm/lib/CodeGen/GlobalISel/GISelKnownBits.cpp
498–499 ↗(On Diff #346833)

This should really go in as a separate patch with unit tests in unittests/CodeGen/GlobalISel/KnownBitsTest.cpp.

518 ↗(On Diff #346833)

This looks technically correct but I can't help feeling you'd get more precise results (when Width isn't exactly known) from a specialized KnownBits::sext function. On the other hand, maybe it's not worth optimizing for the case where Width isn't exactly known.

bcahoon marked 5 inline comments as done.May 22 2021, 10:53 AM
bcahoon added inline comments.
llvm/lib/CodeGen/GlobalISel/GISelKnownBits.cpp
498–499 ↗(On Diff #346833)
518 ↗(On Diff #346833)

My initial feeling was that it wasn't worth adding this to KnownBits, but I'm open to it.

bcahoon updated this revision to Diff 347210.May 22 2021, 11:15 AM

Created a separate patch for the changes to computeKnownBits.

foad added inline comments.May 24 2021, 6:29 AM
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
1589

Typo uppoer.

1612

Looks like you're missing the sign extension here for the sbfe/sbfx case?

I assume we don't have any execution tests for this, but could really use some

llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
1612

Yes, this looks suspicious

foad added inline comments.Jun 3 2021, 7:54 AM
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
1612

I would suggest that if width is not a constant, you instead expand to Src >> Offset << (64 - Width) >> (64 - Width) where the last shift is signed for G_SBFX, and the first shift might disappear if Offset happens to be 0. I'm assuming the result is undefined if Width is not in the range 1..BitWidth inclusive but the definition of these opcodes does not make that clear.

bcahoon updated this revision to Diff 350313.Jun 7 2021, 8:46 AM

Rebased due to recent related commits and updated with review comments.
Fixed the 64-bit non-constant code generation for bitfield extract.

bcahoon marked 4 inline comments as done.Jun 7 2021, 8:47 AM
bcahoon added inline comments.
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
1612

Thanks for pointing this out and for the suggestion. I've implemented your suggestion.

foad added inline comments.Jun 8 2021, 1:57 AM
llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
4045

You've now got different doxygen comments on the declaration and definition of this function.

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
2034

Should this be TypeIdx == 0 || TypeIdx == 1?

llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
1577

I think this needs to be <= 32. At least, the else case won't correctly handle a signed extract with width == 32.

1602

The fact that it's undefined if Width>64 is not really surprising.

The things that might be surprising are (a) it's defined if Width==64 and (b) it's undefined if Width==0. Especially the latter, since some people might expect to be able to do an unsigned zero-width extract and rely on the result being zero.

TODO: make these rules clear in the definition of the G_*BFX opcodes.

bcahoon updated this revision to Diff 351759.Jun 13 2021, 6:28 PM
bcahoon marked an inline comment as done.

Addressed review comments. Fixed doxygen comment, and a check for width <= 32. Also, when converting the G_SBFX/G_UBFX to a shift sequence for the sign extend, clarify that the expectation is that the width range is between 1 and 64-Offset.

bcahoon marked an inline comment as done.Jun 13 2021, 6:31 PM
bcahoon added inline comments.
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
2034

I believe that TypeIdx 0 is for the destination and source operands. And, TypeIdx 1 is for the offset and width operands, which should be zero extended.

llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
1577

Thanks for point this out!

1602

Ah right, good point. I've updated the comment and added the TODO. Let me know if you'd prefer more details in the comment. I can follow up with another PR to update the G_SBFX/G_UBFX opcodes.

LGTM for AMDGPU, and I guess you haven't changed the behaviour for AArch64. What about other GlobalISel targets like MIPS and X86? Will they suddenly start seeing G_SBFX/G_UBSX where they weren't expecting them? Anyway I've added a couple more globalisel reviewers.

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
2034

Oh yes, you're right. I had not realized that these type indices refer back to "type0" and "type1" in the definition of G_SBFX/G_UBFX.

llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
1602

I can follow up with another PR to update the G_SBFX/G_UBFX opcodes.

Yes please!

bcahoon updated this revision to Diff 354264.Jun 24 2021, 8:33 AM
bcahoon marked an inline comment as done.

Rebased

LGTM for AMDGPU, and I guess you haven't changed the behaviour for AArch64. What about other GlobalISel targets like MIPS and X86? Will they suddenly start seeing G_SBFX/G_UBSX where they weren't expecting them? Anyway I've added a couple more globalisel reviewers.

Just to follow up on the question - Neither MIPS nor X86 have added support for G_SBFX and G_UBFX, so they shouldn't see them. The target independent parts that convert to G_SBFX/G_UBFX are guarded by isLegalOrCustom() checks.

foad accepted this revision.Jun 26 2021, 11:52 PM
This revision is now accepted and ready to land.Jun 26 2021, 11:52 PM
This revision was landed with ongoing or failed builds.Jun 28 2021, 6:13 AM
This revision was automatically updated to reflect the committed changes.
llvm/test/CodeGen/AMDGPU/GlobalISel/fshr.ll