Page MenuHomePhabricator

[AArch64] Combine shift instructions in SelectionDAG
ClosedPublic

Authored by asavonic on May 12 2021, 8:54 AM.

Details

Summary

bswap.v2i16 + sitofp in LLVM IR generate a sequence of:

  • REV32 + USHR for bswap.v2i16
  • SHL + SSHR + SCVTF for sext to v2i32 and scvtf

As noted in PR24820, the shift instructions are excessive, and they can
be optimized to just SSHR.

Diff Detail

Event Timeline

asavonic created this revision.May 12 2021, 8:54 AM
asavonic requested review of this revision.May 12 2021, 8:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 12 2021, 8:54 AM

This sound interesting, but there might be a more general way to handle it. From what I can tell the base sshr demands a certain number of top bits. That is usually communicated through TLI.SimplifyDemandedBits with an appropriate DemandedMask.

Then I think it could specify the simplification that happens to target nodes based on demanded bits with an overridden SimplifyDemandedBitsForTargetNode. It would need code similar to https://github.com/llvm/llvm-project/blob/4f05f4c8e66bc76b1d94f5283494404382e3bacd/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp#L1455, but using AArch64ISD::VSHL/AArch64ISD::VLSHR.

That might be more general, handling any cases where the demanded bits come from anywhere. And SimplifyDemandedBitsForTargetNode can be expanded with more cases if we find them.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
14562

I think that the shift amount of VASHR/VLSHR/VSHL are always constants, so Op0.getConstantOperandVal(1) can be used directly.

If you want to do it this way instead though, that sounds fine too. There will only be a limited number of cases where the AArch64ISD::VSHL etc haven't already been simplified.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
14575

SDValue Shift2 = Shift1->getOperand(0);
Same for Shift3 below.

14581

This is only used in one place. Same for Shift2Opc above.

14602

I believe these could both just be hasOneUse()

14608

Shift1Opc

Thanks a lot Dave! I'll follow your first suggestion, and if does not work, we can get back to the original patch.

asavonic updated this revision to Diff 346164.May 18 2021, 6:51 AM
  • Used TLI.SimplifyDemandedBits for performShiftCombine.
  • Extended SimplifyDemandedBits to cover AArch64 VLSHR + VSHL.

Thanks. I'm glad this way worked.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
14561

Perhaps performVectorShiftCombine

14564

This probably isn't needed, or could be an assert.

14574

I'm not sure these need casts, or to be uint64_t. They should both be fairly small.

14577

Most other uses of this function that I see seem to use:
if (TLI.SimplifyDemandedBits(..))

return SDValue(N, 0);

It may not alter much, but will be closer to what DAGCombiner::combine expects the return value to be for something that changed.

llvm/test/CodeGen/AArch64/aarch64-bswap-ext.ll
3

This probably doesn't need -O2

asavonic updated this revision to Diff 346497.May 19 2021, 10:20 AM
  • Applied CR comments.
dmgreen accepted this revision.May 19 2021, 10:31 AM

Thanks. LGTM

This revision is now accepted and ready to land.May 19 2021, 10:31 AM