bswap.v2i16 + sitofp in LLVM IR generate a sequence of:
- REV32 + USHR for bswap.v2i16
- SHL + SSHR + SCVTF for sext to v2i32 and scvtf
As noted in PR24820, the shift instructions are excessive, and they can
be optimized to just SSHR.
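The redundancy can be checked on a scalar model of a single 32-bit lane. The sketch below is an illustration only, not part of the patch: the helpers `ushr`, `shl` and `sshr` are hypothetical names that mimic the per-lane behaviour of the corresponding instructions, and the shift amount of 16 is assumed from the v2i16 to v2i32 promotion.

```python
import random

MASK32 = 0xFFFFFFFF

def ushr(x, n):
    """Logical shift right on an unsigned 32-bit lane."""
    return (x & MASK32) >> n

def shl(x, n):
    """Shift left, discarding bits that leave the 32-bit lane."""
    return (x << n) & MASK32

def sshr(x, n):
    """Arithmetic shift right: treat the lane as signed, shift, wrap back."""
    x &= MASK32
    if x & 0x80000000:
        x -= 1 << 32
    return (x >> n) & MASK32

def long_sequence(x):
    # USHR (tail of the bswap lowering), then SHL + SSHR (sign extension).
    return sshr(shl(ushr(x, 16), 16), 16)

def short_sequence(x):
    # The single SSHR that the summary says is sufficient.
    return sshr(x, 16)

samples = [0, 1, 0xFFFF, 0x80000000, 0x7FFFFFFF, MASK32]
samples += [random.getrandbits(32) for _ in range(1000)]
assert all(long_sequence(x) == short_sequence(x) for x in samples)
```

USHR followed by SHL by the same amount only clears the low 16 bits, and SSHR by 16 never reads those bits, so both sequences agree on every input.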
Differential D102333
[AArch64] Combine shift instructions in SelectionDAG
Authored by asavonic on May 12 2021, 8:54 AM.
This sounds interesting, but there might be a more general way to handle it. From what I can tell, the base sshr demands only a certain number of top bits. That is usually communicated through TLI.SimplifyDemandedBits with an appropriate DemandedMask. The simplification of target nodes based on demanded bits could then be specified in an overridden SimplifyDemandedBitsForTargetNode. It would need code similar to https://github.com/llvm/llvm-project/blob/4f05f4c8e66bc76b1d94f5283494404382e3bacd/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp#L1455, but using AArch64ISD::VSHL/AArch64ISD::VLSHR. That might be more general, handling cases where the demanded bits come from anywhere. And SimplifyDemandedBitsForTargetNode can be expanded with more cases if we find them.
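The demanded-bits argument in this comment can be modelled with plain bit masks. The following is a rough sketch under stated assumptions, not the LLVM implementation: `demanded_by_sshr` and `shl_of_lshr_is_redundant` are hypothetical helper names, lanes are assumed to be 32 bits wide, and the real logic would live in the SimplifyDemandedBitsForTargetNode override mentioned above.

```python
MASK32 = 0xFFFFFFFF

def demanded_by_sshr(n, demanded=MASK32):
    """Operand bits that an arithmetic shift right by n can observe."""
    # Result bit i is taken from operand bit i + n.
    mask = (demanded << n) & MASK32
    # The top n result bits are copies of the sign bit.
    if demanded >> (32 - n):
        mask |= 0x80000000
    return mask

def shl_of_lshr_is_redundant(c, demanded):
    """VSHL(VLSHR(x, c), c) differs from x only in the low c bits.

    If no user demands those bits, the pair can be replaced by x.
    """
    low_bits = (1 << c) - 1
    return demanded & low_bits == 0

# SSHR #16 looks only at the top 16 bits of its operand ...
assert demanded_by_sshr(16) == 0xFFFF0000
# ... so a preceding lshr/shl pair by 16 can be dropped,
assert shl_of_lshr_is_redundant(16, demanded_by_sshr(16))
# whereas SSHR #8 also reads bits 8..15, which the pair would clear.
assert not shl_of_lshr_is_redundant(16, demanded_by_sshr(8))
```

Phrasing the check in terms of a demanded mask is what makes it independent of which node consumes the shifted value.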
If you want to do it this way instead, though, that sounds fine too. There will only be a limited number of cases where AArch64ISD::VSHL etc. haven't already been simplified.
Thanks a lot Dave! I'll follow your first suggestion, and if it does not work, we can get back to the original patch.
Thanks. I'm glad this way worked.
Perhaps performVectorShiftCombine