This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE] Generate ASRD instructions for power of 2 signed divides
ClosedPublic

Authored by bsmith on Nov 5 2021, 7:31 AM.

Diff Detail

Event Timeline

bsmith created this revision.Nov 5 2021, 7:31 AM
bsmith requested review of this revision.Nov 5 2021, 7:31 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 5 2021, 7:31 AM
paulwalker-arm added inline comments.Nov 5 2021, 7:45 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
15094–15096

This doesn't look correct because AArch64ISD::SRAD_PRED implies inactive lanes are undef. Perhaps you meant to use convertMergedOpToPredOp?

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
1585

I don't think you intended this change, which is likely the result of the performIntrinsicCombine issue.

bsmith updated this revision to Diff 385094.Nov 5 2021, 9:05 AM
  • Make semantics of SRAD_PRED match the other *_PRED nodes
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13038

Given this is a dag combine does this need to ensure VT is a legal type before it can safely emit an AArch64ISD specific node? useSVEForFixedLengthVectorVT has probably got you covered for fixed-length types but you're also handling scalable vector types here.

Just a thought but given the isIntDivCheap logic above I'm wondering if it is better to do likewise for vectors when SVE will be used, i.e. just say they're cheap and then have isel rules to lower them to ASRD. I believe this'll mean you'll get free handling for the larger than legal types.

That said, the one thing I'm not sure about is if that makes the handling of isNegatedPowerOf2 cases more awkward.

Matt added a subscriber: Matt.Nov 17 2021, 12:04 PM
bsmith updated this revision to Diff 388193.Nov 18 2021, 7:05 AM
  • Rework patch to use instruction selection to match (sdiv (dup <pow2>)) nodes rather than generating asrd nodes
    • This allows both larger than legal fixed types to work correctly, as well as sdiv intrinsics.
peterwaller-arm accepted this revision.Nov 23 2021, 4:26 AM

Looks good with one minor suggestion from me.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13605

Nit/suggestion: Please add a comment showing the form of the combine.

This revision is now accepted and ready to land.Nov 23 2021, 4:26 AM

If I'm following correctly, this patch is doing three things:

  1. It adds patterns to select asrd if the RHS is a power of two.
  2. It turns sdiv of negative power of two into sdiv+neg.
  3. It makes sdiv of i8 and i16 legal if the RHS is a power of two.

The first two are clearly fine. (3) is a bit fragile; usually legality doesn't depend on the structure of the operands, only their type. The legalizer won't rerun if the structure of the operand changes. If you're specifically checking for a splat of a constant, I guess it's okay.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13610

Do we really need to preserve the fact that the operation was originally an intrinsic?

If I'm following correctly, this patch is doing three things:

  1. It adds patterns to select asrd if the RHS is a power of two.
  2. It turns sdiv of negative power of two into sdiv+neg.
  3. It makes sdiv of i8 and i16 legal if the RHS is a power of two.

The first two are clearly fine. (3) is a bit fragile; usually legality doesn't depend on the structure of the operands, only their type. The legalizer won't rerun if the structure of the operand changes. If you're specifically checking for a splat of a constant, I guess it's okay.

I agree on this, however there didn't seem to be a good alternative that would allow catching both the intrinsics as well as larger than legal fixed types (i.e. using BuildSDIVPow2 directly).

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13610

I believe so yes, the semantics of the predicate are different between SDIV_PRED and the intrinsics, hence why we have different patterns to select these.

efriedma accepted this revision.Nov 24 2021, 11:14 AM
efriedma added inline comments.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13610

Oh, I see, the intrinsic has an extra operand.

The previous patch looked cleaner but had the issue of not working well for illegal types, whereas this patch works well but is a little more verbose than I was expected. What about having the ASRD_MERGE_OP1 target specific node but introducing it as part of LowerFixedLengthVectorIntDivideToSVE. I feel this will simplify the patch whilst maintaining it current capabilities and remove the frailty Eli described. When it comes to optimising the intrinsic we could achieve this via instcombine.

bsmith updated this revision to Diff 389807.Nov 25 2021, 8:09 AM
  • Change approach again to be between the last two
    • Go back to matching asrd nodes in instruction selection
    • Create the asrd nodes during lowering rather than BuildSDIVPow2
    • This makes the patch cleaner whilst still supporting larger than legal types
    • Intrinsics will be handled in a separate patch in instcombine.
paulwalker-arm accepted this revision.Nov 25 2021, 9:19 AM

Thanks @bsmith and sorry for the late review.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
13029

This change is not needed anymore.

13032

This should just be handle larger than legal types as it applies to all vector types.

18763–18764

Up to you but I think having

SDValue Op1 = convertToScalableVector(DAG, ContainerVT, Op.getOperand(0));
SDValue Op2 = DAG.getTargetConstant(Log2_64(SplatVal), dl, MVT::i32)

is cleaner whilst also being more consistent with the OP1 in the opcode's name.

This revision was landed with ongoing or failed builds.Nov 26 2021, 3:14 AM
This revision was automatically updated to reflect the committed changes.