Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
15094–15096 | This doesn't look correct because AArch64ISD::SRAD_PRED implies inactive lanes are undef. Perhaps you meant to use convertMergedOpToPredOp? | |
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td | ||
1585 | I don't think you intended this change, which is likely the result of the performIntrinsicCombine issue. |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
13038 | Given this is a dag combine does this need to ensure VT is a legal type before it can safely emit an AArch64ISD specific node? useSVEForFixedLengthVectorVT has probably got you covered for fixed-length types but you're also handling scalable vector types here. Just a thought but given the isIntDivCheap logic above I'm wondering if it is better to do likewise for vectors when SVE will be used, i.e. just say they're cheap and then have isel rules to lower them to ASRD. I believe this'll mean you'll get free handling for the larger than legal types. That said, the one thing I'm not sure about is if that makes the handling of isNegatedPowerOf2 cases more awkward. |
- Rework patch to use instruction selection to match (sdiv (dup <pow2>)) nodes rather than generating asrd nodes
- This allows both larger than legal fixed types to work correctly, as well as sdiv intrinsics.
Looks good with one minor suggestion from me.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
13605 | Nit/suggestion: Please add a comment showing the form of the combine. |
If I'm following correctly, this patch is doing three things:
- It adds patterns to select asrd if the RHS is a power of two.
- It turns sdiv of negative power of two into sdiv+neg.
- It makes sdiv of i8 and i16 legal if the RHS is a power of two.
The first two are clearly fine. (3) is a bit fragile; usually legality doesn't depend on the structure of the operands, only their type. The legalizer won't rerun if the structure of the operand changes. If you're specifically checking for a splat of a constant, I guess it's okay.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
13610 | Do we really need to preserve the fact that the operation was originally an intrinsic? |
I agree on this, however there didn't seem to be a good alternative that would allow catching both the intrinsics as well as larger than legal fixed types (i.e. using BuildSDIVPow2 directly).
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
13610 | I believe so yes, the semantics of the predicate are different between SDIV_PRED and the intrinsics, hence why we have different patterns to select these. |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
13610 | Oh, I see, the intrinsic has an extra operand. |
The previous patch looked cleaner but had the issue of not working well for illegal types, whereas this patch works well but is a little more verbose than I was expected. What about having the ASRD_MERGE_OP1 target specific node but introducing it as part of LowerFixedLengthVectorIntDivideToSVE. I feel this will simplify the patch whilst maintaining it current capabilities and remove the frailty Eli described. When it comes to optimising the intrinsic we could achieve this via instcombine.
- Change approach again to be between the last two
- Go back to matching asrd nodes in instruction selection
- Create the asrd nodes during lowering rather than BuildSDIVPow2
- This makes the patch cleaner whilst still supporting larger than legal types
- Intrinsics will be handled in a separate patch in instcombine.
Thanks @bsmith and sorry for the late review.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
13029 | This change is not needed anymore. | |
13032 | This should just be handle larger than legal types as it applies to all vector types. | |
18763–18764 | Up to you but I think having SDValue Op1 = convertToScalableVector(DAG, ContainerVT, Op.getOperand(0)); SDValue Op2 = DAG.getTargetConstant(Log2_64(SplatVal), dl, MVT::i32) is cleaner whilst also being more consistent with the OP1 in the opcode's name. |
This change is not needed anymore.