64-bit vector mul is not supported in NEON,
so we use the SVE's mul.
To improve the performance, we can go one step further,
and use SVE's add, so that we can use SVE's mla.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
17807 | Maybe add a comment of what is your combination about. | |
17808 | Maybe replace this : return SDValue(); and then you can remove all the rest from the brackets. | |
llvm/test/CodeGen/AArch64/aarch64-combine-mul-add.ll | ||
30 ↗ | (On Diff #509679) | Can you add a test changing the order of the add. |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
17810 | Nit: Typo: s/sclable/scalable/ |
This sounds like it will be good for performance. I think this needs to be more careful about what it extracts from though. It probably needs a check that we have SVE, and that the extract is from the bottom (index 0) of a scalable vector. (The "scalable vector" might imply the "have-SVE" part though). But otherwise it could trigger in a lot more cases than it should.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
17812 | I think we can easily extend this to include ISD::SUB too while we're fixing the add case, right? We should also match to the mls instruction. | |
17815 | I ConstValue is a bit misleading here - isn't this the other operand for the add? Perhaps you can rename this to AddValue? | |
17816 | I think you also need to check what the extract subvector index is too - I think the index should be 0. | |
17824 | I think you need to also check the opcode of MulValue here otherwise we'll start matching any arbitrary pattern: add <2 x i64> (any_op <2 x i64> ...), %op2 | |
17825 | I think we should really be checking the input VT used for EXTRACT_SUBVECTOR here too and only apply the optimisation if the input is a scalable type. Once you know the input VT you also don't need to recalculate it with getContainerForFixedLengthVector because the container VT should be the same, i.e. add (extract_subvector (<vscale x 2 x i64> %in), i64 0), %op2 where we know from the type of %in that ContainerVT=<vscale x 2 x i64>. |
This is looking much better now @hassnaa-arm - thanks for making the changes! I just have a few more minor comments, but then it looks good to go.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
17807 | nit: This is just a suggestion, but perhaps this version of the comment is more explicit: // This works on the patterns of: // add v1, (mul v2, v3) // sub v1, (mul v2, v3) // for vectors of type <1 x i64> and <2 x i64> when SVE is available. It will // transform the add/sub to a scalable version, so that we can make use of // SVE's MLA/MLS that will be generated for that pattern. | |
17819 | I still think it's a bit misleading to have 'Const' in the name because it's not necessarily a constant. For example, in your test @test_mul_add_2x64 below the other operand is a function argument. | |
17841 | I think the definition of EXTRACT_SUBVECTOR says this must be a constant so you can just write this instead: if (!cast<ConstantSDNode>(ExtractIndexValue)->isZero()) return SDValue(); Note you can just the ConstantSDNode::isZero function here instead as it's a bit simpler. | |
llvm/test/CodeGen/AArch64/aarch64-combine-add-sub-mul.ll | ||
57 | Nice! |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
17837 | Hi @hassnaa-arm, it looks like you're trying to check the use of both the add/sub operands, but you're potentially only checking one. For example, MulValue could have come from N->getOperand(0). I think we may only need to check for multiple uses of the mul, because if it's used more than once we probably want to calculate the mul separately and reuse the result rather than recalculating it in several mla/mls instructions. Also, I imagine that MulValue only ever has one use because the sequence will be %MulValue = mul <vscale x 2 x i64> ... %ExtractLowMul = <2 x i64> extract_subvector <vscale x 2 x i64> %MulValue, i64 0 %Add = add <2 x i64> %ExtractLowMul, %AddOp I think what you probably should be testing for here is one use of %ExtractLowMul, since that's the thing likely to get reused. | |
17850 | nit: Sorry, I only just spotted this. Could you rename this to ScaledAddOp instead? | |
17850 | Again, sorry but I just realised that convertToScalableVector expects AddOp to have a fixed-length vector type. In theory, we could match the same pattern for just scalable vectors, i.e. %MulValue = mul <vscale x 2 x i64> ... %ExtractLowMul = <vscale x 2 x i64> extract_subvector <vscale x 2 x i64> %MulValue, i64 0 %Add = add <vscale x 2 x i64> %ExtractLowMul, %AddOp so it's worth checking for the type of the node (N) result at the start of performMulAddSubCombine, i.e. something like if (N->getOpcode() != ISD::ADD && N->getOpcode() != ISD::SUB) return SDValue(); if (!N->getValueType(0).isFixedLengthVector()) return SDValue(); |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
17840 | I think the extract_subvector could also be operand 1, right? So I think you'll need to use the if .. else if .. logic above to decide what the operand is. |
Check that extract node and mul node has one use.
That change triggered new changes in testing file of sve-fixed-length-int-rem.ll
LGTM! Thanks for making all the changes @hassnaa-arm. :)
llvm/test/CodeGen/AArch64/sve-fixed-length-int-rem.ll | ||
---|---|---|
705–706 | Some nice improvements here too! |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
17826–17833 | Hi @hassnaa-arm, I think this patch should be reverted and fixed before re-landing because the optimisation doesn't look sound. The highlighted code block allows the original operands for N to be swapped, which whilst correct for commutative operations like ISD::ADD it is incorrect for ISD::SUB which this patch also supports. You can see the bogus result by looking at the tests. For example, test_mul_sub_1x64 shows the IR for (b * c) - a which is not an mls operation, that would be a - (b * c). |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
17826–17833 | Ah good spot @paulwalker-arm! This is partly my fault too since I asked @hassnaa-arm to include the ISD::SUB case, but forgot about the fact we can't allow operands to be swapped for ISD::SUB. |
Maybe add a comment of what is your combination about.
add v1 , ( mul v2, v3 ) -> mla v1, v2, v3