Page MenuHomePhabricator

[AArch64][Combine]: combine <2xi64> Mul-Add.

Authored by hassnaa-arm on Mar 30 2023, 7:51 AM.



64-bit vector mul is not supported in NEON,
so we use the SVE's mul.
To improve the performance, we can go one step further,
and use SVE's add, so that we can use SVE's mla.

Diff Detail

Event Timeline

hassnaa-arm created this revision.Mar 30 2023, 7:51 AM
hassnaa-arm requested review of this revision.Mar 30 2023, 7:51 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 30 2023, 7:51 AM
CarolineConcatto added inline comments.

Maybe add a comment of what is your combination about.
add v1 , ( mul v2, v3 ) -> mla v1, v2, v3


Maybe replace this :
if (N->getOpcode() == ISD::ADD)
by this:
if (N->getOpcode() != ISD::ADD)

return SDValue();

and then you can remove all the rest from the brackets.

30 ↗(On Diff #509679)

Can you add a test changing the order of the add.
%add = add <1 x i64> %mul, %a

hassnaa-arm marked 3 inline comments as done.Mar 30 2023, 8:54 AM

Improve code readability, Add comments.

Matt added a subscriber: Matt.Mar 30 2023, 12:13 PM
Matt added inline comments.

Nit: Typo: s/sclable/scalable/

hassnaa-arm marked an inline comment as done.Mar 31 2023, 2:38 AM
dmgreen added a subscriber: dmgreen.Apr 3 2023, 1:20 AM

This sounds like it will be good for performance. I think this needs to be more careful about what it extracts from though. It probably needs a check that we have SVE, and that the extract is from the bottom (index 0) of a scalable vector. (The "scalable vector" might imply the "have-SVE" part though). But otherwise it could trigger in a lot more cases than it should.

david-arm added inline comments.

I think we can easily extend this to include ISD::SUB too while we're fixing the add case, right? We should also match to the mls instruction.


I ConstValue is a bit misleading here - isn't this the other operand for the add? Perhaps you can rename this to AddValue?


I think you also need to check what the extract subvector index is too - I think the index should be 0.


I think you need to also check the opcode of MulValue here otherwise we'll start matching any arbitrary pattern:

add <2 x i64> (any_op <2 x i64> ...), %op2

I think we should really be checking the input VT used for EXTRACT_SUBVECTOR here too and only apply the optimisation if the input is a scalable type. Once you know the input VT you also don't need to recalculate it with getContainerForFixedLengthVector because the container VT should be the same, i.e.

add (extract_subvector (<vscale x 2 x i64> %in), i64 0), %op2

where we know from the type of %in that ContainerVT=<vscale x 2 x i64>.

hassnaa-arm marked 5 inline comments as done.Apr 3 2023, 6:21 AM

Add additional checks to make sure of the expected pattern.

Remove line added by mistake.

This is looking much better now @hassnaa-arm - thanks for making the changes! I just have a few more minor comments, but then it looks good to go.


nit: This is just a suggestion, but perhaps this version of the comment is more explicit:

// This works on the patterns of:
//   add v1, (mul v2, v3)
//   sub v1, (mul v2, v3)
// for vectors of type <1 x i64> and <2 x i64> when SVE is available. It will
// transform the add/sub to a scalable version, so that we can make use of
// SVE's MLA/MLS that will be generated for that pattern.

I still think it's a bit misleading to have 'Const' in the name because it's not necessarily a constant. For example, in your test @test_mul_add_2x64 below the other operand is a function argument.


I think the definition of EXTRACT_SUBVECTOR says this must be a constant so you can just write this instead:

if (!cast<ConstantSDNode>(ExtractIndexValue)->isZero())
  return SDValue();

Note you can just the ConstantSDNode::isZero function here instead as it's a bit simpler.



hassnaa-arm marked 3 inline comments as done.Apr 3 2023, 7:51 AM

Enhance code readability.

Remove line added by mistake.

david-arm accepted this revision.Apr 3 2023, 8:51 AM

LGTM! Thanks @hassnaa-arm. :)

This revision is now accepted and ready to land.Apr 3 2023, 8:51 AM

Add a check to make sure that the mul has single use.

david-arm added inline comments.Apr 4 2023, 1:25 AM

Hi @hassnaa-arm, it looks like you're trying to check the use of both the add/sub operands, but you're potentially only checking one. For example, MulValue could have come from N->getOperand(0). I think we may only need to check for multiple uses of the mul, because if it's used more than once we probably want to calculate the mul separately and reuse the result rather than recalculating it in several mla/mls instructions.

Also, I imagine that MulValue only ever has one use because the sequence will be

%MulValue = mul <vscale x 2 x i64> ...
%ExtractLowMul = <2 x i64> extract_subvector <vscale x 2 x i64> %MulValue, i64 0
%Add = add <2 x i64> %ExtractLowMul, %AddOp

I think what you probably should be testing for here is one use of %ExtractLowMul, since that's the thing likely to get reused.


nit: Sorry, I only just spotted this. Could you rename this to ScaledAddOp instead?


Again, sorry but I just realised that convertToScalableVector expects AddOp to have a fixed-length vector type. In theory, we could match the same pattern for just scalable vectors, i.e.

%MulValue = mul <vscale x 2 x i64> ...
%ExtractLowMul = <vscale x 2 x i64> extract_subvector <vscale x 2 x i64> %MulValue, i64 0
%Add = add <vscale x 2 x i64> %ExtractLowMul, %AddOp

so it's worth checking for the type of the node (N) result at the start of performMulAddSubCombine, i.e. something like

if (N->getOpcode() != ISD::ADD && N->getOpcode() != ISD::SUB)
  return SDValue();

if (!N->getValueType(0).isFixedLengthVector())
  return SDValue();
hassnaa-arm marked 3 inline comments as done.Apr 4 2023, 4:18 AM

Add check for fixed-length vectors.

david-arm added inline comments.Apr 4 2023, 5:01 AM

I think the extract_subvector could also be operand 1, right? So I think you'll need to use the if .. else if .. logic above to decide what the operand is.

hassnaa-arm marked an inline comment as done.

Check that extract node and mul node has one use.
That change triggered new changes in testing file of sve-fixed-length-int-rem.ll

Fix format.

david-arm accepted this revision.Apr 4 2023, 7:40 AM

LGTM! Thanks for making all the changes @hassnaa-arm. :)


Some nice improvements here too!

This revision was landed with ongoing or failed builds.Apr 5 2023, 2:19 AM
This revision was automatically updated to reflect the committed changes.
paulwalker-arm added inline comments.

Hi @hassnaa-arm, I think this patch should be reverted and fixed before re-landing because the optimisation doesn't look sound. The highlighted code block allows the original operands for N to be swapped, which whilst correct for commutative operations like ISD::ADD it is incorrect for ISD::SUB which this patch also supports. You can see the bogus result by looking at the tests. For example, test_mul_sub_1x64 shows the IR for (b * c) - a which is not an mls operation, that would be a - (b * c).

david-arm added inline comments.Apr 11 2023, 8:29 AM

Ah good spot @paulwalker-arm! This is partly my fault too since I asked @hassnaa-arm to include the ISD::SUB case, but forgot about the fact we can't allow operands to be swapped for ISD::SUB.