This is an archive of the discontinued LLVM Phabricator instance.

Differential D147236

[AArch64][Combine]: combine <2xi64> Mul-Add.
Closed, Public

Authored by hassnaa-arm on Mar 30 2023, 7:51 AM.

Details

Reviewers

sdesmalen
CarolineConcatto
david-arm

Commits

rG6a8d8f3e28ae: [AArch64][DAGCombiner]: combine <2xi64> add/sub.

Summary

NEON has no 64-bit vector multiply, so these multiplies are lowered using SVE's mul.
To improve performance, we can go one step further and also do the add in SVE,
so that the mul and the add can be matched to SVE's mla.
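
As a rough illustration of what the combine is aiming for, the sketch below models a multiply-accumulate over 64-bit lanes in plain C++. The helper name `mla_lanes` and the use of `std::array` are assumptions made for this example; it does not use any LLVM or SVE API, and it ignores predication.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative model only: per-lane acc + m1 * m2 on 64-bit lanes,
// using wrap-around arithmetic as LLVM IR `add`/`mul` on i64 does.
template <std::size_t N>
std::array<std::uint64_t, N> mla_lanes(std::array<std::uint64_t, N> acc,
                                       const std::array<std::uint64_t, N> &m1,
                                       const std::array<std::uint64_t, N> &m2) {
  for (std::size_t i = 0; i < N; ++i)
    acc[i] += m1[i] * m2[i];
  return acc;
}
```

With `N = 2` this corresponds to the `<2 x i64>` case named in the title, and with `N = 1` to the `<1 x i64>` case.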

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

hassnaa-arm created this revision. Mar 30 2023, 7:51 AM

hassnaa-arm requested review of this revision. Mar 30 2023, 7:51 AM

hassnaa-arm added a reviewer: sdesmalen. Mar 30 2023, 7:52 AM

Harbormaster completed remote builds in B222753: Diff 509679. Mar 30 2023, 8:36 AM

CarolineConcatto added a subscriber: CarolineConcatto. Mar 30 2023, 8:41 AM

CarolineConcatto added inline comments.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17807	Maybe add a comment describing what your combine does:
	add v1, (mul v2, v3) -> mla v1, v2, v3
17808	Maybe replace this:
	if (N->getOpcode() == ISD::ADD)
	with this:
	if (N->getOpcode() != ISD::ADD)
	  return SDValue();
	and then you can remove all the rest from the brackets.
llvm/test/CodeGen/AArch64/aarch64-combine-mul-add.ll
30 ↗	(On Diff #509679)	Can you add a test that changes the order of the add operands?
	%add = add <1 x i64> %mul, %a

hassnaa-arm marked 3 inline comments as done. Mar 30 2023, 8:54 AM

Improve code readability, add comments.

hassnaa-arm added a reviewer: CarolineConcatto. Mar 30 2023, 9:06 AM

Harbormaster completed remote builds in B222760: Diff 509691. Mar 30 2023, 9:32 AM

Matt added a subscriber: Matt. Mar 30 2023, 12:13 PM

Matt added inline comments.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17810	Nit: Typo: s/sclable/scalable/

hassnaa-arm marked an inline comment as done. Mar 31 2023, 2:38 AM

Fix typo.

Harbormaster completed remote builds in B222949: Diff 509950. Mar 31 2023, 5:00 AM

This sounds like it will be good for performance. I think this needs to be more careful about what it extracts from though. It probably needs a check that we have SVE, and that the extract is from the bottom (index 0) of a scalable vector. (The "scalable vector" might imply the "have-SVE" part though). But otherwise it could trigger in a lot more cases than it should.

david-arm added a subscriber: david-arm. Apr 3 2023, 1:59 AM

david-arm added inline comments.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17812	I think we can easily extend this to include `ISD::SUB` too while we're fixing the add case, right? We should also match to the `mls` instruction.
17815	I think `ConstValue` is a bit misleading here - isn't this the other operand for the add? Perhaps you can rename this to `AddValue`?
17816	I think you also need to check what the extract subvector index is too - I think the index should be 0.
17824	I think you need to also check the opcode of `MulValue` here, otherwise we'll start matching any arbitrary pattern:
	add <2 x i64> (any_op <2 x i64> ...), %op2
17825	I think we should really be checking the input VT used for EXTRACT_SUBVECTOR here too and only apply the optimisation if the input is a scalable type. Once you know the input VT you also don't need to recalculate it with `getContainerForFixedLengthVector` because the container VT should be the same, i.e.
	add (extract_subvector (<vscale x 2 x i64> %in), i64 0), %op2
	where we know from the type of `%in` that `ContainerVT=<vscale x 2 x i64>`.

hassnaa-arm marked 5 inline comments as done. Apr 3 2023, 6:21 AM

Add additional checks to make sure of the expected pattern.

Remove line added by mistake.

Harbormaster completed remote builds in B223340: Diff 510480. Apr 3 2023, 6:57 AM

This is looking much better now @hassnaa-arm - thanks for making the changes! I just have a few more minor comments, but then it looks good to go.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17807	nit: This is just a suggestion, but perhaps this version of the comment is more explicit:
	// This works on the patterns of:
	//   add v1, (mul v2, v3)
	//   sub v1, (mul v2, v3)
	// for vectors of type <1 x i64> and <2 x i64> when SVE is available. It will
	// transform the add/sub to a scalable version, so that we can make use of
	// SVE's MLA/MLS that will be generated for that pattern.
17819	I still think it's a bit misleading to have 'Const' in the name because it's not necessarily a constant. For example, in your test `@test_mul_add_2x64` below the other operand is a function argument.
17841	I think the definition of EXTRACT_SUBVECTOR says this must be a constant so you can just write this instead:
	if (!cast<ConstantSDNode>(ExtractIndexValue)->isZero())
	  return SDValue();
	Note you can just use the `ConstantSDNode::isZero` function here instead as it's a bit simpler.
llvm/test/CodeGen/AArch64/aarch64-combine-add-sub-mul.ll
57	Nice!

hassnaa-arm marked 3 inline comments as done. Apr 3 2023, 7:51 AM

Enhance code readability.

Remove line added by mistake.

Harbormaster completed remote builds in B223353: Diff 510499. Apr 3 2023, 8:28 AM

LGTM! Thanks @hassnaa-arm. :)

This revision is now accepted and ready to land. Apr 3 2023, 8:51 AM

Add a check to make sure that the mul has a single use.

Harbormaster completed remote builds in B223376: Diff 510530. Apr 3 2023, 10:22 AM

david-arm added inline comments. Apr 4 2023, 1:25 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17837	Hi @hassnaa-arm, it looks like you're trying to check the use of both the add/sub operands, but you're potentially only checking one. For example, `MulValue` could have come from `N->getOperand(0)`. I think we may only need to check for multiple uses of the mul, because if it's used more than once we probably want to calculate the mul separately and reuse the result rather than recalculating it in several mla/mls instructions.
	Also, I imagine that `MulValue` only ever has one use because the sequence will be
	%MulValue = mul <vscale x 2 x i64> ...
	%ExtractLowMul = <2 x i64> extract_subvector <vscale x 2 x i64> %MulValue, i64 0
	%Add = add <2 x i64> %ExtractLowMul, %AddOp
	I think what you probably should be testing for here is one use of `%ExtractLowMul`, since that's the thing likely to get reused.
17850	nit: Sorry, I only just spotted this. Could you rename this to `ScaledAddOp` instead?
17850	Again, sorry but I just realised that `convertToScalableVector` expects `AddOp` to have a fixed-length vector type. In theory, we could match the same pattern for just scalable vectors, i.e.
	%MulValue = mul <vscale x 2 x i64> ...
	%ExtractLowMul = <vscale x 2 x i64> extract_subvector <vscale x 2 x i64> %MulValue, i64 0
	%Add = add <vscale x 2 x i64> %ExtractLowMul, %AddOp
	so it's worth checking for the type of the node (N) result at the start of `performMulAddSubCombine`, i.e. something like
	if (N->getOpcode() != ISD::ADD && N->getOpcode() != ISD::SUB)
	  return SDValue();
	if (!N->getValueType(0).isFixedLengthVector())
	  return SDValue();

hassnaa-arm marked 3 inline comments as done. Apr 4 2023, 4:18 AM

Add check for fixed-length vectors.

david-arm added inline comments. Apr 4 2023, 5:01 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17840	I think the extract_subvector could also be operand 1, right? So I think you'll need to use the `if .. else if ..` logic above to decide what the operand is.

Harbormaster completed remote builds in B223540: Diff 510745. Apr 4 2023, 5:59 AM

Check that the extract node and mul node have one use.
That change triggered new changes in the test file sve-fixed-length-int-rem.ll.
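
The update note above asks for the extract and the mul to have one use. As a small illustration of how the two ways of combining the negated checks differ, the sketch below uses plain booleans as stand-ins for `ExtractOp.hasOneUse()` and `MulValue.hasOneUse()`. Both function names are hypothetical and this is not code from the patch. Joining the negated checks with `||` skips the combine unless both values have a single use, while joining them with `&&` (the form that appears in Diff 511025) skips it only when neither does.

```cpp
#include <cassert>

// Hypothetical stand-ins for the two hasOneUse() queries.
// Returns true when the combine should be skipped.
bool bail_unless_both_single_use(bool extract_one_use, bool mul_one_use) {
  return !extract_one_use || !mul_one_use;
}

// Looser variant: skips only when neither value has a single use.
bool bail_if_neither_single_use(bool extract_one_use, bool mul_one_use) {
  return !extract_one_use && !mul_one_use;
}
```

The two variants disagree exactly when one of the values has a single use and the other does not.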

Fix format.

LGTM! Thanks for making all the changes @hassnaa-arm. :)

llvm/test/CodeGen/AArch64/sve-fixed-length-int-rem.ll
705–706	Some nice improvements here too!
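
The changed CHECK lines in the `srem`/`urem` tests replace a `mul` followed by a `sub` with a single `mls`. A small sketch of the identity behind that, in plain C++ with a hypothetical helper name (`rem_via_mls`), assuming a non-zero divisor and no `INT64_MIN / -1` overflow:

```cpp
#include <cassert>
#include <cstdint>

// rem = a - (a / b) * b: a divide followed by one multiply-subtract.
// Assumes b != 0 and not (a == INT64_MIN && b == -1).
std::int64_t rem_via_mls(std::int64_t a, std::int64_t b) {
  std::int64_t q = a / b; // plays the role of sdiv
  return a - q * b;       // plays the role of mls: acc - m1 * m2
}
```

Here the accumulator is the original dividend, which matches the `a - (b * c)` operand order that `mls` implements.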

Harbormaster completed remote builds in B223577: Diff 510802. Apr 4 2023, 8:32 AM

This revision was landed with ongoing or failed builds. Apr 5 2023, 2:19 AM

Closed by commit rG6a8d8f3e28ae: [AArch64][DAGCombiner]: combine <2xi64> add/sub. (authored by Hassnaa Hamdi <hassnaa.hamdi@arm.com>).

This revision was automatically updated to reflect the committed changes.

Hassnaa Hamdi <hassnaa.hamdi@arm.com> added a commit: rG6a8d8f3e28ae: [AArch64][DAGCombiner]: combine <2xi64> add/sub.

paulwalker-arm added a subscriber: paulwalker-arm. Apr 11 2023, 8:26 AM

paulwalker-arm added inline comments.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17826–17833	Hi @hassnaa-arm, I think this patch should be reverted and fixed before re-landing because the optimisation doesn't look sound. The highlighted code block allows the original operands for `N` to be swapped, which whilst correct for commutative operations like `ISD::ADD` it is incorrect for `ISD::SUB` which this patch also supports. You can see the bogus result by looking at the tests. For example, `test_mul_sub_1x64` shows the IR for `(b * c) - a` which is not an `mls` operation, that would be `a - (b * c)`.

david-arm added inline comments. Apr 11 2023, 8:29 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17826–17833	Ah good spot @paulwalker-arm! This is partly my fault too since I asked @hassnaa-arm to include the `ISD::SUB` case, but forgot about the fact we can't allow operands to be swapped for `ISD::SUB`.
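
To make the operand-order problem concrete, here is a minimal sketch in plain C++. It models one `mls` lane as `acc - m1 * m2` and shows that swapping the operands of the subtraction gives a different value. The names `mls_lane`, `Opcode`, and `can_fold` are hypothetical, and the guard is only one possible way to restrict the combine (accept the multiply in either position for an add, but only as the right-hand operand of a sub); it is not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Model of a single mls lane: acc - m1 * m2, with wrap-around arithmetic.
std::uint64_t mls_lane(std::uint64_t acc, std::uint64_t m1, std::uint64_t m2) {
  return acc - m1 * m2;
}

enum class Opcode { Add, Sub };

// Hypothetical guard. mul_operand_index is the operand position (0 or 1)
// at which the multiply appears in the add/sub node.
bool can_fold(Opcode op, int mul_operand_index) {
  if (op == Opcode::Add) // add is commutative: either position is fine
    return mul_operand_index == 0 || mul_operand_index == 1;
  return mul_operand_index == 1; // sub: only a - (b * c) is an mls
}
```

For `a = 10`, `b = 2`, `c = 3`, `mls_lane(a, b, c)` is `4`, whereas `(b * c) - a` wraps around to a very large unsigned value, so the two forms cannot share one lowering.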

Hassnaa Hamdi <hassnaa.hamdi@arm.com> added a reverting change: rGcae2a36d480d: Revert "[AArch64][DAGCombiner]: combine <2xi64> add/sub.". Apr 11 2023, 5:44 PM

Revision Contents

Path	Size
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	55 lines
llvm/test/CodeGen/AArch64/aarch64-combine-add-sub-mul.ll	62 lines
llvm/test/CodeGen/AArch64/sve-fixed-length-int-rem.ll	112 lines

Diff 511025

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

(17,798 lines hidden)

  if (M2.getOpcode() != ISD::MUL && M2.getOpcode() != AArch64ISD::SMULL &&
      M2.getOpcode() != AArch64ISD::UMULL)
    return SDValue();
  EVT VT = N->getValueType(0);
  SDValue Sub = DAG.getNode(ISD::SUB, SDLoc(N), VT, X, M1);
  return DAG.getNode(ISD::SUB, SDLoc(N), VT, Sub, M2);
}

// This works on the patterns of:

// add v1, (mul v2, v3)

// sub v1, (mul v2, v3)

// for vectors of type <1 x i64> and <2 x i64> when SVE is available.

// It will transform the add/sub to a scalable version, so that we can

// make use of SVE's MLA/MLS that will be generated for that pattern

static SDValue performMulAddSubCombine(SDNode *N, SelectionDAG &DAG) {

// Before using SVE's features, check first if it's available.

if (!DAG.getSubtarget<AArch64Subtarget>().hasSVE())

return SDValue();

if (N->getOpcode() != ISD::ADD && N->getOpcode() != ISD::SUB)

return SDValue();

if (!N->getValueType(0).isFixedLengthVector())

return SDValue();

SDValue MulValue, Op, ExtractIndexValue, ExtractOp;

if (N->getOperand(0)->getOpcode() == ISD::EXTRACT_SUBVECTOR) {

ExtractOp = N->getOperand(0);

Op = N->getOperand(1);

} else if (N->getOperand(1)->getOpcode() == ISD::EXTRACT_SUBVECTOR) {

ExtractOp = N->getOperand(1);

Op = N->getOperand(0);

} else

return SDValue();

MulValue = ExtractOp.getOperand(0);

ExtractIndexValue = ExtractOp.getOperand(1);

if (!ExtractOp.hasOneUse() && !MulValue.hasOneUse())

return SDValue();

// If the Opcode is NOT MUL, then that is NOT the expected pattern:

if (MulValue.getOpcode() != AArch64ISD::MUL_PRED)

return SDValue();

// If the Mul value type is NOT scalable vector, then that is NOT the expected

// pattern:

EVT VT = MulValue.getValueType();

if (!VT.isScalableVector())

return SDValue();

// If the ConstValue is NOT 0, then that is NOT the expected pattern:

if (!cast<ConstantSDNode>(ExtractIndexValue)->isZero())

return SDValue();

SDValue ScaledOp = convertToScalableVector(DAG, VT, Op);

SDValue NewValue = DAG.getNode(N->getOpcode(), SDLoc(N), VT, {ScaledOp, MulValue});

return convertFromScalableVector(DAG, N->getValueType(0), NewValue);

}

  static SDValue performAddSubCombine(SDNode *N,
                                      TargetLowering::DAGCombinerInfo &DCI,
                                      SelectionDAG &DAG) {
+   if (SDValue Val = performMulAddSubCombine(N, DAG))
+     return Val;
    // Try to change sum of two reductions.
    if (SDValue Val = performAddUADDVCombine(N, DAG))
      return Val;
    if (SDValue Val = performAddDotCombine(N, DAG))
      return Val;
    if (SDValue Val = performAddCSelIntoCSinc(N, DAG))
      return Val;
    if (SDValue Val = performNegCSelCombine(N, DAG))

(6,731 lines hidden)

llvm/test/CodeGen/AArch64/aarch64-combine-add-sub-mul.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s -mtriple=aarch64-none-linux-gnu -mattr=+sve | FileCheck %s

				define <2 x i64> @test_mul_add_2x64(<2 x i64> %a, <2 x i64> %b, <2 x i64> %c) {
				; CHECK-LABEL: test_mul_add_2x64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: ptrue p0.d, vl2
				; CHECK-NEXT: // kill: def $q2 killed $q2 def $z2
				; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
				; CHECK-NEXT: mla z0.d, p0/m, z1.d, z2.d
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%mul = mul <2 x i64> %b, %c
				%add = add <2 x i64> %a, %mul
				ret <2 x i64> %add
				}

				define <1 x i64> @test_mul_add_1x64(<1 x i64> %a, <1 x i64> %b, <1 x i64> %c) {
				; CHECK-LABEL: test_mul_add_1x64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: ptrue p0.d, vl1
				; CHECK-NEXT: // kill: def $d2 killed $d2 def $z2
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: mla z0.d, p0/m, z1.d, z2.d
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%mul = mul <1 x i64> %b, %c
				%add = add <1 x i64> %mul, %a
				ret <1 x i64> %add
				}

				define <2 x i64> @test_mul_sub_2x64(<2 x i64> %a, <2 x i64> %b, <2 x i64> %c) {
				; CHECK-LABEL: test_mul_sub_2x64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
				; CHECK-NEXT: ptrue p0.d, vl2
				; CHECK-NEXT: // kill: def $q2 killed $q2 def $z2
				; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
				; CHECK-NEXT: mls z0.d, p0/m, z1.d, z2.d
				; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
				; CHECK-NEXT: ret
				%mul = mul <2 x i64> %b, %c
				%sub = sub <2 x i64> %a, %mul
				ret <2 x i64> %sub
				}

				define <1 x i64> @test_mul_sub_1x64(<1 x i64> %a, <1 x i64> %b, <1 x i64> %c) {
				; CHECK-LABEL: test_mul_sub_1x64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
				; CHECK-NEXT: ptrue p0.d, vl1
				; CHECK-NEXT: // kill: def $d2 killed $d2 def $z2
				; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
				; CHECK-NEXT: mls z0.d, p0/m, z1.d, z2.d
				; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
				; CHECK-NEXT: ret
				%mul = mul <1 x i64> %b, %c
				%sub = sub <1 x i64> %mul, %a
				ret <1 x i64> %sub
				}

llvm/test/CodeGen/AArch64/sve-fixed-length-int-rem.ll

(600 lines hidden)
; VBITS_GE_128-NEXT: mls v2.4s, v5.4s, v7.4s		; VBITS_GE_128-NEXT: mls v2.4s, v5.4s, v7.4s
; VBITS_GE_128-NEXT: mls v3.4s, v4.4s, v6.4s		; VBITS_GE_128-NEXT: mls v3.4s, v4.4s, v6.4s
; VBITS_GE_128-NEXT: stp q0, q1, [x0, #32]		; VBITS_GE_128-NEXT: stp q0, q1, [x0, #32]
; VBITS_GE_128-NEXT: stp q2, q3, [x0]		; VBITS_GE_128-NEXT: stp q2, q3, [x0]
; VBITS_GE_128-NEXT: ret		; VBITS_GE_128-NEXT: ret
;		;
; VBITS_GE_256-LABEL: srem_v16i32:		; VBITS_GE_256-LABEL: srem_v16i32:
; VBITS_GE_256: // %bb.0:		; VBITS_GE_256: // %bb.0:
; VBITS_GE_256-NEXT: mov x8, #8		; VBITS_GE_256-NEXT: mov x8, #8 // =0x8
; VBITS_GE_256-NEXT: ptrue p0.s, vl8		; VBITS_GE_256-NEXT: ptrue p0.s, vl8
; VBITS_GE_256-NEXT: ld1w { z0.s }, p0/z, [x0, x8, lsl #2]		; VBITS_GE_256-NEXT: ld1w { z0.s }, p0/z, [x0, x8, lsl #2]
; VBITS_GE_256-NEXT: ld1w { z1.s }, p0/z, [x0]		; VBITS_GE_256-NEXT: ld1w { z1.s }, p0/z, [x0]
; VBITS_GE_256-NEXT: ld1w { z2.s }, p0/z, [x1, x8, lsl #2]		; VBITS_GE_256-NEXT: ld1w { z2.s }, p0/z, [x1, x8, lsl #2]
; VBITS_GE_256-NEXT: ld1w { z3.s }, p0/z, [x1]		; VBITS_GE_256-NEXT: ld1w { z3.s }, p0/z, [x1]
; VBITS_GE_256-NEXT: movprfx z4, z0		; VBITS_GE_256-NEXT: movprfx z4, z0
; VBITS_GE_256-NEXT: sdiv z4.s, p0/m, z4.s, z2.s		; VBITS_GE_256-NEXT: sdiv z4.s, p0/m, z4.s, z2.s
; VBITS_GE_256-NEXT: movprfx z5, z1		; VBITS_GE_256-NEXT: movprfx z5, z1
(57 lines hidden)
; CHECK-NEXT: ret		; CHECK-NEXT: ret
ret void		ret void
}		}

; Vector i64 sdiv are not legal for NEON so use SVE when available.		; Vector i64 sdiv are not legal for NEON so use SVE when available.
; FIXME: We should be able to improve the codegen for the 128 bits case here.		; FIXME: We should be able to improve the codegen for the 128 bits case here.
define <1 x i64> @srem_v1i64(<1 x i64> %op1, <1 x i64> %op2) vscale_range(1,0) #0 {		define <1 x i64> @srem_v1i64(<1 x i64> %op1, <1 x i64> %op2) vscale_range(1,0) #0 {
; CHECK-LABEL: srem_v1i64:		; CHECK-LABEL: srem_v1i64:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
; CHECK-NEXT: ptrue p0.d, vl1
; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0		; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
		; CHECK-NEXT: ptrue p0.d, vl1
		; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
; CHECK-NEXT: movprfx z2, z0		; CHECK-NEXT: movprfx z2, z0
; CHECK-NEXT: sdiv z2.d, p0/m, z2.d, z1.d		; CHECK-NEXT: sdiv z2.d, p0/m, z2.d, z1.d
; CHECK-NEXT: mul z1.d, p0/m, z1.d, z2.d		; CHECK-NEXT: mls z0.d, p0/m, z2.d, z1.d
; CHECK-NEXT: sub d0, d0, d1		; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%res = srem <1 x i64> %op1, %op2		%res = srem <1 x i64> %op1, %op2
ret <1 x i64> %res		ret <1 x i64> %res
}		}

; Vector i64 sdiv are not legal for NEON so use SVE when available.		; Vector i64 sdiv are not legal for NEON so use SVE when available.
; FIXME: We should be able to improve the codegen for the 128 bits case here.		; FIXME: We should be able to improve the codegen for the 128 bits case here.
define <2 x i64> @srem_v2i64(<2 x i64> %op1, <2 x i64> %op2) vscale_range(1,0) #0 {		define <2 x i64> @srem_v2i64(<2 x i64> %op1, <2 x i64> %op2) vscale_range(1,0) #0 {
; CHECK-LABEL: srem_v2i64:		; CHECK-LABEL: srem_v2i64:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
; CHECK-NEXT: ptrue p0.d, vl2
; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0		; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
		; CHECK-NEXT: ptrue p0.d, vl2
		; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
; CHECK-NEXT: movprfx z2, z0		; CHECK-NEXT: movprfx z2, z0
; CHECK-NEXT: sdiv z2.d, p0/m, z2.d, z1.d		; CHECK-NEXT: sdiv z2.d, p0/m, z2.d, z1.d
; CHECK-NEXT: mul z1.d, p0/m, z1.d, z2.d		; CHECK-NEXT: mls z0.d, p0/m, z2.d, z1.d
; CHECK-NEXT: sub v0.2d, v0.2d, v1.2d		; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%res = srem <2 x i64> %op1, %op2		%res = srem <2 x i64> %op1, %op2
ret <2 x i64> %res		ret <2 x i64> %res
}		}

define void @srem_v4i64(ptr %a, ptr %b) vscale_range(2,0) #0 {		define void @srem_v4i64(ptr %a, ptr %b) vscale_range(2,0) #0 {
; CHECK-LABEL: srem_v4i64:		; CHECK-LABEL: srem_v4i64:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
(10 lines hidden)
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%res = srem <4 x i64> %op1, %op2		%res = srem <4 x i64> %op1, %op2
store <4 x i64> %res, ptr %a		store <4 x i64> %res, ptr %a
ret void		ret void
}		}

define void @srem_v8i64(ptr %a, ptr %b) #0 {		define void @srem_v8i64(ptr %a, ptr %b) #0 {
; VBITS_GE_128-LABEL: srem_v8i64:		; VBITS_GE_128-LABEL: srem_v8i64:
; VBITS_GE_128: // %bb.0:		; VBITS_GE_128: // %bb.0:
; VBITS_GE_128-NEXT: ldp q4, q5, [x1]
; VBITS_GE_128-NEXT: ptrue p0.d, vl2
; VBITS_GE_128-NEXT: ldp q7, q6, [x1, #32]
; VBITS_GE_128-NEXT: ldp q0, q1, [x0, #32]		; VBITS_GE_128-NEXT: ldp q0, q1, [x0, #32]
; VBITS_GE_128-NEXT: ldp q2, q3, [x0]		; VBITS_GE_128-NEXT: ptrue p0.d, vl2
; VBITS_GE_128-NEXT: movprfx z16, z3		; VBITS_GE_128-NEXT: ldp q2, q3, [x1, #32]
; VBITS_GE_128-NEXT: sdiv z16.d, p0/m, z16.d, z5.d
; VBITS_GE_128-NEXT: movprfx z17, z2
; VBITS_GE_128-NEXT: sdiv z17.d, p0/m, z17.d, z4.d
; VBITS_GE_128-NEXT: mul z5.d, p0/m, z5.d, z16.d
; VBITS_GE_128-NEXT: movprfx z16, z1		; VBITS_GE_128-NEXT: movprfx z16, z1
		; VBITS_GE_128-NEXT: sdiv z16.d, p0/m, z16.d, z3.d
		; VBITS_GE_128-NEXT: mls z1.d, p0/m, z16.d, z3.d
		; VBITS_GE_128-NEXT: movprfx z3, z0
		; VBITS_GE_128-NEXT: sdiv z3.d, p0/m, z3.d, z2.d
		; VBITS_GE_128-NEXT: mls z0.d, p0/m, z3.d, z2.d
		; VBITS_GE_128-NEXT: ldp q4, q5, [x0]
		; VBITS_GE_128-NEXT: ldp q7, q6, [x1]
		; VBITS_GE_128-NEXT: movprfx z16, z5
; VBITS_GE_128-NEXT: sdiv z16.d, p0/m, z16.d, z6.d		; VBITS_GE_128-NEXT: sdiv z16.d, p0/m, z16.d, z6.d
; VBITS_GE_128-NEXT: mul z4.d, p0/m, z4.d, z17.d		; VBITS_GE_128-NEXT: movprfx z2, z4
; VBITS_GE_128-NEXT: movprfx z17, z0		; VBITS_GE_128-NEXT: sdiv z2.d, p0/m, z2.d, z7.d
; VBITS_GE_128-NEXT: sdiv z17.d, p0/m, z17.d, z7.d
; VBITS_GE_128-NEXT: mul z6.d, p0/m, z6.d, z16.d
; VBITS_GE_128-NEXT: mul z7.d, p0/m, z7.d, z17.d
; VBITS_GE_128-NEXT: sub v0.2d, v0.2d, v7.2d
; VBITS_GE_128-NEXT: sub v1.2d, v1.2d, v6.2d
; VBITS_GE_128-NEXT: sub v2.2d, v2.2d, v4.2d
; VBITS_GE_128-NEXT: stp q0, q1, [x0, #32]		; VBITS_GE_128-NEXT: stp q0, q1, [x0, #32]
; VBITS_GE_128-NEXT: sub v0.2d, v3.2d, v5.2d		; VBITS_GE_128-NEXT: movprfx z0, z4
; VBITS_GE_128-NEXT: stp q2, q0, [x0]		; VBITS_GE_128-NEXT: mls z0.d, p0/m, z2.d, z7.d
		; VBITS_GE_128-NEXT: movprfx z1, z5
		; VBITS_GE_128-NEXT: mls z1.d, p0/m, z16.d, z6.d
		; VBITS_GE_128-NEXT: stp q0, q1, [x0]
; VBITS_GE_128-NEXT: ret		; VBITS_GE_128-NEXT: ret
;		;
; VBITS_GE_256-LABEL: srem_v8i64:		; VBITS_GE_256-LABEL: srem_v8i64:
; VBITS_GE_256: // %bb.0:		; VBITS_GE_256: // %bb.0:
; VBITS_GE_256-NEXT: mov x8, #4		; VBITS_GE_256-NEXT: mov x8, #4 // =0x4
; VBITS_GE_256-NEXT: ptrue p0.d, vl4		; VBITS_GE_256-NEXT: ptrue p0.d, vl4
; VBITS_GE_256-NEXT: ld1d { z0.d }, p0/z, [x0, x8, lsl #3]		; VBITS_GE_256-NEXT: ld1d { z0.d }, p0/z, [x0, x8, lsl #3]
; VBITS_GE_256-NEXT: ld1d { z1.d }, p0/z, [x0]		; VBITS_GE_256-NEXT: ld1d { z1.d }, p0/z, [x0]
; VBITS_GE_256-NEXT: ld1d { z2.d }, p0/z, [x1, x8, lsl #3]		; VBITS_GE_256-NEXT: ld1d { z2.d }, p0/z, [x1, x8, lsl #3]
; VBITS_GE_256-NEXT: ld1d { z3.d }, p0/z, [x1]		; VBITS_GE_256-NEXT: ld1d { z3.d }, p0/z, [x1]
; VBITS_GE_256-NEXT: movprfx z4, z0		; VBITS_GE_256-NEXT: movprfx z4, z0
; VBITS_GE_256-NEXT: sdiv z4.d, p0/m, z4.d, z2.d		; VBITS_GE_256-NEXT: sdiv z4.d, p0/m, z4.d, z2.d
; VBITS_GE_256-NEXT: movprfx z5, z1		; VBITS_GE_256-NEXT: movprfx z5, z1
(652 lines hidden)
; VBITS_GE_128-NEXT: mls v2.4s, v5.4s, v7.4s		; VBITS_GE_128-NEXT: mls v2.4s, v5.4s, v7.4s
; VBITS_GE_128-NEXT: mls v3.4s, v4.4s, v6.4s		; VBITS_GE_128-NEXT: mls v3.4s, v4.4s, v6.4s
; VBITS_GE_128-NEXT: stp q0, q1, [x0, #32]		; VBITS_GE_128-NEXT: stp q0, q1, [x0, #32]
; VBITS_GE_128-NEXT: stp q2, q3, [x0]		; VBITS_GE_128-NEXT: stp q2, q3, [x0]
; VBITS_GE_128-NEXT: ret		; VBITS_GE_128-NEXT: ret
;		;
; VBITS_GE_256-LABEL: urem_v16i32:		; VBITS_GE_256-LABEL: urem_v16i32:
; VBITS_GE_256: // %bb.0:		; VBITS_GE_256: // %bb.0:
; VBITS_GE_256-NEXT: mov x8, #8		; VBITS_GE_256-NEXT: mov x8, #8 // =0x8
; VBITS_GE_256-NEXT: ptrue p0.s, vl8		; VBITS_GE_256-NEXT: ptrue p0.s, vl8
; VBITS_GE_256-NEXT: ld1w { z0.s }, p0/z, [x0, x8, lsl #2]		; VBITS_GE_256-NEXT: ld1w { z0.s }, p0/z, [x0, x8, lsl #2]
; VBITS_GE_256-NEXT: ld1w { z1.s }, p0/z, [x0]		; VBITS_GE_256-NEXT: ld1w { z1.s }, p0/z, [x0]
; VBITS_GE_256-NEXT: ld1w { z2.s }, p0/z, [x1, x8, lsl #2]		; VBITS_GE_256-NEXT: ld1w { z2.s }, p0/z, [x1, x8, lsl #2]
; VBITS_GE_256-NEXT: ld1w { z3.s }, p0/z, [x1]		; VBITS_GE_256-NEXT: ld1w { z3.s }, p0/z, [x1]
; VBITS_GE_256-NEXT: movprfx z4, z0		; VBITS_GE_256-NEXT: movprfx z4, z0
; VBITS_GE_256-NEXT: udiv z4.s, p0/m, z4.s, z2.s		; VBITS_GE_256-NEXT: udiv z4.s, p0/m, z4.s, z2.s
; VBITS_GE_256-NEXT: movprfx z5, z1		; VBITS_GE_256-NEXT: movprfx z5, z1
[... 57 lines not shown ...]
ret void		ret void
}		}

; Vector i64 udiv are not legal for NEON so use SVE when available.		; Vector i64 udiv are not legal for NEON so use SVE when available.
; FIXME: We should be able to improve the codegen for the 128 bits case here.		; FIXME: We should be able to improve the codegen for the 128 bits case here.
define <1 x i64> @urem_v1i64(<1 x i64> %op1, <1 x i64> %op2) vscale_range(1,0) #0 {		define <1 x i64> @urem_v1i64(<1 x i64> %op1, <1 x i64> %op2) vscale_range(1,0) #0 {
; CHECK-LABEL: urem_v1i64:		; CHECK-LABEL: urem_v1i64:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
; CHECK-NEXT: ptrue p0.d, vl1
; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0		; CHECK-NEXT: // kill: def $d0 killed $d0 def $z0
		; CHECK-NEXT: ptrue p0.d, vl1
		; CHECK-NEXT: // kill: def $d1 killed $d1 def $z1
; CHECK-NEXT: movprfx z2, z0		; CHECK-NEXT: movprfx z2, z0
; CHECK-NEXT: udiv z2.d, p0/m, z2.d, z1.d		; CHECK-NEXT: udiv z2.d, p0/m, z2.d, z1.d
; CHECK-NEXT: mul z1.d, p0/m, z1.d, z2.d		; CHECK-NEXT: mls z0.d, p0/m, z2.d, z1.d
; CHECK-NEXT: sub d0, d0, d1		; CHECK-NEXT: // kill: def $d0 killed $d0 killed $z0
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%res = urem <1 x i64> %op1, %op2		%res = urem <1 x i64> %op1, %op2
ret <1 x i64> %res		ret <1 x i64> %res
}		}

; Vector i64 udiv are not legal for NEON so use SVE when available.		; Vector i64 udiv are not legal for NEON so use SVE when available.
; FIXME: We should be able to improve the codegen for the 128 bits case here.		; FIXME: We should be able to improve the codegen for the 128 bits case here.
define <2 x i64> @urem_v2i64(<2 x i64> %op1, <2 x i64> %op2) vscale_range(1,0) #0 {		define <2 x i64> @urem_v2i64(<2 x i64> %op1, <2 x i64> %op2) vscale_range(1,0) #0 {
; CHECK-LABEL: urem_v2i64:		; CHECK-LABEL: urem_v2i64:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
; CHECK-NEXT: ptrue p0.d, vl2
; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0		; CHECK-NEXT: // kill: def $q0 killed $q0 def $z0
		; CHECK-NEXT: ptrue p0.d, vl2
		; CHECK-NEXT: // kill: def $q1 killed $q1 def $z1
; CHECK-NEXT: movprfx z2, z0		; CHECK-NEXT: movprfx z2, z0
; CHECK-NEXT: udiv z2.d, p0/m, z2.d, z1.d		; CHECK-NEXT: udiv z2.d, p0/m, z2.d, z1.d
; CHECK-NEXT: mul z1.d, p0/m, z1.d, z2.d		; CHECK-NEXT: mls z0.d, p0/m, z2.d, z1.d
; CHECK-NEXT: sub v0.2d, v0.2d, v1.2d		; CHECK-NEXT: // kill: def $q0 killed $q0 killed $z0
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%res = urem <2 x i64> %op1, %op2		%res = urem <2 x i64> %op1, %op2
ret <2 x i64> %res		ret <2 x i64> %res
}		}
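
The CHECK-line change in urem_v1i64 and urem_v2i64 above (an SVE mul followed by a NEON sub becoming a single SVE mls) follows from the usual expansion of a remainder as a - (a / b) * b. Below is a minimal C++ sketch of that identity. The helper names are hypothetical and are not part of LLVM, and plain scalar loops stand in for the 64-bit vector lanes.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Two 64-bit lanes, standing in for a <2 x i64> value.
using V2 = std::array<std::uint64_t, 2>;

// Hypothetical helper: per-lane multiply-subtract, acc - x * y.
// This models what one mls computes for each lane.
V2 mls_lanes(V2 acc, V2 x, V2 y) {
  for (std::size_t i = 0; i < acc.size(); ++i)
    acc[i] -= x[i] * y[i];
  return acc;
}

// Hypothetical helper: urem expanded as a - (a / b) * b.
// Assumes no lane of b is zero.
V2 urem_lanes(V2 a, V2 b) {
  V2 q{};
  for (std::size_t i = 0; i < a.size(); ++i)
    q[i] = a[i] / b[i];        // the udiv step
  return mls_lanes(a, q, b);   // the fused multiply-subtract step
}
```

In this sketch the separate multiply and subtract of the old output collapse into the one mls_lanes call, which is the shape the new CHECK lines expect.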

define void @urem_v4i64(ptr %a, ptr %b) vscale_range(2,0) #0 {		define void @urem_v4i64(ptr %a, ptr %b) vscale_range(2,0) #0 {
; CHECK-LABEL: urem_v4i64:		; CHECK-LABEL: urem_v4i64:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
[... 10 lines not shown ...]
%res = urem <4 x i64> %op1, %op2		%res = urem <4 x i64> %op1, %op2
store <4 x i64> %res, ptr %a		store <4 x i64> %res, ptr %a
ret void		ret void
}		}

define void @urem_v8i64(ptr %a, ptr %b) #0 {		define void @urem_v8i64(ptr %a, ptr %b) #0 {
; VBITS_GE_128-LABEL: urem_v8i64:		; VBITS_GE_128-LABEL: urem_v8i64:
; VBITS_GE_128: // %bb.0:		; VBITS_GE_128: // %bb.0:
; VBITS_GE_128-NEXT: ldp q4, q5, [x1]
; VBITS_GE_128-NEXT: ptrue p0.d, vl2
; VBITS_GE_128-NEXT: ldp q7, q6, [x1, #32]
; VBITS_GE_128-NEXT: ldp q0, q1, [x0, #32]		; VBITS_GE_128-NEXT: ldp q0, q1, [x0, #32]
; VBITS_GE_128-NEXT: ldp q2, q3, [x0]		; VBITS_GE_128-NEXT: ptrue p0.d, vl2
; VBITS_GE_128-NEXT: movprfx z16, z3		; VBITS_GE_128-NEXT: ldp q2, q3, [x1, #32]
; VBITS_GE_128-NEXT: udiv z16.d, p0/m, z16.d, z5.d
; VBITS_GE_128-NEXT: movprfx z17, z2
; VBITS_GE_128-NEXT: udiv z17.d, p0/m, z17.d, z4.d
; VBITS_GE_128-NEXT: mul z5.d, p0/m, z5.d, z16.d
; VBITS_GE_128-NEXT: movprfx z16, z1		; VBITS_GE_128-NEXT: movprfx z16, z1
		; VBITS_GE_128-NEXT: udiv z16.d, p0/m, z16.d, z3.d
		; VBITS_GE_128-NEXT: mls z1.d, p0/m, z16.d, z3.d
		; VBITS_GE_128-NEXT: movprfx z3, z0
		; VBITS_GE_128-NEXT: udiv z3.d, p0/m, z3.d, z2.d
		; VBITS_GE_128-NEXT: mls z0.d, p0/m, z3.d, z2.d
		; VBITS_GE_128-NEXT: ldp q4, q5, [x0]
		; VBITS_GE_128-NEXT: ldp q7, q6, [x1]
		; VBITS_GE_128-NEXT: movprfx z16, z5
; VBITS_GE_128-NEXT: udiv z16.d, p0/m, z16.d, z6.d		; VBITS_GE_128-NEXT: udiv z16.d, p0/m, z16.d, z6.d
; VBITS_GE_128-NEXT: mul z4.d, p0/m, z4.d, z17.d		; VBITS_GE_128-NEXT: movprfx z2, z4
; VBITS_GE_128-NEXT: movprfx z17, z0		; VBITS_GE_128-NEXT: udiv z2.d, p0/m, z2.d, z7.d
; VBITS_GE_128-NEXT: udiv z17.d, p0/m, z17.d, z7.d
; VBITS_GE_128-NEXT: mul z6.d, p0/m, z6.d, z16.d
; VBITS_GE_128-NEXT: mul z7.d, p0/m, z7.d, z17.d
; VBITS_GE_128-NEXT: sub v0.2d, v0.2d, v7.2d
; VBITS_GE_128-NEXT: sub v1.2d, v1.2d, v6.2d
; VBITS_GE_128-NEXT: sub v2.2d, v2.2d, v4.2d
; VBITS_GE_128-NEXT: stp q0, q1, [x0, #32]		; VBITS_GE_128-NEXT: stp q0, q1, [x0, #32]
; VBITS_GE_128-NEXT: sub v0.2d, v3.2d, v5.2d		; VBITS_GE_128-NEXT: movprfx z0, z4
; VBITS_GE_128-NEXT: stp q2, q0, [x0]		; VBITS_GE_128-NEXT: mls z0.d, p0/m, z2.d, z7.d
		; VBITS_GE_128-NEXT: movprfx z1, z5
		; VBITS_GE_128-NEXT: mls z1.d, p0/m, z16.d, z6.d
		; VBITS_GE_128-NEXT: stp q0, q1, [x0]
; VBITS_GE_128-NEXT: ret		; VBITS_GE_128-NEXT: ret
;		;
; VBITS_GE_256-LABEL: urem_v8i64:		; VBITS_GE_256-LABEL: urem_v8i64:
; VBITS_GE_256: // %bb.0:		; VBITS_GE_256: // %bb.0:
; VBITS_GE_256-NEXT: mov x8, #4		; VBITS_GE_256-NEXT: mov x8, #4 // =0x4
; VBITS_GE_256-NEXT: ptrue p0.d, vl4		; VBITS_GE_256-NEXT: ptrue p0.d, vl4
; VBITS_GE_256-NEXT: ld1d { z0.d }, p0/z, [x0, x8, lsl #3]		; VBITS_GE_256-NEXT: ld1d { z0.d }, p0/z, [x0, x8, lsl #3]
; VBITS_GE_256-NEXT: ld1d { z1.d }, p0/z, [x0]		; VBITS_GE_256-NEXT: ld1d { z1.d }, p0/z, [x0]
; VBITS_GE_256-NEXT: ld1d { z2.d }, p0/z, [x1, x8, lsl #3]		; VBITS_GE_256-NEXT: ld1d { z2.d }, p0/z, [x1, x8, lsl #3]
; VBITS_GE_256-NEXT: ld1d { z3.d }, p0/z, [x1]		; VBITS_GE_256-NEXT: ld1d { z3.d }, p0/z, [x1]
; VBITS_GE_256-NEXT: movprfx z4, z0		; VBITS_GE_256-NEXT: movprfx z4, z0
; VBITS_GE_256-NEXT: udiv z4.d, p0/m, z4.d, z2.d		; VBITS_GE_256-NEXT: udiv z4.d, p0/m, z4.d, z2.d
; VBITS_GE_256-NEXT: movprfx z5, z1		; VBITS_GE_256-NEXT: movprfx z5, z1
[... 61 lines not shown ...]
