This is an archive of the discontinued LLVM Phabricator instance.

Differential D124325

[AArch64][SVE] Support logical operation BIC with DestructiveBinary patterns
ClosedPublic

Authored by Allen on Apr 23 2022, 2:23 AM.

Details

Reviewers

paulwalker-arm
dmgreen
thakis
efriedma
dancgr
david-arm

Summary

Support for the logical operation BIC with DestructiveBinary patterns was temporarily removed because it causes an assert (commit 3c382ed71f15), so this patch tries to fix that.
The most significant known bug is that pseudo instructions whose real instructions (including movprfx'd ones) do not cover all combinations of register allocation will have a broken expansion. This is the main reason zeroing is an experimental feature.
So we add an extra LSL, when necessary, so that BIC_ZPZZ_ZERO A, P, A, A can be expanded with movprfx.

movprfx	z0.s, p0/z, z0.s
lsl z0.b, p0/m, z0.b, #0
bic	z0.s, p0/m, z0.s, z0.s

Depends on D88595

Event Timeline

Allen created this revision. Apr 23 2022, 2:23 AM

Herald added a reviewer: efriedma. Apr 23 2022, 2:23 AM

Herald added a project: Restricted Project.

Herald added subscribers: psnobl, hiraditya, kristof.beyls, tschuett.

Allen requested review of this revision. Apr 23 2022, 2:23 AM

Herald added a project: Restricted Project. Apr 23 2022, 2:23 AM

Herald added a subscriber: llvm-commits.

add a new test case

Harbormaster completed remote builds in B161020: Diff 424709. Apr 23 2022, 3:26 AM

Hi @Allen, you might be interested in D88595.

This is the main reason the zeroing is an experimental feature: it has known bugs. The most significant is that pseudo instructions that do not have real instructions (including movprfx'd ones) covering all combinations of register allocation will have a broken expansion. I believe BIC is an example of this because there's no way to movprfx expand BIC_ZPZZ_ZERO A, P, A, A. An unrealistic usage, I know, but it's still legitimate.

define <vscale x 4 x i32> @foo(<vscale x 4 x i1> %pg, <vscale x 4 x i32> %a) #0 {
entry:
  %t1 = select <vscale x 4 x i1> %pg, <vscale x 4 x i32> %a, <vscale x 4 x i32> zeroinitializer
  %t2 = tail call <vscale x 4 x i32> @llvm.aarch64.sve.bic.nxv4i32(<vscale x 4 x i1> %pg, <vscale x 4 x i32> %t1, <vscale x 4 x i32> %a)
  ret <vscale x 4 x i32> %t2
}

This will trigger an assert, even after this patch, because otherwise it would emit

movprfx	z0.s, p0/z, z0.s
bic	z0.s, p0/m, z0.s, z0.s

which is not a valid use of movprfx.

For ACfL (https://developer.arm.com/tools-and-software/server-and-hpc/compile/arm-compiler-for-linux) we solve this using a couple of machine function passes, but I feel they're not the correct solution; hence the above patch, where we'd rather have a mechanism to better express the register requirements as part of the pseudo's definition.
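
The restriction that makes the sequence above invalid can be sketched in a few lines. This is a simplified model, assuming the rule is that the instruction following a movprfx must write the same destination register and must not read that register through any operand other than the destructive one. `PrefixedInst` and `canPrefixWithMovprfx` are hypothetical names for illustration, not LLVM APIs.

```cpp
#include <string>

// Hypothetical, simplified model of a destructive SVE instruction such as
// "bic Zdn, Pg/M, Zdn, Zm": Dst is written and also read as the first source.
struct PrefixedInst {
  std::string Dst; // destination register, tied to the destructive source
  std::string Src; // the remaining (non-destructive) vector source
};

// Assumed rule: the prefixed instruction must write the register that movprfx
// wrote, and must not read that register in any other operand position.
bool canPrefixWithMovprfx(const PrefixedInst &Inst,
                          const std::string &MovprfxDst) {
  return Inst.Dst == MovprfxDst && Inst.Src != MovprfxDst;
}
```

Under this model, `bic z0.s, p0/m, z0.s, z0.s` after `movprfx z0.s, p0/z, z0.s` is rejected because z0 also appears as the second source, while `bic z0.s, p0/m, z0.s, z1.s` would be accepted.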

Allen added a reviewer: dancgr. Apr 23 2022, 5:59 PM

Thanks @paulwalker-arm for explaining the register allocation issue in detail. After reading D88595, I think there are two candidate ways to address this. Am I missing something?

1. The bic instruction is special: when all the registers are the same, it should be expanded to zero (i.e. BIC(A, A) = A & ~A = 0).

2. Make use of not_all_same: {not_all_same(Dst, Op0, Op1), earlyclobber(Dst)} can be simplified to {earlyclobber(Dst)}, which guards against all the registers being the same.

Allen added a reviewer: david-arm. Apr 25 2022, 7:42 PM

I don't think either of these is a good option. I don't like the idea of having instruction specific handling within the expand code, and I'm not sure where the logic from (2) comes from, but regardless, using earlyclobber in this way will force Dst to be unique and thus forces the need for movprfx (i.e. it blocks the BIC_ZPZZ_ZERO A, P, A, B use case).

To my mind the best option is to allow the register requirements to be better expressed as part of the pseudo instruction. Until then I cannot see how we can move this feature out of its experimental phase. Perhaps there's an alternative to D88595 but I've not thought about it for a while. How important is movprfx based zeroing to you right now? Have you got a problematic use case you can share?

Thanks, I'll wait for D88595. I'm not in a hurry for this; I only wanted to confirm the idea.
BTW, the logic from (2) comes from the comment https://reviews.llvm.org/D88595#inline-886782

Allen retitled this revision from [AArch64][SVE] Support logical operation BIC with DestructiveBinary patterns to [WIP][AArch64][SVE] Support logical operation BIC with DestructiveBinary patterns. Apr 26 2022, 9:28 PM

Allen edited the summary of this revision. Apr 26 2022, 9:28 PM

Allen added a parent revision: D88595: [TableGen] Add not_all_same constraint check.

Matt added a subscriber: Matt. Apr 27 2022, 11:47 AM

I don't like the idea of having instruction specific handling within the expand code

Would this really be so terrible? I mean, it's arguably a bit of a hack, but it's not that different from the way we handle other pseudo-instructions.

I think so. Pseudo-instruction expansion is often instruction specific but for the movprfx handling we've detached the logic from the instructions (because there's 100s of them) and instead split them across various categories. So I'd much rather see problems solved for a whole category rather than partially for a single instruction within a category.

I see what you mean. And I guess some of the instructions don't have any sort of "identity" result like this.

I guess you could use an alternative sequence. I can't come up with a two-instruction sequence, but I guess you can movprfx a dummy instruction, like movprfx z0.b, p0/z, z0.b; add z0.b, z0.b, #0; bic z0.b, p0/m, z0.b, z0.b.

As D88595 will not be accepted in the short term, I'll try your idea.
Before doing so, I want to check my understanding of your suggestion:
do you mean that the additional add z0.b, z0.b, #0 should pair with the preceding movprfx instruction (so that the constraints of movprfx are met), and that the bic instruction will then produce the "identity" result?

The sequence movprfx z0.b, p0/z, z0.b; add z0.b, z0.b, #0; zeros the lanes in z0 that aren't active in p0. (It's a slightly weird way to write it, but as far as I know there isn't any single instruction with equivalent semantics.) Then we can just use the regular instruction that leaves the lanes we just zeroed unmodified. This works regardless of the operation (as long as it doesn't do cross-lane processing).

The "identity" thing is just noting that if the two operands to bic are identical, the result is always zero. But as noted before, that doesn't generalize to other operations.
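
The lane-wise reasoning above can be checked with a small simulation. This is a minimal sketch under stated assumptions: vectors are modeled as four 32-bit lanes, and `zeroInactiveLanes`, `bicMerging`, and `bicZeroing` are hypothetical helper names that model the movprfx-plus-identity step, the regular merging bic, and the intended result of the zeroing pseudo respectively.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t NumLanes = 4;
using Vec = std::array<std::uint32_t, NumLanes>;
using Pred = std::array<bool, NumLanes>;

// Models "movprfx zd, pg/z, zn" followed by an identity operation:
// active lanes keep their value, inactive lanes become zero.
Vec zeroInactiveLanes(const Pred &Pg, const Vec &A) {
  Vec R{};
  for (std::size_t I = 0; I < NumLanes; ++I)
    R[I] = Pg[I] ? A[I] : 0;
  return R;
}

// Models the regular merging "bic zdn, pg/m, zdn, zm":
// active lanes compute A & ~B, inactive lanes keep A.
Vec bicMerging(const Pred &Pg, const Vec &A, const Vec &B) {
  Vec R{};
  for (std::size_t I = 0; I < NumLanes; ++I)
    R[I] = Pg[I] ? (A[I] & ~B[I]) : A[I];
  return R;
}

// Assumed reference semantics of the zeroing pseudo:
// active lanes compute A & ~B, inactive lanes are zero.
Vec bicZeroing(const Pred &Pg, const Vec &A, const Vec &B) {
  Vec R{};
  for (std::size_t I = 0; I < NumLanes; ++I)
    R[I] = Pg[I] ? (A[I] & ~B[I]) : 0;
  return R;
}
```

Calling `bicMerging(Pg, zeroInactiveLanes(Pg, A), B)` gives the same lanes as `bicZeroing(Pg, A, B)`, and when every operand is the same zeroed register the result is zero in all lanes, which matches the "identity" observation for bic.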

Add an additional add z0.b, z0.b, #0, as suggested in the comment

Harbormaster completed remote builds in B192500: Diff 468206. Oct 17 2022, 8:05 AM

I think you're missing the point... we only want to insert the extra add instruction if the movprfx would otherwise be illegal.

Add the additional add z0.b, z0.b, #0 only when necessary

Harbormaster completed remote builds in B192753: Diff 468545. Oct 18 2022, 8:01 AM

paulwalker-arm added inline comments. Oct 19 2022, 10:31 AM

llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
592 ↗	(On Diff #468545)	I don't believe this is safe because only predicated instructions are allowed to follow a predicated `movprfx` instruction. There's a section within https://developer.arm.com/documentation/ddi0487/latest/ Data processing - SVE -> Move operations --> Move prefix that details which instructions are allowed to follow a `movprfx`.
592–600 ↗	(On Diff #468545)	Do any of the tests exercise this code?

efriedma added inline comments. Oct 19 2022, 10:39 AM

llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
592 ↗	(On Diff #468545)	Oh, oops, I misread the documentation. Can we use `lsl z0.b, p0/m, z0.b, #0` instead?

paulwalker-arm added inline comments. Oct 19 2022, 10:48 AM

llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
592 ↗	(On Diff #468545)	I think so but will check if there's an architecture preferred answer and will report back.

Allen removed a parent revision: D88595: [TableGen] Add not_all_same constraint check. Oct 20 2022, 1:48 AM

Allen added inline comments. Oct 20 2022, 6:05 PM

llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
592 ↗	(On Diff #468545)	Thanks @paulwalker-arm and @efriedma. If we don't have a better architecture preferred answer, how about using lsl z0.b, p0/m, z0.b, #0 first? In fact, there are very few scenarios where this additional instruction is required, such as case bic_i64_zero_no_comm.

paulwalker-arm added inline comments. Oct 21 2022, 2:49 AM

llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
592 ↗	(On Diff #468545)	lsl is not a great fit for current implementations but as you say this will only be used in rare cases that we ultimately want to solve during register allocation anyway. Plus, looking over the instruction set I'm not sure there's a non-shift alternative so yes let's go with `lsl z0.b, p0/m, z0.b, #0`.

Replace add z0.b, z0.b, #0 with lsl z0.b, p0/m, z0.b, #0, as suggested in the comment

Harbormaster completed remote builds in B193706: Diff 469851. Oct 21 2022, 8:12 PM

Allen marked 5 inline comments as done. Oct 21 2022, 8:12 PM

Allen added inline comments.

llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
592 ↗	(On Diff #468545)	Done, thanks for your suggestion.

Allen removed a parent revision: D88595: [TableGen] Add not_all_same constraint check. Oct 21 2022, 8:14 PM

Allen edited the summary of this revision. Oct 23 2022, 7:09 AM

Allen added a parent revision: D88595: [TableGen] Add not_all_same constraint check.

Ping?

Still missing a testcase that actually triggers the "lsl" codepath.

Add a MIR test case to trigger the "lsl" codepath.

Harbormaster completed remote builds in B194094: Diff 470369. Oct 24 2022, 10:03 PM

Allen removed a parent revision: D88595: [TableGen] Add not_all_same constraint check. Oct 27 2022, 9:06 AM

Any new suggestions about the last update? Thanks.

Ping?

paulwalker-arm added inline comments. Nov 1 2022, 11:34 AM

llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
495–499 ↗	(On Diff #470369)	Rather than forcing `DOPRegIsUnique` to an incorrect value, perhaps this switch serves a different purpose and at least one[1] of the matching asserts is just not relevant anymore. Your fix is essentially saying "if we cannot prefix the requested instruction we'll instead emit a prefixed_zeroing_mov". This suggests it fixes the problem for all `DType`s and thus we no longer require `DOPRegIsUnique == true` for correctness (although it is a hint of slightly poorer code generation). [1] I say at least one because for this patch you only care about the zeroing case, so really the `Create the additional LSL to zero` code belongs in the `if (FalseZero)` block and thus the other `DOPRegIsUnique` assert remains valid. That said, we can always emit a normal COPY/MOV for the `!FalseZero` case, which is one instruction rather than two. However, that's not a scenario that can occur with the current code, so you wouldn't be able to write tests, and thus it is best avoided.
591–592 ↗	(On Diff #470369)	If you keep the above code then does `if (!DOPRegIsUnique)` work here? and as mentioned I believe it should sit with the `if (FalseZero)` block.

address comments

llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
495–499 ↗	(On Diff #470369)	Done, I deleted the code that forced DOPRegIsUnique to an incorrect value, thanks. This version is expected to fix only DestructiveBinary to begin with.
591–592 ↗	(On Diff #470369)	No, the above code is only active when #ifndef NDEBUG holds, so it depends on the build configuration. I applied your comment and moved the code into the if (FalseZero) block, thanks.

Harbormaster completed remote builds in B195644: Diff 472527. Nov 2 2022, 1:14 AM

paulwalker-arm added inline comments. Nov 2 2022, 7:34 AM

llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
591–592 ↗	(On Diff #470369)	Sure, but what I'm suggesting is that the code is no longer `DEBUG` only as we now have a real world use for it.

@Allen It looks like something has gone wrong because the latest version looks like a different piece of work, unrelated to movprfx.

Allen removed a parent revision: D88595: [TableGen] Add not_all_same constraint check. Nov 2 2022, 7:47 AM

Update to replace the unrelated piece of work with the intended patch

Harbormaster completed remote builds in B195705: Diff 472614. Nov 2 2022, 8:02 AM

Oh, sorry for the mistake; it is now the right version.

Allen removed a parent revision: D88595: [TableGen] Add not_all_same constraint check. Nov 2 2022, 8:04 AM

Ping?

@Allen Did you see my comment from "Wed, Nov 2, 2:34 PM"? Not a requirement but I figure moving this code out of DEBUG and ensuring DOPRegIsUnique is correct for AArch64::DestructiveBinary might be better in the long run. I know at this time you only care about DestructiveBinary, but I think your solution can easily be extended to the other destructive types later on, although part of me thinks they'll just work after this patch.

Delete the NDEBUG guard as suggested in the comment, thanks

Harbormaster completed remote builds in B196464: Diff 473641. Nov 7 2022, 5:59 AM

Allen removed a parent revision: D88595: [TableGen] Add not_all_same constraint check. Nov 7 2022, 6:00 AM

Sorry @paulwalker-arm for missing that. I have now deleted the DEBUG guard.

paulwalker-arm added inline comments. Nov 9 2022, 10:39 AM

llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
454–455 ↗	(On Diff #473641)	Is it possible to move this logic into the `DOPRegIsUnique` switch statement like case AArch64::DestructiveBinary: DOPRegIsUnique = DstReg != MI.getOperand(SrcIdx).getReg();
558 ↗	(On Diff #473641)	I think this reads better as `DOPRegIsUnique || AArch64::DestructiveBinary == DType` because the second part is the exception.
575–576 ↗	(On Diff #473641)	With my previous suggestion does if (DType == AArch64::DestructiveBinary && !DOPRegIsUnique) work here?

Address comment

Allen removed a parent revision: D88595: [TableGen] Add not_all_same constraint check. Nov 9 2022, 11:34 PM

Harbormaster completed remote builds in B197029: Diff 474456. Nov 9 2022, 11:35 PM

Allen marked an inline comment as done. Nov 9 2022, 11:35 PM

Allen added inline comments.

llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
454–455 ↗	(On Diff #473641)	Applied your comment, thanks
558 ↗	(On Diff #473641)	Done
575–576 ↗	(On Diff #473641)	Yes, it works with your previous suggestion 'DOPRegIsUnique = DstReg != MI.getOperand(SrcIdx).getReg();'
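
The decision agreed on in these inline comments can be sketched outside of LLVM as follows. This is an illustrative model only, assuming the zeroing path of a DestructiveBinary pseudo and the all-same case discussed in the summary: `expandBicZero` is a hypothetical helper that returns assembly strings, whereas the real code builds machine instructions in AArch64ExpandPseudoInsts.cpp.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns the assembly that a zeroing BIC pseudo
// "BIC_ZPZZ_ZERO Dst, Pg, Src1, Src2" might expand to in this model.
std::vector<std::string> expandBicZero(const std::string &Dst,
                                       const std::string &Pg,
                                       const std::string &Src1,
                                       const std::string &Src2) {
  // Mirrors the suggested check: the destructive operand is unique when the
  // destination differs from the remaining source operand.
  const bool DOPRegIsUnique = Dst != Src2;

  std::vector<std::string> Seq;
  Seq.push_back("movprfx " + Dst + ", " + Pg + "/z, " + Src1);
  if (!DOPRegIsUnique) {
    // The movprfx cannot prefix the bic here, so it prefixes an identity
    // shift instead; the bic below then runs unprefixed.
    Seq.push_back("lsl " + Dst + ", " + Pg + "/m, " + Dst + ", #0");
  }
  Seq.push_back("bic " + Dst + ", " + Pg + "/m, " + Dst + ", " + Src2);
  return Seq;
}
```

For example, `expandBicZero("z0.s", "p0", "z0.s", "z0.s")` yields three instructions with the lsl in the middle, while `expandBicZero("z0.s", "p0", "z0.s", "z1.s")` yields only the movprfx and the bic, so the extra instruction appears only in the rare case where it is needed.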

paulwalker-arm accepted this revision. Nov 10 2022, 9:39 AM

This revision is now accepted and ready to land. Nov 10 2022, 9:39 AM

closed with commit ffb109b6852d248c9d2e3202477dccf20aac7151

rupprecht added a subscriber: rupprecht. Nov 10 2022, 9:04 PM

rupprecht added inline comments.

llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
494 ↗	(On Diff #474456)	Is fall through intended here? I assume you want a `break;`? llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp:495:3: warning: unannotated fall-through between switch labels [-Werror,-Wimplicit-fallthrough] case AArch64::DestructiveBinaryComm:

Allen marked an inline comment as done. Nov 10 2022, 10:06 PM

Allen added inline comments.

llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
494 ↗	(On Diff #474456)	Yes, I'll add a break, thanks

rupprecht added inline comments. Nov 10 2022, 10:07 PM

llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
494 ↗	(On Diff #474456)	Added a break in 094c0eccdf959c3b9c85219e33c3fcfbab024b61 to avoid fallthrough. Please take a look if that's not what you want.

rupprecht added inline comments. Nov 10 2022, 10:14 PM

llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
494 ↗	(On Diff #474456)	Ah, race condition -- I applied the break at the same time you responded. Glad I assumed correctly :)

Allen marked 3 inline comments as done. Nov 10 2022, 10:19 PM

Allen added inline comments.

llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
494 ↗	(On Diff #474456)	Thanks for fixing.
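
The fall-through warning discussed above can be reproduced with a reduced switch. This is a hedged sketch, not the LLVM code: `DestructiveType` and `dopRegIsUnique` are hypothetical stand-ins, and the value assigned in the `BinaryComm` case is a placeholder chosen only to show how a missing `break` would overwrite the result computed for `Binary`.

```cpp
// Hypothetical stand-ins for the destructive instruction categories.
enum class DestructiveType { Binary, BinaryComm, Other };

// Reduced model of a switch that computes whether the destructive operand is
// unique. The BinaryComm value is a placeholder, not LLVM's actual logic.
bool dopRegIsUnique(DestructiveType DType, unsigned DstReg, unsigned SrcReg) {
  bool Unique = true;
  switch (DType) {
  case DestructiveType::Binary:
    Unique = DstReg != SrcReg;
    break; // without this break, the BinaryComm assignment would overwrite Unique
  case DestructiveType::BinaryComm:
    Unique = true; // placeholder value for illustration
    break;
  case DestructiveType::Other:
    break;
  }
  return Unique;
}
```

With the `break` in place, `dopRegIsUnique(DestructiveType::Binary, 0, 0)` returns false; without it, control would continue into the next case and the function would report the operand as unique.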

Allen mentioned this in D141471: [AArch64][SVE] Fix crash for DestructiveBinaryComm zero merging. Jan 11 2023, 2:12 AM

Allen mentioned this in rG2deb10c10842: [AArch64][SVE] Fix crash for DestructiveBinaryComm zero merging. Jan 17 2023, 4:46 AM

Allen added a child revision: D141471: [AArch64][SVE] Fix crash for DestructiveBinaryComm zero merging. Jan 17 2023, 5:15 AM

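Revision Contents

Path	Size
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	29 lines
llvm/test/CodeGen/AArch64/atomicrmw-O0.ll	132 lines
llvm/test/CodeGen/AArch64/bcmp-inline-small.ll	12 lines
llvm/test/CodeGen/AArch64/bcmp.ll	36 lines
llvm/test/CodeGen/AArch64/dag-combine-setcc.ll	30 lines
llvm/test/CodeGen/AArch64/i128-cmp.ll	26 lines
llvm/test/CodeGen/AArch64/umulo-128-legalisation-lowering.ll	8 lines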

Diff 472527

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

if (DCI.isBeforeLegalize() && VT.isScalarInteger() &&
if (FromVT.isFixedLengthVector() &&		if (FromVT.isFixedLengthVector() &&
FromVT.getVectorElementType() == MVT::i1) {		FromVT.getVectorElementType() == MVT::i1) {
LHS = DAG.getNode(ISD::VECREDUCE_OR, DL, MVT::i1, LHS->getOperand(0));		LHS = DAG.getNode(ISD::VECREDUCE_OR, DL, MVT::i1, LHS->getOperand(0));
LHS = DAG.getNode(ISD::ZERO_EXTEND, DL, ToVT, LHS);		LHS = DAG.getNode(ISD::ZERO_EXTEND, DL, ToVT, LHS);
return DAG.getSetCC(DL, VT, LHS, RHS, Cond);		return DAG.getSetCC(DL, VT, LHS, RHS, Cond);
}		}
}		}

		// Try to express conjunction "cmp 0 (or (xor A0 A1) (xor B0 B1))" as:
		// cmp A0, A1; ccmp B0, B1, 0, eq; cmp inv(Cond) flag
		if (!DCI.isBeforeLegalize() && VT.isScalarInteger() &&
		(Cond == ISD::SETEQ || Cond == ISD::SETNE) && isNullConstant(RHS) &&
		LHS->getOpcode() == ISD::OR &&
		(LHS.getOperand(0)->getOpcode() == ISD::XOR &&
		LHS.getOperand(1)->getOpcode() == ISD::XOR) &&
		LHS.hasOneUse() && LHS.getOperand(0)->hasOneUse() &&
		LHS.getOperand(1)->hasOneUse()) {
		SDValue XOR0 = LHS.getOperand(0);
		SDValue XOR1 = LHS.getOperand(1);
		SDValue CCVal = DAG.getConstant(AArch64CC::EQ, DL, MVT_CC);
		EVT TstVT = LHS->getValueType(0);
		SDValue Cmp =
		DAG.getNode(AArch64ISD::SUBS, DL, DAG.getVTList(TstVT, MVT::i32),
		XOR0.getOperand(0), XOR0.getOperand(1));
		SDValue Overflow = Cmp.getValue(1);
		SDValue NZCVOp = DAG.getConstant(0, DL, MVT::i32);
		SDValue CCmp = DAG.getNode(AArch64ISD::CCMP, DL, MVT_CC, XOR1.getOperand(0),
		XOR1.getOperand(1), NZCVOp, CCVal, Overflow);
		// Invert CSEL's operands.
		SDValue TVal = DAG.getConstant(1, DL, VT);
		SDValue FVal = DAG.getConstant(0, DL, VT);
		AArch64CC::CondCode CC = changeIntCCToAArch64CC(Cond);
		AArch64CC::CondCode InvCC = AArch64CC::getInvertedCondCode(CC);
		return DAG.getNode(AArch64ISD::CSEL, DL, VT, FVal, TVal,
		DAG.getConstant(InvCC, DL, MVT::i32), CCmp);
		}

return SDValue();		return SDValue();
}		}

// Replace a flag-setting operator (eg ANDS) with the generic version		// Replace a flag-setting operator (eg ANDS) with the generic version
// (eg AND) if the flag is unused.		// (eg AND) if the flag is unused.
static SDValue performFlagSettingCombine(SDNode *N,		static SDValue performFlagSettingCombine(SDNode *N,
TargetLowering::DAGCombinerInfo &DCI,		TargetLowering::DAGCombinerInfo &DCI,
unsigned GenericOpcode) {		unsigned GenericOpcode) {
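
The combine in the hunk above relies on a simple equivalence: the OR of two XORs is zero exactly when both pairs of operands are equal, so the test can be expressed as a compare followed by a conditional compare. A minimal sketch of that equivalence, assuming 64-bit halves as in the i128 tests and using hypothetical function names:

```cpp
#include <cstdint>

// The matched pattern: (or (xor A0 A1) (xor B0 B1)) compared with zero.
bool equalViaXorOr(std::uint64_t A0, std::uint64_t A1,
                   std::uint64_t B0, std::uint64_t B1) {
  return ((A0 ^ A1) | (B0 ^ B1)) == 0;
}

// The replacement idea: compare the first pair, and only if it is equal let
// the comparison of the second pair decide the result.
bool equalViaChainedCompare(std::uint64_t A0, std::uint64_t A1,
                            std::uint64_t B0, std::uint64_t B1) {
  return A0 == A1 && B0 == B1;
}
```

Both functions agree for every input, which is why the XOR and OR instructions in the test output can give way to a subs/ccmp pair.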

llvm/test/CodeGen/AArch64/atomicrmw-O0.ll

	; NOLSE-NEXT: ldr x9, [x0]			; NOLSE-NEXT: ldr x9, [x0]
	; NOLSE-NEXT: str x9, [sp, #32] // 8-byte Folded Spill			; NOLSE-NEXT: str x9, [sp, #32] // 8-byte Folded Spill
	; NOLSE-NEXT: str x8, [sp, #40] // 8-byte Folded Spill			; NOLSE-NEXT: str x8, [sp, #40] // 8-byte Folded Spill
	; NOLSE-NEXT: b .LBB4_1			; NOLSE-NEXT: b .LBB4_1
	; NOLSE-NEXT: .LBB4_1: // %atomicrmw.start			; NOLSE-NEXT: .LBB4_1: // %atomicrmw.start
	; NOLSE-NEXT: // =>This Loop Header: Depth=1			; NOLSE-NEXT: // =>This Loop Header: Depth=1
	; NOLSE-NEXT: // Child Loop BB4_2 Depth 2			; NOLSE-NEXT: // Child Loop BB4_2 Depth 2
	; NOLSE-NEXT: ldr x11, [sp, #40] // 8-byte Folded Reload			; NOLSE-NEXT: ldr x11, [sp, #40] // 8-byte Folded Reload
	; NOLSE-NEXT: ldr x8, [sp, #32] // 8-byte Folded Reload			; NOLSE-NEXT: ldr x13, [sp, #32] // 8-byte Folded Reload
	; NOLSE-NEXT: ldr x13, [sp, #24] // 8-byte Folded Reload			; NOLSE-NEXT: ldr x10, [sp, #24] // 8-byte Folded Reload
	; NOLSE-NEXT: adds x14, x8, #1			; NOLSE-NEXT: adds x14, x13, #1
	; NOLSE-NEXT: cinc x15, x11, hs			; NOLSE-NEXT: cinc x15, x11, hs
	; NOLSE-NEXT: .LBB4_2: // %atomicrmw.start			; NOLSE-NEXT: .LBB4_2: // %atomicrmw.start
	; NOLSE-NEXT: // Parent Loop BB4_1 Depth=1			; NOLSE-NEXT: // Parent Loop BB4_1 Depth=1
	; NOLSE-NEXT: // => This Inner Loop Header: Depth=2			; NOLSE-NEXT: // => This Inner Loop Header: Depth=2
	; NOLSE-NEXT: ldaxp x10, x9, [x13]			; NOLSE-NEXT: ldaxp x12, x8, [x10]
	; NOLSE-NEXT: cmp x10, x8			; NOLSE-NEXT: cmp x12, x13
	; NOLSE-NEXT: cset w12, ne			; NOLSE-NEXT: cset w9, ne
	; NOLSE-NEXT: cmp x9, x11			; NOLSE-NEXT: cmp x8, x11
	; NOLSE-NEXT: cinc w12, w12, ne			; NOLSE-NEXT: cinc w9, w9, ne
	; NOLSE-NEXT: cbnz w12, .LBB4_4			; NOLSE-NEXT: cbnz w9, .LBB4_4
	; NOLSE-NEXT: // %bb.3: // %atomicrmw.start			; NOLSE-NEXT: // %bb.3: // %atomicrmw.start
	; NOLSE-NEXT: // in Loop: Header=BB4_2 Depth=2			; NOLSE-NEXT: // in Loop: Header=BB4_2 Depth=2
	; NOLSE-NEXT: stlxp w12, x14, x15, [x13]			; NOLSE-NEXT: stlxp w9, x14, x15, [x10]
	; NOLSE-NEXT: cbnz w12, .LBB4_2			; NOLSE-NEXT: cbnz w9, .LBB4_2
	; NOLSE-NEXT: b .LBB4_5			; NOLSE-NEXT: b .LBB4_5
	; NOLSE-NEXT: .LBB4_4: // %atomicrmw.start			; NOLSE-NEXT: .LBB4_4: // %atomicrmw.start
	; NOLSE-NEXT: // in Loop: Header=BB4_2 Depth=2			; NOLSE-NEXT: // in Loop: Header=BB4_2 Depth=2
	; NOLSE-NEXT: stlxp w12, x10, x9, [x13]			; NOLSE-NEXT: stlxp w9, x12, x8, [x10]
	; NOLSE-NEXT: cbnz w12, .LBB4_2			; NOLSE-NEXT: cbnz w9, .LBB4_2
	; NOLSE-NEXT: .LBB4_5: // %atomicrmw.start			; NOLSE-NEXT: .LBB4_5: // %atomicrmw.start
	; NOLSE-NEXT: // in Loop: Header=BB4_1 Depth=1			; NOLSE-NEXT: // in Loop: Header=BB4_1 Depth=1
	; NOLSE-NEXT: eor x11, x9, x11			; NOLSE-NEXT: mov x9, x8
	; NOLSE-NEXT: eor x8, x10, x8
	; NOLSE-NEXT: orr x8, x8, x11
	; NOLSE-NEXT: str x9, [sp, #8] // 8-byte Folded Spill			; NOLSE-NEXT: str x9, [sp, #8] // 8-byte Folded Spill
				; NOLSE-NEXT: mov x10, x12
	; NOLSE-NEXT: str x10, [sp, #16] // 8-byte Folded Spill			; NOLSE-NEXT: str x10, [sp, #16] // 8-byte Folded Spill
				; NOLSE-NEXT: subs x12, x12, x13
				; NOLSE-NEXT: ccmp x8, x11, #0, eq
				; NOLSE-NEXT: cset w8, ne
	; NOLSE-NEXT: str x10, [sp, #32] // 8-byte Folded Spill			; NOLSE-NEXT: str x10, [sp, #32] // 8-byte Folded Spill
	; NOLSE-NEXT: str x9, [sp, #40] // 8-byte Folded Spill			; NOLSE-NEXT: str x9, [sp, #40] // 8-byte Folded Spill
	; NOLSE-NEXT: cbnz x8, .LBB4_1			; NOLSE-NEXT: tbnz w8, #0, .LBB4_1
	; NOLSE-NEXT: b .LBB4_6			; NOLSE-NEXT: b .LBB4_6
	; NOLSE-NEXT: .LBB4_6: // %atomicrmw.end			; NOLSE-NEXT: .LBB4_6: // %atomicrmw.end
	; NOLSE-NEXT: ldr x1, [sp, #8] // 8-byte Folded Reload			; NOLSE-NEXT: ldr x1, [sp, #8] // 8-byte Folded Reload
	; NOLSE-NEXT: ldr x0, [sp, #16] // 8-byte Folded Reload			; NOLSE-NEXT: ldr x0, [sp, #16] // 8-byte Folded Reload
	; NOLSE-NEXT: add sp, sp, #48			; NOLSE-NEXT: add sp, sp, #48
	; NOLSE-NEXT: ret			; NOLSE-NEXT: ret
	;			;
	; LSE-LABEL: test_rmw_add_128:			; LSE-LABEL: test_rmw_add_128:
	; LSE: // %bb.0: // %entry			; LSE: // %bb.0: // %entry
	; LSE-NEXT: sub sp, sp, #48			; LSE-NEXT: sub sp, sp, #48
	; LSE-NEXT: .cfi_def_cfa_offset 48			; LSE-NEXT: .cfi_def_cfa_offset 48
	; LSE-NEXT: str x0, [sp, #24] // 8-byte Folded Spill			; LSE-NEXT: str x0, [sp, #24] // 8-byte Folded Spill
	; LSE-NEXT: ldr x8, [x0, #8]			; LSE-NEXT: ldr x8, [x0, #8]
	; LSE-NEXT: ldr x9, [x0]			; LSE-NEXT: ldr x9, [x0]
	; LSE-NEXT: str x9, [sp, #32] // 8-byte Folded Spill			; LSE-NEXT: str x9, [sp, #32] // 8-byte Folded Spill
	; LSE-NEXT: str x8, [sp, #40] // 8-byte Folded Spill			; LSE-NEXT: str x8, [sp, #40] // 8-byte Folded Spill
	; LSE-NEXT: b .LBB4_1			; LSE-NEXT: b .LBB4_1
	; LSE-NEXT: .LBB4_1: // %atomicrmw.start			; LSE-NEXT: .LBB4_1: // %atomicrmw.start
	; LSE-NEXT: // =>This Inner Loop Header: Depth=1			; LSE-NEXT: // =>This Inner Loop Header: Depth=1
	; LSE-NEXT: ldr x10, [sp, #40] // 8-byte Folded Reload			; LSE-NEXT: ldr x8, [sp, #40] // 8-byte Folded Reload
	; LSE-NEXT: ldr x8, [sp, #32] // 8-byte Folded Reload			; LSE-NEXT: ldr x11, [sp, #32] // 8-byte Folded Reload
	; LSE-NEXT: ldr x9, [sp, #24] // 8-byte Folded Reload			; LSE-NEXT: ldr x9, [sp, #24] // 8-byte Folded Reload
	; LSE-NEXT: mov x0, x8			; LSE-NEXT: mov x0, x11
	; LSE-NEXT: mov x1, x10			; LSE-NEXT: mov x1, x8
	; LSE-NEXT: adds x2, x8, #1			; LSE-NEXT: adds x2, x11, #1
	; LSE-NEXT: cinc x11, x10, hs			; LSE-NEXT: cinc x10, x8, hs
	; LSE-NEXT: // kill: def $x2 killed $x2 def $x2_x3			; LSE-NEXT: // kill: def $x2 killed $x2 def $x2_x3
	; LSE-NEXT: mov x3, x11			; LSE-NEXT: mov x3, x10
	; LSE-NEXT: caspal x0, x1, x2, x3, [x9]			; LSE-NEXT: caspal x0, x1, x2, x3, [x9]
	; LSE-NEXT: mov x9, x1			; LSE-NEXT: mov x9, x1
	; LSE-NEXT: str x9, [sp, #8] // 8-byte Folded Spill			; LSE-NEXT: str x9, [sp, #8] // 8-byte Folded Spill
	; LSE-NEXT: eor x11, x9, x10
	; LSE-NEXT: mov x10, x0			; LSE-NEXT: mov x10, x0
	; LSE-NEXT: str x10, [sp, #16] // 8-byte Folded Spill			; LSE-NEXT: str x10, [sp, #16] // 8-byte Folded Spill
	; LSE-NEXT: eor x8, x10, x8			; LSE-NEXT: subs x11, x10, x11
	; LSE-NEXT: orr x8, x8, x11			; LSE-NEXT: ccmp x9, x8, #0, eq
				; LSE-NEXT: cset w8, ne
	; LSE-NEXT: str x10, [sp, #32] // 8-byte Folded Spill			; LSE-NEXT: str x10, [sp, #32] // 8-byte Folded Spill
	; LSE-NEXT: str x9, [sp, #40] // 8-byte Folded Spill			; LSE-NEXT: str x9, [sp, #40] // 8-byte Folded Spill
	; LSE-NEXT: cbnz x8, .LBB4_1			; LSE-NEXT: tbnz w8, #0, .LBB4_1
	; LSE-NEXT: b .LBB4_2			; LSE-NEXT: b .LBB4_2
	; LSE-NEXT: .LBB4_2: // %atomicrmw.end			; LSE-NEXT: .LBB4_2: // %atomicrmw.end
	; LSE-NEXT: ldr x1, [sp, #8] // 8-byte Folded Reload			; LSE-NEXT: ldr x1, [sp, #8] // 8-byte Folded Reload
	; LSE-NEXT: ldr x0, [sp, #16] // 8-byte Folded Reload			; LSE-NEXT: ldr x0, [sp, #16] // 8-byte Folded Reload
	; LSE-NEXT: add sp, sp, #48			; LSE-NEXT: add sp, sp, #48
	; LSE-NEXT: ret			; LSE-NEXT: ret
	entry:			entry:
	%res = atomicrmw add i128* %dst, i128 1 seq_cst			%res = atomicrmw add i128* %dst, i128 1 seq_cst
	▲ Show 20 Lines • Show All 303 Lines • ▼ Show 20 Lines
	; NOLSE-NEXT: ldr x9, [x0]			; NOLSE-NEXT: ldr x9, [x0]
	; NOLSE-NEXT: str x9, [sp, #32] // 8-byte Folded Spill			; NOLSE-NEXT: str x9, [sp, #32] // 8-byte Folded Spill
	; NOLSE-NEXT: str x8, [sp, #40] // 8-byte Folded Spill			; NOLSE-NEXT: str x8, [sp, #40] // 8-byte Folded Spill
	; NOLSE-NEXT: b .LBB9_1			; NOLSE-NEXT: b .LBB9_1
	; NOLSE-NEXT: .LBB9_1: // %atomicrmw.start			; NOLSE-NEXT: .LBB9_1: // %atomicrmw.start
	; NOLSE-NEXT: // =>This Loop Header: Depth=1			; NOLSE-NEXT: // =>This Loop Header: Depth=1
	; NOLSE-NEXT: // Child Loop BB9_2 Depth 2			; NOLSE-NEXT: // Child Loop BB9_2 Depth 2
	; NOLSE-NEXT: ldr x11, [sp, #40] // 8-byte Folded Reload			; NOLSE-NEXT: ldr x11, [sp, #40] // 8-byte Folded Reload
	; NOLSE-NEXT: ldr x8, [sp, #32] // 8-byte Folded Reload			; NOLSE-NEXT: ldr x13, [sp, #32] // 8-byte Folded Reload
	; NOLSE-NEXT: ldr x13, [sp, #24] // 8-byte Folded Reload			; NOLSE-NEXT: ldr x10, [sp, #24] // 8-byte Folded Reload
	; NOLSE-NEXT: mov w9, w8			; NOLSE-NEXT: mov w8, w13
	; NOLSE-NEXT: mvn w10, w9			; NOLSE-NEXT: mvn w9, w8
	; NOLSE-NEXT: // implicit-def: $x9			; NOLSE-NEXT: // implicit-def: $x8
	; NOLSE-NEXT: mov w9, w10			; NOLSE-NEXT: mov w8, w9
	; NOLSE-NEXT: orr x14, x9, #0xfffffffffffffffe			; NOLSE-NEXT: orr x14, x8, #0xfffffffffffffffe
	; NOLSE-NEXT: mov x15, #-1			; NOLSE-NEXT: mov x15, #-1
	; NOLSE-NEXT: .LBB9_2: // %atomicrmw.start			; NOLSE-NEXT: .LBB9_2: // %atomicrmw.start
	; NOLSE-NEXT: // Parent Loop BB9_1 Depth=1			; NOLSE-NEXT: // Parent Loop BB9_1 Depth=1
	; NOLSE-NEXT: // => This Inner Loop Header: Depth=2			; NOLSE-NEXT: // => This Inner Loop Header: Depth=2
	; NOLSE-NEXT: ldaxp x10, x9, [x13]			; NOLSE-NEXT: ldaxp x12, x8, [x10]
	; NOLSE-NEXT: cmp x10, x8			; NOLSE-NEXT: cmp x12, x13
	; NOLSE-NEXT: cset w12, ne			; NOLSE-NEXT: cset w9, ne
	; NOLSE-NEXT: cmp x9, x11			; NOLSE-NEXT: cmp x8, x11
	; NOLSE-NEXT: cinc w12, w12, ne			; NOLSE-NEXT: cinc w9, w9, ne
	; NOLSE-NEXT: cbnz w12, .LBB9_4			; NOLSE-NEXT: cbnz w9, .LBB9_4
	; NOLSE-NEXT: // %bb.3: // %atomicrmw.start			; NOLSE-NEXT: // %bb.3: // %atomicrmw.start
	; NOLSE-NEXT: // in Loop: Header=BB9_2 Depth=2			; NOLSE-NEXT: // in Loop: Header=BB9_2 Depth=2
	; NOLSE-NEXT: stlxp w12, x14, x15, [x13]			; NOLSE-NEXT: stlxp w9, x14, x15, [x10]
	; NOLSE-NEXT: cbnz w12, .LBB9_2			; NOLSE-NEXT: cbnz w9, .LBB9_2
	; NOLSE-NEXT: b .LBB9_5			; NOLSE-NEXT: b .LBB9_5
	; NOLSE-NEXT: .LBB9_4: // %atomicrmw.start			; NOLSE-NEXT: .LBB9_4: // %atomicrmw.start
	; NOLSE-NEXT: // in Loop: Header=BB9_2 Depth=2			; NOLSE-NEXT: // in Loop: Header=BB9_2 Depth=2
	; NOLSE-NEXT: stlxp w12, x10, x9, [x13]			; NOLSE-NEXT: stlxp w9, x12, x8, [x10]
	; NOLSE-NEXT: cbnz w12, .LBB9_2			; NOLSE-NEXT: cbnz w9, .LBB9_2
	; NOLSE-NEXT: .LBB9_5: // %atomicrmw.start			; NOLSE-NEXT: .LBB9_5: // %atomicrmw.start
	; NOLSE-NEXT: // in Loop: Header=BB9_1 Depth=1			; NOLSE-NEXT: // in Loop: Header=BB9_1 Depth=1
	; NOLSE-NEXT: eor x11, x9, x11			; NOLSE-NEXT: mov x9, x8
	; NOLSE-NEXT: eor x8, x10, x8
	; NOLSE-NEXT: orr x8, x8, x11
	; NOLSE-NEXT: str x9, [sp, #8] // 8-byte Folded Spill			; NOLSE-NEXT: str x9, [sp, #8] // 8-byte Folded Spill
				; NOLSE-NEXT: mov x10, x12
	; NOLSE-NEXT: str x10, [sp, #16] // 8-byte Folded Spill			; NOLSE-NEXT: str x10, [sp, #16] // 8-byte Folded Spill
				; NOLSE-NEXT: subs x12, x12, x13
				; NOLSE-NEXT: ccmp x8, x11, #0, eq
				; NOLSE-NEXT: cset w8, ne
	; NOLSE-NEXT: str x10, [sp, #32] // 8-byte Folded Spill			; NOLSE-NEXT: str x10, [sp, #32] // 8-byte Folded Spill
	; NOLSE-NEXT: str x9, [sp, #40] // 8-byte Folded Spill			; NOLSE-NEXT: str x9, [sp, #40] // 8-byte Folded Spill
	; NOLSE-NEXT: cbnz x8, .LBB9_1			; NOLSE-NEXT: tbnz w8, #0, .LBB9_1
	; NOLSE-NEXT: b .LBB9_6			; NOLSE-NEXT: b .LBB9_6
	; NOLSE-NEXT: .LBB9_6: // %atomicrmw.end			; NOLSE-NEXT: .LBB9_6: // %atomicrmw.end
	; NOLSE-NEXT: ldr x1, [sp, #8] // 8-byte Folded Reload			; NOLSE-NEXT: ldr x1, [sp, #8] // 8-byte Folded Reload
	; NOLSE-NEXT: ldr x0, [sp, #16] // 8-byte Folded Reload			; NOLSE-NEXT: ldr x0, [sp, #16] // 8-byte Folded Reload
	; NOLSE-NEXT: add sp, sp, #48			; NOLSE-NEXT: add sp, sp, #48
	; NOLSE-NEXT: ret			; NOLSE-NEXT: ret
	;			;
	; LSE-LABEL: test_rmw_nand_128:			; LSE-LABEL: test_rmw_nand_128:
	; LSE: // %bb.0: // %entry			; LSE: // %bb.0: // %entry
	; LSE-NEXT: sub sp, sp, #48			; LSE-NEXT: sub sp, sp, #48
	; LSE-NEXT: .cfi_def_cfa_offset 48			; LSE-NEXT: .cfi_def_cfa_offset 48
	; LSE-NEXT: str x0, [sp, #24] // 8-byte Folded Spill			; LSE-NEXT: str x0, [sp, #24] // 8-byte Folded Spill
	; LSE-NEXT: ldr x8, [x0, #8]			; LSE-NEXT: ldr x8, [x0, #8]
	; LSE-NEXT: ldr x9, [x0]			; LSE-NEXT: ldr x9, [x0]
	; LSE-NEXT: str x9, [sp, #32] // 8-byte Folded Spill			; LSE-NEXT: str x9, [sp, #32] // 8-byte Folded Spill
	; LSE-NEXT: str x8, [sp, #40] // 8-byte Folded Spill			; LSE-NEXT: str x8, [sp, #40] // 8-byte Folded Spill
	; LSE-NEXT: b .LBB9_1			; LSE-NEXT: b .LBB9_1
	; LSE-NEXT: .LBB9_1: // %atomicrmw.start			; LSE-NEXT: .LBB9_1: // %atomicrmw.start
	; LSE-NEXT: // =>This Inner Loop Header: Depth=1			; LSE-NEXT: // =>This Inner Loop Header: Depth=1
	; LSE-NEXT: ldr x10, [sp, #40] // 8-byte Folded Reload			; LSE-NEXT: ldr x8, [sp, #40] // 8-byte Folded Reload
	; LSE-NEXT: ldr x8, [sp, #32] // 8-byte Folded Reload			; LSE-NEXT: ldr x11, [sp, #32] // 8-byte Folded Reload
	; LSE-NEXT: ldr x9, [sp, #24] // 8-byte Folded Reload			; LSE-NEXT: ldr x9, [sp, #24] // 8-byte Folded Reload
	; LSE-NEXT: mov x0, x8			; LSE-NEXT: mov x0, x11
	; LSE-NEXT: mov x1, x10			; LSE-NEXT: mov x1, x8
	; LSE-NEXT: mov w11, w8			; LSE-NEXT: mov w10, w11
	; LSE-NEXT: mvn w12, w11			; LSE-NEXT: mvn w12, w10
	; LSE-NEXT: // implicit-def: $x11			; LSE-NEXT: // implicit-def: $x10
	; LSE-NEXT: mov w11, w12			; LSE-NEXT: mov w10, w12
	; LSE-NEXT: orr x2, x11, #0xfffffffffffffffe			; LSE-NEXT: orr x2, x10, #0xfffffffffffffffe
	; LSE-NEXT: mov x11, #-1			; LSE-NEXT: mov x10, #-1
	; LSE-NEXT: // kill: def $x2 killed $x2 def $x2_x3			; LSE-NEXT: // kill: def $x2 killed $x2 def $x2_x3
	; LSE-NEXT: mov x3, x11			; LSE-NEXT: mov x3, x10
	; LSE-NEXT: caspal x0, x1, x2, x3, [x9]			; LSE-NEXT: caspal x0, x1, x2, x3, [x9]
	; LSE-NEXT: mov x9, x1			; LSE-NEXT: mov x9, x1
	; LSE-NEXT: str x9, [sp, #8] // 8-byte Folded Spill			; LSE-NEXT: str x9, [sp, #8] // 8-byte Folded Spill
	; LSE-NEXT: eor x11, x9, x10
	; LSE-NEXT: mov x10, x0			; LSE-NEXT: mov x10, x0
	; LSE-NEXT: str x10, [sp, #16] // 8-byte Folded Spill			; LSE-NEXT: str x10, [sp, #16] // 8-byte Folded Spill
	; LSE-NEXT: eor x8, x10, x8			; LSE-NEXT: subs x11, x10, x11
	; LSE-NEXT: orr x8, x8, x11			; LSE-NEXT: ccmp x9, x8, #0, eq
				; LSE-NEXT: cset w8, ne
	; LSE-NEXT: str x10, [sp, #32] // 8-byte Folded Spill			; LSE-NEXT: str x10, [sp, #32] // 8-byte Folded Spill
	; LSE-NEXT: str x9, [sp, #40] // 8-byte Folded Spill			; LSE-NEXT: str x9, [sp, #40] // 8-byte Folded Spill
	; LSE-NEXT: cbnz x8, .LBB9_1			; LSE-NEXT: tbnz w8, #0, .LBB9_1
	; LSE-NEXT: b .LBB9_2			; LSE-NEXT: b .LBB9_2
	; LSE-NEXT: .LBB9_2: // %atomicrmw.end			; LSE-NEXT: .LBB9_2: // %atomicrmw.end
	; LSE-NEXT: ldr x1, [sp, #8] // 8-byte Folded Reload			; LSE-NEXT: ldr x1, [sp, #8] // 8-byte Folded Reload
	; LSE-NEXT: ldr x0, [sp, #16] // 8-byte Folded Reload			; LSE-NEXT: ldr x0, [sp, #16] // 8-byte Folded Reload
	; LSE-NEXT: add sp, sp, #48			; LSE-NEXT: add sp, sp, #48
	; LSE-NEXT: ret			; LSE-NEXT: ret
	entry:			entry:
	%res = atomicrmw nand i128* %dst, i128 1 seq_cst			%res = atomicrmw nand i128* %dst, i128 1 seq_cst
	ret i128 %res			ret i128 %res
	}			}
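
The `mvn` / `orr ..., #0xfffffffffffffffe` / `mov ..., #-1` sequence in the `test_rmw_nand_128` checks above appears to build the value stored by `atomicrmw nand` with operand 1, that is `~(old & 1)`, one 64-bit half at a time. A minimal sketch of that arithmetic follows; the `U128` struct and `nand_with_one` are hypothetical names for illustration and are not taken from the test or the patch.

```cpp
#include <cstdint>

// Hypothetical stand-in for an i128 split into two 64-bit halves.
struct U128 {
  uint64_t lo;
  uint64_t hi;
};

// Value stored by `atomicrmw nand i128* %dst, i128 1`: ~(old & 1).
U128 nand_with_one(U128 old) {
  U128 result;
  // Low half: ~(lo & 1) == ~lo | ~1 == ~lo | 0xfffffffffffffffe.
  result.lo = ~old.lo | 0xfffffffffffffffeULL;
  // High half: the constant's high half is 0, so ~(hi & 0) is all ones.
  result.hi = ~0ULL;
  return result;
}
```

Only bit 0 of the inverted low half survives the `orr`, which is presumably why the checks can invert a 32-bit `w` register before widening.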

llvm/test/CodeGen/AArch64/bcmp-inline-small.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -O2 < %s -mtriple=aarch64-linux-gnu | FileCheck %s --check-prefix=CHECKN			; RUN: llc -O2 < %s -mtriple=aarch64-linux-gnu | FileCheck %s --check-prefix=CHECKN
	; RUN: llc -O2 < %s -mtriple=aarch64-linux-gnu -mattr=strict-align | FileCheck %s --check-prefix=CHECKS			; RUN: llc -O2 < %s -mtriple=aarch64-linux-gnu -mattr=strict-align | FileCheck %s --check-prefix=CHECKS

	declare i32 @bcmp(i8*, i8*, i64) nounwind readonly			declare i32 @bcmp(i8*, i8*, i64) nounwind readonly
	declare i32 @memcmp(i8*, i8*, i64) nounwind readonly			declare i32 @memcmp(i8*, i8*, i64) nounwind readonly

	define i1 @test_b2(i8* %s1, i8* %s2) {			define i1 @test_b2(i8* %s1, i8* %s2) {
	; CHECKN-LABEL: test_b2:			; CHECKN-LABEL: test_b2:
	; CHECKN: // %bb.0: // %entry			; CHECKN: // %bb.0: // %entry
	; CHECKN-NEXT: ldr x8, [x0]			; CHECKN-NEXT: ldr x8, [x0]
	; CHECKN-NEXT: ldr x9, [x1]			; CHECKN-NEXT: ldr x9, [x1]
	; CHECKN-NEXT: ldur x10, [x0, #7]			; CHECKN-NEXT: ldur x10, [x0, #7]
	; CHECKN-NEXT: ldur x11, [x1, #7]			; CHECKN-NEXT: ldur x11, [x1, #7]
	; CHECKN-NEXT: eor x8, x8, x9			; CHECKN-NEXT: cmp x8, x9
	; CHECKN-NEXT: eor x9, x10, x11			; CHECKN-NEXT: ccmp x10, x11, #0, eq
	; CHECKN-NEXT: orr x8, x8, x9
	; CHECKN-NEXT: cmp x8, #0
	; CHECKN-NEXT: cset w0, eq			; CHECKN-NEXT: cset w0, eq
	; CHECKN-NEXT: ret			; CHECKN-NEXT: ret
	;			;
	; CHECKS-LABEL: test_b2:			; CHECKS-LABEL: test_b2:
	; CHECKS: // %bb.0: // %entry			; CHECKS: // %bb.0: // %entry
	; CHECKS-NEXT: str x30, [sp, #-16]! // 8-byte Folded Spill			; CHECKS-NEXT: str x30, [sp, #-16]! // 8-byte Folded Spill
	; CHECKS-NEXT: .cfi_def_cfa_offset 16			; CHECKS-NEXT: .cfi_def_cfa_offset 16
	; CHECKS-NEXT: .cfi_offset w30, -16			; CHECKS-NEXT: .cfi_offset w30, -16
	[... 12 lines not shown ...]
	; TODO: Four loads should be within the limit, but the heuristic isn't implemented.			; TODO: Four loads should be within the limit, but the heuristic isn't implemented.
	define i1 @test_b2_align8(i8* align 8 %s1, i8* align 8 %s2) {			define i1 @test_b2_align8(i8* align 8 %s1, i8* align 8 %s2) {
	; CHECKN-LABEL: test_b2_align8:			; CHECKN-LABEL: test_b2_align8:
	; CHECKN: // %bb.0: // %entry			; CHECKN: // %bb.0: // %entry
	; CHECKN-NEXT: ldr x8, [x0]			; CHECKN-NEXT: ldr x8, [x0]
	; CHECKN-NEXT: ldr x9, [x1]			; CHECKN-NEXT: ldr x9, [x1]
	; CHECKN-NEXT: ldur x10, [x0, #7]			; CHECKN-NEXT: ldur x10, [x0, #7]
	; CHECKN-NEXT: ldur x11, [x1, #7]			; CHECKN-NEXT: ldur x11, [x1, #7]
	; CHECKN-NEXT: eor x8, x8, x9			; CHECKN-NEXT: cmp x8, x9
	; CHECKN-NEXT: eor x9, x10, x11			; CHECKN-NEXT: ccmp x10, x11, #0, eq
	; CHECKN-NEXT: orr x8, x8, x9
	; CHECKN-NEXT: cmp x8, #0
	; CHECKN-NEXT: cset w0, eq			; CHECKN-NEXT: cset w0, eq
	; CHECKN-NEXT: ret			; CHECKN-NEXT: ret
	;			;
	; CHECKS-LABEL: test_b2_align8:			; CHECKS-LABEL: test_b2_align8:
	; CHECKS: // %bb.0: // %entry			; CHECKS: // %bb.0: // %entry
	; CHECKS-NEXT: str x30, [sp, #-16]! // 8-byte Folded Spill			; CHECKS-NEXT: str x30, [sp, #-16]! // 8-byte Folded Spill
	; CHECKS-NEXT: .cfi_def_cfa_offset 16			; CHECKS-NEXT: .cfi_def_cfa_offset 16
	; CHECKS-NEXT: .cfi_offset w30, -16			; CHECKS-NEXT: .cfi_offset w30, -16
	[... 48 lines not shown ...]

llvm/test/CodeGen/AArch64/bcmp.ll

	[... 107 lines not shown ...]

	define i1 @bcmp7(ptr %a, ptr %b) {			define i1 @bcmp7(ptr %a, ptr %b) {
	; CHECK-LABEL: bcmp7:			; CHECK-LABEL: bcmp7:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: ldr w8, [x0]			; CHECK-NEXT: ldr w8, [x0]
	; CHECK-NEXT: ldr w9, [x1]			; CHECK-NEXT: ldr w9, [x1]
	; CHECK-NEXT: ldur w10, [x0, #3]			; CHECK-NEXT: ldur w10, [x0, #3]
	; CHECK-NEXT: ldur w11, [x1, #3]			; CHECK-NEXT: ldur w11, [x1, #3]
	; CHECK-NEXT: eor w8, w8, w9			; CHECK-NEXT: cmp w8, w9
	; CHECK-NEXT: eor w9, w10, w11			; CHECK-NEXT: ccmp w10, w11, #0, eq
	; CHECK-NEXT: orr w8, w8, w9
	; CHECK-NEXT: cmp w8, #0
	; CHECK-NEXT: cset w0, eq			; CHECK-NEXT: cset w0, eq
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%cr = call i32 @bcmp(ptr %a, ptr %b, i64 7)			%cr = call i32 @bcmp(ptr %a, ptr %b, i64 7)
	%r = icmp eq i32 %cr, 0			%r = icmp eq i32 %cr, 0
	ret i1 %r			ret i1 %r
	}			}

	define i1 @bcmp8(ptr %a, ptr %b) {			define i1 @bcmp8(ptr %a, ptr %b) {
	[... 49 lines not shown ...]

	define i1 @bcmp11(ptr %a, ptr %b) {			define i1 @bcmp11(ptr %a, ptr %b) {
	; CHECK-LABEL: bcmp11:			; CHECK-LABEL: bcmp11:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: ldr x8, [x0]			; CHECK-NEXT: ldr x8, [x0]
	; CHECK-NEXT: ldr x9, [x1]			; CHECK-NEXT: ldr x9, [x1]
	; CHECK-NEXT: ldur x10, [x0, #3]			; CHECK-NEXT: ldur x10, [x0, #3]
	; CHECK-NEXT: ldur x11, [x1, #3]			; CHECK-NEXT: ldur x11, [x1, #3]
	; CHECK-NEXT: eor x8, x8, x9			; CHECK-NEXT: cmp x8, x9
	; CHECK-NEXT: eor x9, x10, x11			; CHECK-NEXT: ccmp x10, x11, #0, eq
	; CHECK-NEXT: orr x8, x8, x9
	; CHECK-NEXT: cmp x8, #0
	; CHECK-NEXT: cset w0, eq			; CHECK-NEXT: cset w0, eq
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%cr = call i32 @bcmp(ptr %a, ptr %b, i64 11)			%cr = call i32 @bcmp(ptr %a, ptr %b, i64 11)
	%r = icmp eq i32 %cr, 0			%r = icmp eq i32 %cr, 0
	ret i1 %r			ret i1 %r
	}			}

	define i1 @bcmp12(ptr %a, ptr %b) {			define i1 @bcmp12(ptr %a, ptr %b) {
	[... 16 lines not shown ...]

	define i1 @bcmp13(ptr %a, ptr %b) {			define i1 @bcmp13(ptr %a, ptr %b) {
	; CHECK-LABEL: bcmp13:			; CHECK-LABEL: bcmp13:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: ldr x8, [x0]			; CHECK-NEXT: ldr x8, [x0]
	; CHECK-NEXT: ldr x9, [x1]			; CHECK-NEXT: ldr x9, [x1]
	; CHECK-NEXT: ldur x10, [x0, #5]			; CHECK-NEXT: ldur x10, [x0, #5]
	; CHECK-NEXT: ldur x11, [x1, #5]			; CHECK-NEXT: ldur x11, [x1, #5]
	; CHECK-NEXT: eor x8, x8, x9			; CHECK-NEXT: cmp x8, x9
	; CHECK-NEXT: eor x9, x10, x11			; CHECK-NEXT: ccmp x10, x11, #0, eq
	; CHECK-NEXT: orr x8, x8, x9
	; CHECK-NEXT: cmp x8, #0
	; CHECK-NEXT: cset w0, eq			; CHECK-NEXT: cset w0, eq
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%cr = call i32 @bcmp(ptr %a, ptr %b, i64 13)			%cr = call i32 @bcmp(ptr %a, ptr %b, i64 13)
	%r = icmp eq i32 %cr, 0			%r = icmp eq i32 %cr, 0
	ret i1 %r			ret i1 %r
	}			}

	define i1 @bcmp14(ptr %a, ptr %b) {			define i1 @bcmp14(ptr %a, ptr %b) {
	; CHECK-LABEL: bcmp14:			; CHECK-LABEL: bcmp14:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: ldr x8, [x0]			; CHECK-NEXT: ldr x8, [x0]
	; CHECK-NEXT: ldr x9, [x1]			; CHECK-NEXT: ldr x9, [x1]
	; CHECK-NEXT: ldur x10, [x0, #6]			; CHECK-NEXT: ldur x10, [x0, #6]
	; CHECK-NEXT: ldur x11, [x1, #6]			; CHECK-NEXT: ldur x11, [x1, #6]
	; CHECK-NEXT: eor x8, x8, x9			; CHECK-NEXT: cmp x8, x9
	; CHECK-NEXT: eor x9, x10, x11			; CHECK-NEXT: ccmp x10, x11, #0, eq
	; CHECK-NEXT: orr x8, x8, x9
	; CHECK-NEXT: cmp x8, #0
	; CHECK-NEXT: cset w0, eq			; CHECK-NEXT: cset w0, eq
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%cr = call i32 @bcmp(ptr %a, ptr %b, i64 14)			%cr = call i32 @bcmp(ptr %a, ptr %b, i64 14)
	%r = icmp eq i32 %cr, 0			%r = icmp eq i32 %cr, 0
	ret i1 %r			ret i1 %r
	}			}

	define i1 @bcmp15(ptr %a, ptr %b) {			define i1 @bcmp15(ptr %a, ptr %b) {
	; CHECK-LABEL: bcmp15:			; CHECK-LABEL: bcmp15:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: ldr x8, [x0]			; CHECK-NEXT: ldr x8, [x0]
	; CHECK-NEXT: ldr x9, [x1]			; CHECK-NEXT: ldr x9, [x1]
	; CHECK-NEXT: ldur x10, [x0, #7]			; CHECK-NEXT: ldur x10, [x0, #7]
	; CHECK-NEXT: ldur x11, [x1, #7]			; CHECK-NEXT: ldur x11, [x1, #7]
	; CHECK-NEXT: eor x8, x8, x9			; CHECK-NEXT: cmp x8, x9
	; CHECK-NEXT: eor x9, x10, x11			; CHECK-NEXT: ccmp x10, x11, #0, eq
	; CHECK-NEXT: orr x8, x8, x9
	; CHECK-NEXT: cmp x8, #0
	; CHECK-NEXT: cset w0, eq			; CHECK-NEXT: cset w0, eq
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%cr = call i32 @bcmp(ptr %a, ptr %b, i64 15)			%cr = call i32 @bcmp(ptr %a, ptr %b, i64 15)
	%r = icmp eq i32 %cr, 0			%r = icmp eq i32 %cr, 0
	ret i1 %r			ret i1 %r
	}			}

	define i1 @bcmp16(ptr %a, ptr %b) {			define i1 @bcmp16(ptr %a, ptr %b) {
	; CHECK-LABEL: bcmp16:			; CHECK-LABEL: bcmp16:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: ldp x8, x9, [x0]			; CHECK-NEXT: ldp x8, x9, [x0]
	; CHECK-NEXT: ldp x10, x11, [x1]			; CHECK-NEXT: ldp x10, x11, [x1]
	; CHECK-NEXT: eor x8, x8, x10			; CHECK-NEXT: cmp x8, x10
	; CHECK-NEXT: eor x9, x9, x11			; CHECK-NEXT: ccmp x9, x11, #0, eq
	; CHECK-NEXT: orr x8, x8, x9
	; CHECK-NEXT: cmp x8, #0
	; CHECK-NEXT: cset w0, eq			; CHECK-NEXT: cset w0, eq
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%cr = call i32 @bcmp(ptr %a, ptr %b, i64 16)			%cr = call i32 @bcmp(ptr %a, ptr %b, i64 16)
	%r = icmp eq i32 %cr, 0			%r = icmp eq i32 %cr, 0
	ret i1 %r			ret i1 %r
	}			}

	define i1 @bcmp20(ptr %a, ptr %b) {			define i1 @bcmp20(ptr %a, ptr %b) {
	[... 197 lines not shown ...]

llvm/test/CodeGen/AArch64/dag-combine-setcc.ll

	[... 122 lines not shown ...]
	; CHECK-NEXT: fmov w8, s0			; CHECK-NEXT: fmov w8, s0
	; CHECK-NEXT: and w0, w8, #0x1			; CHECK-NEXT: and w0, w8, #0x1
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%cmp1 = icmp ne <64 x i8> %a, zeroinitializer			%cmp1 = icmp ne <64 x i8> %a, zeroinitializer
	%cast = bitcast <64 x i1> %cmp1 to i64			%cast = bitcast <64 x i1> %cmp1 to i64
	%cmp2 = icmp ne i64 %cast, zeroinitializer			%cmp2 = icmp ne i64 %cast, zeroinitializer
	ret i1 %cmp2			ret i1 %cmp2
	}			}

				define i1 @combine_setcc_eq0_conjunction_xor_or(ptr %a, ptr %b) {
				; CHECK-LABEL: combine_setcc_eq0_conjunction_xor_or:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp x8, x9, [x0]
				; CHECK-NEXT: ldp x10, x11, [x1]
				; CHECK-NEXT: cmp x8, x10
				; CHECK-NEXT: ccmp x9, x11, #0, eq
				; CHECK-NEXT: cset w0, eq
				; CHECK-NEXT: ret
				%bcmp = tail call i32 @bcmp(ptr dereferenceable(16) %a, ptr dereferenceable(16) %b, i64 16)
				%cmp = icmp eq i32 %bcmp, 0
				ret i1 %cmp
				}

				define i1 @combine_setcc_ne0_conjunction_xor_or(ptr %a, ptr %b) {
				; CHECK-LABEL: combine_setcc_ne0_conjunction_xor_or:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ldp x8, x9, [x0]
				; CHECK-NEXT: ldp x10, x11, [x1]
				; CHECK-NEXT: cmp x8, x10
				; CHECK-NEXT: ccmp x9, x11, #0, eq
				; CHECK-NEXT: cset w0, ne
				; CHECK-NEXT: ret
				%bcmp = tail call i32 @bcmp(ptr dereferenceable(16) %a, ptr dereferenceable(16) %b, i64 16)
				%cmp = icmp ne i32 %bcmp, 0
				ret i1 %cmp
				}

				declare i32 @bcmp(ptr nocapture, ptr nocapture, i64)
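
The CHECK-line changes across these tests show one rewrite: an equality test of the shape `((a0 ^ b0) | (a1 ^ b1)) == 0` (`eor`, `eor`, `orr`, `cmp #0`) becomes a `cmp` followed by a `ccmp`, which is a conjunction of two equality compares. As a minimal sketch of why the two shapes agree (the function names below are hypothetical and not part of the patch):

```cpp
#include <cstdint>

// Old shape: XOR each pair, OR the differences, compare against zero.
// The OR is zero only when both XOR results are zero.
bool eq_via_xor_or(uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1) {
  return ((a0 ^ b0) | (a1 ^ b1)) == 0;
}

// New shape: compare the first pair, then compare the second pair only if
// the first matched (roughly what cmp + ccmp ..., #0, eq expresses).
bool eq_via_cmp_chain(uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1) {
  return a0 == b0 && a1 == b1;
}
```

The `ne` variants in the tests use the same compare chain and only invert the final condition (`cset w0, ne` or `b.eq`/`b.ne`).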

llvm/test/CodeGen/AArch64/i128-cmp.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=aarch64-uknown-uknown -verify-machineinstrs -o - %s \| FileCheck %s			; RUN: llc -mtriple=aarch64-uknown-uknown -verify-machineinstrs -o - %s \| FileCheck %s

	declare void @call()			declare void @call()

	define i1 @cmp_i128_eq(i128 %a, i128 %b) {			define i1 @cmp_i128_eq(i128 %a, i128 %b) {
	; CHECK-LABEL: cmp_i128_eq:			; CHECK-LABEL: cmp_i128_eq:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: eor x8, x1, x3			; CHECK-NEXT: cmp x0, x2
	; CHECK-NEXT: eor x9, x0, x2			; CHECK-NEXT: ccmp x1, x3, #0, eq
	; CHECK-NEXT: orr x8, x9, x8
	; CHECK-NEXT: cmp x8, #0
	; CHECK-NEXT: cset w0, eq			; CHECK-NEXT: cset w0, eq
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%cmp = icmp eq i128 %a, %b			%cmp = icmp eq i128 %a, %b
	ret i1 %cmp			ret i1 %cmp
	}			}

	define i1 @cmp_i128_ne(i128 %a, i128 %b) {			define i1 @cmp_i128_ne(i128 %a, i128 %b) {
	; CHECK-LABEL: cmp_i128_ne:			; CHECK-LABEL: cmp_i128_ne:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: eor x8, x1, x3			; CHECK-NEXT: cmp x0, x2
	; CHECK-NEXT: eor x9, x0, x2			; CHECK-NEXT: ccmp x1, x3, #0, eq
	; CHECK-NEXT: orr x8, x9, x8
	; CHECK-NEXT: cmp x8, #0
	; CHECK-NEXT: cset w0, ne			; CHECK-NEXT: cset w0, ne
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%cmp = icmp ne i128 %a, %b			%cmp = icmp ne i128 %a, %b
	ret i1 %cmp			ret i1 %cmp
	}			}

	define i1 @cmp_i128_ugt(i128 %a, i128 %b) {			define i1 @cmp_i128_ugt(i128 %a, i128 %b) {
	; CHECK-LABEL: cmp_i128_ugt:			; CHECK-LABEL: cmp_i128_ugt:
	[... 81 lines not shown ...]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%cmp = icmp sle i128 %a, %b			%cmp = icmp sle i128 %a, %b
	ret i1 %cmp			ret i1 %cmp
	}			}

	define void @br_on_cmp_i128_eq(i128 %a, i128 %b) nounwind {			define void @br_on_cmp_i128_eq(i128 %a, i128 %b) nounwind {
	; CHECK-LABEL: br_on_cmp_i128_eq:			; CHECK-LABEL: br_on_cmp_i128_eq:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: eor x8, x1, x3			; CHECK-NEXT: cmp x0, x2
	; CHECK-NEXT: eor x9, x0, x2			; CHECK-NEXT: ccmp x1, x3, #0, eq
	; CHECK-NEXT: orr x8, x9, x8			; CHECK-NEXT: b.ne .LBB10_2
	; CHECK-NEXT: cbnz x8, .LBB10_2
	; CHECK-NEXT: // %bb.1: // %call			; CHECK-NEXT: // %bb.1: // %call
	; CHECK-NEXT: str x30, [sp, #-16]! // 8-byte Folded Spill			; CHECK-NEXT: str x30, [sp, #-16]! // 8-byte Folded Spill
	; CHECK-NEXT: bl call			; CHECK-NEXT: bl call
	; CHECK-NEXT: ldr x30, [sp], #16 // 8-byte Folded Reload			; CHECK-NEXT: ldr x30, [sp], #16 // 8-byte Folded Reload
	; CHECK-NEXT: .LBB10_2: // %exit			; CHECK-NEXT: .LBB10_2: // %exit
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%cmp = icmp eq i128 %a, %b			%cmp = icmp eq i128 %a, %b
	br i1 %cmp, label %call, label %exit			br i1 %cmp, label %call, label %exit
	call:			call:
	call void @call()			call void @call()
	br label %exit			br label %exit
	exit:			exit:
	ret void			ret void
	}			}

	define void @br_on_cmp_i128_ne(i128 %a, i128 %b) nounwind {			define void @br_on_cmp_i128_ne(i128 %a, i128 %b) nounwind {
	; CHECK-LABEL: br_on_cmp_i128_ne:			; CHECK-LABEL: br_on_cmp_i128_ne:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: eor x8, x1, x3			; CHECK-NEXT: cmp x0, x2
	; CHECK-NEXT: eor x9, x0, x2			; CHECK-NEXT: ccmp x1, x3, #0, eq
	; CHECK-NEXT: orr x8, x9, x8			; CHECK-NEXT: b.eq .LBB11_2
	; CHECK-NEXT: cbz x8, .LBB11_2
	; CHECK-NEXT: // %bb.1: // %call			; CHECK-NEXT: // %bb.1: // %call
	; CHECK-NEXT: str x30, [sp, #-16]! // 8-byte Folded Spill			; CHECK-NEXT: str x30, [sp, #-16]! // 8-byte Folded Spill
	; CHECK-NEXT: bl call			; CHECK-NEXT: bl call
	; CHECK-NEXT: ldr x30, [sp], #16 // 8-byte Folded Reload			; CHECK-NEXT: ldr x30, [sp], #16 // 8-byte Folded Reload
	; CHECK-NEXT: .LBB11_2: // %exit			; CHECK-NEXT: .LBB11_2: // %exit
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%cmp = icmp ne i128 %a, %b			%cmp = icmp ne i128 %a, %b
	br i1 %cmp, label %call, label %exit			br i1 %cmp, label %call, label %exit
	[... 175 lines not shown ...]

llvm/test/CodeGen/AArch64/umulo-128-legalisation-lowering.ll

	[... 62 lines not shown ...]
	; AARCH-NEXT: cinc x11, x14, hs			; AARCH-NEXT: cinc x11, x14, hs
	; AARCH-NEXT: mul x0, x0, x2			; AARCH-NEXT: mul x0, x0, x2
	; AARCH-NEXT: adds x11, x13, x11			; AARCH-NEXT: adds x11, x13, x11
	; AARCH-NEXT: umulh x13, x8, x3			; AARCH-NEXT: umulh x13, x8, x3
	; AARCH-NEXT: cset w14, hs			; AARCH-NEXT: cset w14, hs
	; AARCH-NEXT: adds x11, x12, x11			; AARCH-NEXT: adds x11, x12, x11
	; AARCH-NEXT: adc x12, x13, x14			; AARCH-NEXT: adc x12, x13, x14
	; AARCH-NEXT: adds x10, x11, x10			; AARCH-NEXT: adds x10, x11, x10
	; AARCH-NEXT: adc x9, x12, x9
	; AARCH-NEXT: asr x11, x1, #63			; AARCH-NEXT: asr x11, x1, #63
	; AARCH-NEXT: eor x9, x9, x11			; AARCH-NEXT: adc x9, x12, x9
	; AARCH-NEXT: eor x10, x10, x11			; AARCH-NEXT: cmp x10, x11
	; AARCH-NEXT: orr x9, x10, x9			; AARCH-NEXT: ccmp x9, x11, #0, eq
	; AARCH-NEXT: cmp x9, #0
	; AARCH-NEXT: cset w9, ne			; AARCH-NEXT: cset w9, ne
	; AARCH-NEXT: tbz x8, #63, .LBB1_2			; AARCH-NEXT: tbz x8, #63, .LBB1_2
	; AARCH-NEXT: // %bb.1: // %Entry			; AARCH-NEXT: // %bb.1: // %Entry
	; AARCH-NEXT: eor x8, x3, #0x8000000000000000			; AARCH-NEXT: eor x8, x3, #0x8000000000000000
	; AARCH-NEXT: orr x8, x2, x8			; AARCH-NEXT: orr x8, x2, x8
	; AARCH-NEXT: cbz x8, .LBB1_3			; AARCH-NEXT: cbz x8, .LBB1_3
	; AARCH-NEXT: .LBB1_2: // %Else2			; AARCH-NEXT: .LBB1_2: // %Else2
	; AARCH-NEXT: cbz w9, .LBB1_4			; AARCH-NEXT: cbz w9, .LBB1_4
	[... 33 lines not shown ...]

Revision Contents

Diff 472527
