This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Optimize cmp chain before legalization
ClosedPublic

Authored by Allen on Nov 14 2022, 4:47 AM.

Details

Summary
  • For case bcmp9, there are extra AND and ANY_EXTEND nodes in the OR/XOR chain, which prevent the transform, so enable the optimization before legalization.
  • The key IR frag related:
      t37: i32,ch = load<(load (s8) from %ir.4), anyext from i8> t0, t11, undef:i64
        t12: i64 = add t4, Constant:i64<8>
      t38: i32,ch = load<(load (s8) from %ir.5), anyext from i8> t0, t12, undef:i64
    t39: i32 = xor t37, t38
  t40: i64 = any_extend t39
t42: i64 = and t40, Constant:i64<255>
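The DAG fragment above can be reduced to a small C++ sketch (a hypothetical reduction of the bcmp9 pattern — the function name and layout here are assumptions, not the actual test from llvm/test/CodeGen/AArch64/bcmp.ll): comparing 9 bytes splits into an 8-byte XOR plus a 1-byte XOR, and the trailing i8 compare is where the extra AND/ANY_EXTEND pair comes from.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical reduction of a 9-byte bcmp: the first 8 bytes are compared
// as one 64-bit XOR, the 9th byte as an 8-bit XOR that gets any-extended
// to i64 and masked with 255 -- the AND/EXTEND nodes seen in the DAG dump.
bool bcmp9_eq(const unsigned char *a, const unsigned char *b) {
  uint64_t x, y;
  std::memcpy(&x, a, 8);
  std::memcpy(&y, b, 8);
  uint64_t lo = x ^ y;
  uint64_t hi = (uint64_t)(a[8] ^ b[8]) & 0xff; // any_extend + and(...,255)
  return (lo | hi) == 0;
}
```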

Depends on D138398 to fix combine_setcc_glue

Diff Detail

Event Timeline

Allen created this revision.Nov 14 2022, 4:47 AM
Allen requested review of this revision.Nov 14 2022, 4:47 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 14 2022, 4:47 AM
Allen edited the summary of this revision. (Show Details)Nov 14 2022, 4:48 AM

There is code in DAGCombiner::BackwardsPropagateMask that can propagate ANDs back to loads, and would usually handle patterns like this, but it can't look through any_extends. It has seemed to be useful in the past though.
You could also imagine transforming i64 and(any_extend(i32 x), mask) into i64 zext(i32 and(x, mask)) under AArch64, as we know the zext will be free. I think that would run into other problems though, as a zext between the Ands isn't handled for all the BFI cases. Without improving BFI at the same time it would lead to other regressions.

So I'm not sure either of those methods would be better than this, even if they are more general. I think it would be useful to add deliberate tests for this though if we can, especially for the edge cases.
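The suggested rewrite relies on a simple identity: the high bits an any_extend leaves unspecified are ANDed away whenever the mask fits in the narrow type. A minimal C++ check of that equivalence (modeling the unspecified high bits with an arbitrary `garbage` value — the function names are illustrative, not LLVM APIs):

```cpp
#include <cstdint>

// and(any_extend(x), mask): any_extend leaves the high 32 bits
// unspecified, modeled here by an arbitrary 'garbage' value.
uint64_t and_anyext(uint32_t x, uint32_t garbage, uint64_t mask) {
  uint64_t ext = ((uint64_t)garbage << 32) | x; // any_extend: high bits arbitrary
  return ext & mask;
}

// zext(and(x, mask)): the form the transform would produce; the zext is
// free on AArch64 since 32-bit ops zero the top half of the register.
uint64_t zext_and(uint32_t x, uint32_t mask) {
  return (uint64_t)(x & mask);
}
```

As long as `mask` has no bits above bit 31 set, both forms produce the same value regardless of the garbage.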

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11982 ↗(On Diff #475108)

Why "Dup" in the name? Because there are two loads? When I see Dup I think of vector splats.

12013 ↗(On Diff #475108)

Does it not need to check other bits about the Mask?

12030 ↗(On Diff #475108)

Should XOR be Logicop.getOpcode()?

20666 ↗(On Diff #475108)

This can be done separately.

Allen updated this revision to Diff 475825.Nov 16 2022, 7:38 AM
Allen marked an inline comment as done.
Allen marked 2 inline comments as done.Nov 16 2022, 7:46 AM

There is code in DAGCombiner::BackwardsPropagateMask that can propagate ANDs back to loads, and would usually handle patterns like this, but it can't look through any_extends. It has seemed to be useful in the past though.
You could also imagine transforming i64 and(any_extend(i32 x), mask) into i64 zext(i32 and(x, mask)) under AArch64, as we know the zext will be free. I think that would run into other problems though, as a zext between the Ands isn't handled for all the BFI cases. Without improving BFI at the same time it would lead to other regressions.

So I'm not sure either of those methods would be better than this, even if they are more general. I think it would be useful to add deliberate tests for this though if we can, especially for the edge cases.

a) visitAND already tries the i64 and (any_extend(i32 x), mask) into i64 zext(i32 and(x, mask)) transform, but it fails the MaskedValueIsZero check, https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L6302
b) I also tried adjusting ISD::ANY_EXTEND --> ISD::ZERO_EXTEND in DAGTypeLegalizer::PromoteIntOp_ZERO_EXTEND; then we also get the expected zext without the AND, but some x86 cases regressed for chains of ANDs, so I limited it to the case where both operands are loads.
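The MaskedValueIsZero failure can be illustrated with a toy known-bits model (this is a simplified sketch, not the real llvm::KnownBits API): an any_extend makes the high bits unknown rather than known-zero, so the combine in visitAND cannot prove the AND is redundant.

```cpp
#include <cstdint>

// Toy model of known-bits reasoning: Zero has a bit set where the value
// is provably 0.  Not the real LLVM KnownBits interface.
struct Known { uint64_t Zero; };

// any_extend(i32 -> i64): the top 32 bits are undefined, so nothing
// above bit 31 is known zero.
Known anyExtend32to64(Known K32) {
  return {K32.Zero & 0xffffffffu};
}

// zero_extend(i32 -> i64): the top 32 bits become known zero.
Known zeroExtend32to64(Known K32) {
  return {(K32.Zero & 0xffffffffu) | 0xffffffff00000000u};
}

// MaskedValueIsZero(V, Mask): every bit of Mask must be known zero in V.
bool maskedValueIsZero(Known K, uint64_t Mask) {
  return (Mask & ~K.Zero) == 0;
}
```

For an i32 that holds a zero-extended i8 (bits 8-31 known zero), the check `MaskedValueIsZero(ext, ~0xff)` succeeds after a zero_extend but fails after an any_extend, which is why the existing visitAND combine bails.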

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11982 ↗(On Diff #475108)

Yes, both operands are loads. I'll rename it to CombineZExtLogicopDoubleExtLoad

12013 ↗(On Diff #475108)

Done, thanks

12030 ↗(On Diff #475108)

Oh, good catch. thanks

20666 ↗(On Diff #475108)

Maybe you can still enable the optimization before legalization to fix these issues, like:

if ((Cond == ISD::SETEQ || Cond == ISD::SETNE) && isNullConstant(RHS) &&
    LHS->getOpcode() == ISD::OR && LHS->hasOneUse() &&
    isOrXorChain(LHS, NumXors, WorkList)) {
  SDValue XOR0, XOR1;
  std::tie(XOR0, XOR1) = WorkList[0];
  SDValue Cmp = DAG.getSetCC(DL, VT, XOR0, XOR1, ISD::SETNE);
  for (unsigned I = 1; I < WorkList.size(); I++) {
    std::tie(XOR0, XOR1) = WorkList[I];
    SDValue CmpChain = DAG.getSetCC(DL, VT, XOR0, XOR1, ISD::SETNE);
    Cmp = DAG.getNode(ISD::OR, DL, VT, Cmp, CmpChain);
  }

  // Exit early by inverting the condition, which helps reduce indentation.
  return DAG.getSetCC(DL, VT, Cmp, DAG.getConstant(0, DL, VT), Cond);
}

After that you need to remove the if (!DCI.isBeforeLegalize()) check for performOrXorChainCombine and remove the function call in lowerSetCC to make the code cleaner.
As @dmgreen mentioned before, maybe you can try to combine to AArch64ISD::SUBS + AArch64ISD::CCMP. But you may need to fix some type legalization issues to make it work.
The current patch looks too specific, I think.

Allen updated this revision to Diff 476451.Nov 18 2022, 6:29 AM
Allen marked 2 inline comments as done.
Allen retitled this revision from [AArch64] Optimize cmp chain when the result is tested for [in]equality with 0 to [AArch64] Optimize cmp chain before legalization.
Allen edited the summary of this revision. (Show Details)
Allen added a comment.Nov 18 2022, 6:32 AM

Maybe you can still enable the optimization before legalization to fix these issues, like:

if ((Cond == ISD::SETEQ || Cond == ISD::SETNE) && isNullConstant(RHS) &&
    LHS->getOpcode() == ISD::OR && LHS->hasOneUse() &&
    isOrXorChain(LHS, NumXors, WorkList)) {
  SDValue XOR0, XOR1;
  std::tie(XOR0, XOR1) = WorkList[0];
  SDValue Cmp = DAG.getSetCC(DL, VT, XOR0, XOR1, ISD::SETNE);
  for (unsigned I = 1; I < WorkList.size(); I++) {
    std::tie(XOR0, XOR1) = WorkList[I];
    SDValue CmpChain = DAG.getSetCC(DL, VT, XOR0, XOR1, ISD::SETNE);
    Cmp = DAG.getNode(ISD::OR, DL, VT, Cmp, CmpChain);
  }

  // Exit early by inverting the condition, which helps reduce indentation.
  return DAG.getSetCC(DL, VT, Cmp, DAG.getConstant(0, DL, VT), Cond);
}

After that you need to remove the if (!DCI.isBeforeLegalize()) check for performOrXorChainCombine and remove the function call in lowerSetCC to make the code cleaner.
As @dmgreen mentioned before, maybe you can try to combine to AArch64ISD::SUBS + AArch64ISD::CCMP. But you may need to fix some type legalization issues to make it work.
The current patch looks too specific, I think.

Thanks @bcl5980 for your idea; applied your comment.

bcl5980 added inline comments.Nov 19 2022, 3:31 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
8592

Maybe the ISD::ZERO_EXTEND can remain here?

llvm/test/CodeGen/AArch64/dag-combine-setcc.ll
224

This looks like a regression for this case?

bcl5980 added inline comments.Nov 20 2022, 10:37 PM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
8633

It looks like this can be simplified further to

unsigned LogicOp = (Cond == ISD::SETEQ) ? ISD::AND : ISD::OR;
SDValue Cmp = DAG.getSetCC(DL, VT, XOR0, XOR1, Cond);
for (unsigned I = 1; I < WorkList.size(); I++) {
  std::tie(XOR0, XOR1) = WorkList[I];
  SDValue CmpChain = DAG.getSetCC(DL, VT, XOR0, XOR1, Cond);
  Cmp = DAG.getNode(LogicOp, DL, VT, Cmp, CmpChain);
}

return Cmp;

It looks like more cases can benefit from it.
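The LogicOp selection in the snippet above rests on a De Morgan identity: an or-of-xors compared against 0 with SETEQ is a conjunction of per-pair equalities, and with SETNE a disjunction of inequalities. A scalar C++ check of that identity (function names are illustrative):

```cpp
#include <cstdint>

// The chain form the combine starts from: ((a^b)|(c^d)) == 0.
bool orXorChainEq(uint64_t a, uint64_t b, uint64_t c, uint64_t d) {
  return ((a ^ b) | (c ^ d)) == 0;
}

// SETEQ case: per-pair setcc results joined with AND.
bool setccChainEq(uint64_t a, uint64_t b, uint64_t c, uint64_t d) {
  return (a == b) & (c == d); // LogicOp = ISD::AND for SETEQ
}

// SETNE case: per-pair setcc results joined with OR.
bool setccChainNe(uint64_t a, uint64_t b, uint64_t c, uint64_t d) {
  return (a != b) | (c != d); // LogicOp = ISD::OR for SETNE
}
```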

llvm/test/CodeGen/AArch64/dag-combine-setcc.ll
224

The cmp x3, x10 part of this regression is a little tricky. It depends on the tree search order: if we searched the RHS first it would disappear. But I don't know how to fix it in an elegant way.

Allen added inline comments.Nov 21 2022, 6:26 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
8592

In the current implementation, we generate the AND and ANY_EXTEND sequence, so the ISD::ZERO_EXTEND is not required?

8633

Yes, this change fixes the case in PR58675, but regresses the case combine_setcc_glue, so I'll need to do more work on it.

+++ b/llvm/test/CodeGen/AArch64/dag-combine-setcc.ll
@@ -191,9 +191,11 @@ define i32 @combine_setcc_glue(i128 noundef %x, i128 noundef %y) {
 ; CHECK-LABEL: combine_setcc_glue:
 ; CHECK:       // %bb.0: // %entry
 ; CHECK-NEXT:    cmp x1, x3
-; CHECK-NEXT:    ccmp x0, x2, #0, eq
-; CHECK-NEXT:    ccmp x0, x2, #4, ne
-; CHECK-NEXT:    cset w0, eq
+; CHECK-NEXT:    cset w8, eq
+; CHECK-NEXT:    cmp x0, x2
+; CHECK-NEXT:    cset w9, eq
+; CHECK-NEXT:    and w8, w9, w8
+; CHECK-NEXT:    orr w0, w8, w9
 ; CHECK-NEXT:    ret
bcl5980 added inline comments.Nov 21 2022, 5:54 PM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
8592

A case like that can trigger the zext part: https://www.godbolt.org/z/bPrT3G7bh
And I think the cost is very low, so we can add it.

8633

Don't worry about the combine_setcc_glue regression. D138401 already fixed that.

bcl5980 added inline comments.Nov 21 2022, 5:57 PM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
8633

Sorry, the change that fixes combine_setcc_glue is D138398

Allen updated this revision to Diff 477061.Nov 21 2022, 10:11 PM
Allen edited the summary of this revision. (Show Details)

Rebase on D138398

Allen marked 5 inline comments as done.Nov 21 2022, 10:14 PM
Allen added inline comments.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
8633

Thanks for the fix; I rebased on that.

bcl5980 accepted this revision.Nov 21 2022, 10:48 PM

LGTM, but please wait for @dmgreen or someone else to approve it.

This revision is now accepted and ready to land.Nov 21 2022, 10:48 PM

Can you add tests for chains of or/xor with some different types? Maybe i8/i16/i128 (assuming we already have them for i32/i64). Then something strange like an i42. If we are allowing the transform in many more places then it would be good to test them.

Allen updated this revision to Diff 477169.Nov 22 2022, 6:25 AM
Allen marked an inline comment as done.

Add 5 more cases with zero_extend and different types (i8/i16/i128/i42)
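An i42 compare chain can be modeled in C++ to see what the new test exercises (a sketch under the assumption that type legalization promotes i42 to i64, so only the low 42 bits are significant; names here are illustrative):

```cpp
#include <cstdint>

// i42 is promoted to i64 during type legalization; only the low 42 bits
// are significant, so an i42 or/xor compare chain is a masked 64-bit chain.
constexpr uint64_t kMask42 = (1ull << 42) - 1;

bool cmpChainI42(uint64_t a, uint64_t b, uint64_t c, uint64_t d) {
  return (((a ^ b) | (c ^ d)) & kMask42) == 0;
}
```

Bits above bit 41 are don't-care, which is exactly the kind of edge the transform has to stay correct for.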

dmgreen added inline comments.Nov 23 2022, 1:38 AM
llvm/test/CodeGen/AArch64/bcmp.ll
432

We don't usually put godbolt links in the source.

I see the new code is a little larger. They wouldn't be if the inputs were loaded or known to be smaller already though. And they don't fail, which was the main point of adding them.

dmgreen accepted this revision.Nov 23 2022, 2:10 AM

OK, let's go with this patch. It's not quite perfect, but the gains are good to see and we can fix the other issues if they cause problems. If you remove the godbolt link then LGTM.

Remember to upload with context. Thanks.

Allen updated this revision to Diff 477442.Nov 23 2022, 3:38 AM

Remove the godbolt link from the comment

Allen marked an inline comment as done.Nov 23 2022, 3:39 AM
Allen added inline comments.
llvm/test/CodeGen/AArch64/bcmp.ll
432

Deleted, thanks

This revision was landed with ongoing or failed builds.Nov 23 2022, 3:48 AM
This revision was automatically updated to reflect the committed changes.
Allen marked an inline comment as done.