This is an archive of the discontinued LLVM Phabricator instance.

Differential D120422

[AArch64] Optimize comparison chains
ClosedPublic

Authored by Kmeakin on Feb 23 2022, 10:52 AM.

Details

Reviewers

dmgreen
efriedma
sdesmalen
david-arm
fhahn

Commits

rG43a0016f3dcf: Extend `performANDCSELCombine` to `performANDORCSELCombine`

Summary

LLVM generates sub-optimal code for sequences of chained comparisons, i.e. code of the form
(x1 relop1 y1) boolop1 (x2 relop2 y2) boolop2 ... (xn relopn yn)
where relop is one of {<=, <, >=, >, ==, !=} and boolop is one of {&&, ||}.

LLVM currently emits chains of CMP+CSET for each comparison, and then AND/ORR to combine the results of the comparisons. This can be replaced by a chain of CCMPs and a single CSET at the end.

For example,
(x1 < x2) && (x3 > x4) && (x5 != x6) && (x7 == x8)
generates

cmp     w2, w3
cset    w8, hi
cmp     w0, w1
cset    w9, lo
cmp     w4, w5
and     w8, w8, w9
cset    w9, ne
cmp     w6, w7
and     w8, w8, w9
cset    w9, eq
and     w0, w8, w9

but the more efficient code would be:

cmp w2, w3
ccmp w0, w1, #2, hi
ccmp w4, w5, #4, lo
ccmp w6, w7, #0, ne
cset w0, eq

This patch generalizes https://reviews.llvm.org/D118327 to cases where results of comparisons are ORRed together, and where the comparison is performed with CCMP instead of CMP/SUBS.
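
The equivalence between the CCMP chain and the source expression in the summary can be checked with a small flag simulation. The sketch below is not LLVM code: `Flags`, `cmp`, `ccmp`, `cset`, `ccmpChain` and `reference` are hypothetical names introduced for illustration, and it assumes the comparisons are unsigned 32-bit, as the `hi`/`lo` conditions in the example suggest.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the AArch64 NZCV flags; not LLVM code.
struct Flags { bool N, Z, C, V; };

// Only the conditions used by the example in the summary.
enum Cond { EQ, NE, HS, LO, HI, LS };

// Flags produced by `cmp a, b` on 32-bit registers (a - b, result discarded).
Flags cmp(uint32_t a, uint32_t b) {
  uint32_t r = a - b;
  bool n = (r >> 31) != 0;
  bool z = r == 0;
  bool c = a >= b;                           // carry set means "no borrow"
  bool v = (((a ^ b) & (a ^ r)) >> 31) != 0; // signed overflow of a - b
  return {n, z, c, v};
}

bool holds(Cond cc, Flags f) {
  switch (cc) {
  case EQ: return f.Z;
  case NE: return !f.Z;
  case HS: return f.C;
  case LO: return !f.C;
  case HI: return f.C && !f.Z;
  case LS: return !f.C || f.Z;
  }
  return false;
}

// `ccmp a, b, #nzcv, cc`: if cc holds on the incoming flags, compare a and b;
// otherwise load the 4-bit immediate (N = 8, Z = 4, C = 2, V = 1) into the flags.
Flags ccmp(uint32_t a, uint32_t b, unsigned nzcv, Cond cc, Flags in) {
  if (holds(cc, in))
    return cmp(a, b);
  return {(nzcv & 8) != 0, (nzcv & 4) != 0, (nzcv & 2) != 0, (nzcv & 1) != 0};
}

// `cset wd, cc`: 1 if cc holds, else 0.
uint32_t cset(Cond cc, Flags f) { return holds(cc, f) ? 1u : 0u; }

// The five-instruction sequence from the summary; w[i] stands for register wi.
uint32_t ccmpChain(const uint32_t w[8]) {
  Flags f = cmp(w[2], w[3]);
  f = ccmp(w[0], w[1], 2, HI, f);
  f = ccmp(w[4], w[5], 4, LO, f);
  f = ccmp(w[6], w[7], 0, NE, f);
  return cset(EQ, f);
}

// The source-level expression the sequence is meant to compute.
uint32_t reference(const uint32_t w[8]) {
  return (w[0] < w[1]) && (w[2] > w[3]) && (w[4] != w[5]) && (w[6] == w[7]);
}
```

Calling `ccmpChain` and `reference` on the same eight values should give the same result. When an earlier comparison fails, each `ccmp` loads an NZCV immediate chosen so that the next condition in the chain fails as well, which is how a single `cset` at the end can report the whole conjunction.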

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

Kmeakin created this revision. Feb 23 2022, 10:52 AM

Herald added a subscriber: hiraditya. Feb 23 2022, 10:52 AM

Kmeakin requested review of this revision. Feb 23 2022, 10:52 AM

Herald added a project: Restricted Project. Feb 23 2022, 10:52 AM

Herald added a subscriber: llvm-commits.

Kmeakin retitled this revision from "rebase ontop of David Green's patch" to "Optimize comparison chains on AArch64". Feb 23 2022, 11:00 AM

Kmeakin edited the summary of this revision.

Kmeakin removed a subscriber: hiraditya.

Herald added a subscriber: kristof.beyls. Feb 23 2022, 11:00 AM

Kmeakin retitled this revision from "Optimize comparison chains on AArch64" to "[AArch64] Optimize comparison chains". Feb 23 2022, 11:04 AM

Kmeakin added a reviewer: dmgreen.

Kmeakin removed a subscriber: kristof.beyls.

Kmeakin edited the summary of this revision. Feb 23 2022, 11:06 AM

Harbormaster completed remote builds in B151094: Diff 410867. Feb 23 2022, 11:40 AM

Do you have any cases where the existing And lowering wasn't already performing the folds that the new method does? I feel like it was already working OK, and it may be better to work on top of it as opposed to writing it from scratch. It doesn't seem to be needed for any of the tests, and this example from the commit message already looks OK: https://godbolt.org/z/Mdf8nj5ae. I'm not sure if it's worth sharing the method between And and Or, but this might be better focussing on the Or code.

This patch generalizes https://reviews.llvm.org/D118327 to cases where results of comparisons are ORRed together,

Cool, that's great to see. It looks like it will be very useful.

and where the comparison is performed with CCMP instead of CMP/SUBS

I think the existing PerformANDCSELCombine method already handled that. It only requires one of the two operands to be a SUBS, the other can be anything that sets flags, which it re-uses directly. The SUBS gets converted to a CCMP, the other flag-setting instruction is used as-is as the input.
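
The SUBS-to-CCMP conversion described here, together with the OR variant this patch adds, can be sanity-checked with a stand-alone model. The sketch below is an assumption-laden reconstruction from the AND and OR branches in the diff, not LLVM's implementation: `holds`, `invert`, `nzcvToSatisfy`, `csel01`, `combined` and `ruleIsSound` are hypothetical names, and the NZCV table is one possible choice of satisfying immediates rather than a copy of LLVM's.

```cpp
#include <cassert>

// Hypothetical stand-alone model; the names echo, but are not, LLVM's helpers.
enum Cond { EQ, NE, HS, LO, MI, PL, VS, VC, HI, LS, GE, LT, GT, LE };

// Evaluate a condition on a 4-bit NZCV value (N = 8, Z = 4, C = 2, V = 1).
bool holds(Cond cc, unsigned f) {
  bool N = f & 8, Z = f & 4, C = f & 2, V = f & 1;
  switch (cc) {
  case EQ: return Z;
  case NE: return !Z;
  case HS: return C;
  case LO: return !C;
  case MI: return N;
  case PL: return !N;
  case VS: return V;
  case VC: return !V;
  case HI: return C && !Z;
  case LS: return !C || Z;
  case GE: return N == V;
  case LT: return N != V;
  case GT: return !Z && N == V;
  case LE: return Z || N != V;
  }
  return false;
}

// Conditions are encoded in even/odd pairs, so flipping the low bit inverts one.
Cond invert(Cond cc) { return static_cast<Cond>(cc ^ 1); }

// An NZCV immediate for which `cc` holds (one valid choice per condition).
unsigned nzcvToSatisfy(Cond cc) {
  switch (cc) {
  case EQ: case LE: return 4; // Z set
  case HS: case HI: return 2; // C set
  case MI: case LT: return 8; // N set
  case VS: return 1;          // V set
  default: return 0;          // NE, LO, PL, VC, LS, GE, GT hold with all flags clear
  }
}

// Value of csel(0, 1, cc): 0 when cc holds, 1 otherwise.
bool csel01(Cond cc, unsigned f) { return !holds(cc, f); }

// Result of the combined node, following the two branches in the diff:
//   AND: CCMP condition = !cc0, fallback NZCV satisfies cc1
//   OR:  CCMP condition = cc0,  fallback NZCV satisfies !cc1
// f0 holds the flags of the first comparison; f1 holds the flags the second
// comparison (the former SUBS) would produce if it were executed.
bool combined(bool isAnd, Cond cc0, Cond cc1, unsigned f0, unsigned f1) {
  Cond cond = isAnd ? invert(cc0) : cc0;
  unsigned nzcv = nzcvToSatisfy(isAnd ? cc1 : invert(cc1));
  unsigned flags = holds(cond, f0) ? f1 : nzcv;
  return csel01(cc1, flags);
}

// Brute-force check over every condition pair and every flag value.
bool ruleIsSound() {
  for (int a = EQ; a <= LE; ++a)
    for (int b = EQ; b <= LE; ++b)
      for (unsigned f0 = 0; f0 < 16; ++f0)
        for (unsigned f1 = 0; f1 < 16; ++f1) {
          Cond cc0 = static_cast<Cond>(a), cc1 = static_cast<Cond>(b);
          bool b0 = csel01(cc0, f0), b1 = csel01(cc1, f1);
          if (combined(true, cc0, cc1, f0, f1) != (b0 && b1)) return false;
          if (combined(false, cc0, cc1, f0, f1) != (b0 || b1)) return false;
        }
  return true;
}
```

In this model `ruleIsSound()` returning true means that, for both AND and OR, a single CSEL over the CCMP flags gives the same value as combining the two original CSELs. The model deliberately ignores the one-use checks and the operand swap that the real combine performs.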

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
14101	AL and NV conditions are inverses of one another, but they shouldn't come up, as they would just always pick one of the operands without needing result of the comparison. You could probably change it to an assert, but they shouldn't be generated from anywhere in practice.
14251	I think this code was fine before. Checking for CSEL will inherently check that the type is legal.
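
The remark about AL and NV can be illustrated with a model of the A64 condition encoding. This is a sketch with hypothetical names (`holds`, `flipLowBit`, `flipIsLogicalInverse`), not the helper used in the patch; it assumes that condition inversion is done by flipping the low encoding bit, and relies on the architectural rule that NV behaves like AL.

```cpp
#include <cassert>

// Hypothetical model of the A64 condition-code encoding (0b0000 .. 0b1111).
enum Cond { EQ, NE, HS, LO, MI, PL, VS, VC, HI, LS, GE, LT, GT, LE, AL, NV };

// Evaluate a condition on a 4-bit NZCV value (N = 8, Z = 4, C = 2, V = 1).
bool holds(Cond cc, unsigned nzcv) {
  bool N = nzcv & 8, Z = nzcv & 4, C = nzcv & 2, V = nzcv & 1;
  bool r = true;
  switch (cc >> 1) { // the upper three bits select the base test
  case 0: r = Z; break;
  case 1: r = C; break;
  case 2: r = N; break;
  case 3: r = V; break;
  case 4: r = C && !Z; break;
  case 5: r = N == V; break;
  case 6: r = N == V && !Z; break;
  default: r = true; break; // AL and NV
  }
  // The low bit negates the base test, except for NV, which behaves like AL.
  if ((cc & 1) && cc != NV)
    r = !r;
  return r;
}

// Flip the low encoding bit; a stand-in for an "invert condition" helper.
Cond flipLowBit(Cond cc) { return static_cast<Cond>(cc ^ 1); }

// True when flipping the low bit yields the logical negation for every NZCV value.
bool flipIsLogicalInverse(Cond cc) {
  for (unsigned f = 0; f < 16; ++f)
    if (holds(flipLowBit(cc), f) == holds(cc, f))
      return false;
  return true;
}
```

For EQ through LE the flipped encoding is a true logical inverse. AL and NV are paired in the encoding but both always hold, so a combine that relies on inversion has to assume they never reach it, which is what an assert would document.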

Extend performANDORCSELCombine instead of replacing it

Harbormaster completed remote builds in B151936: Diff 412075. Mar 1 2022, 7:27 AM

Thanks for the updates. Looks like a good patch, if we can clean up the details a little.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
14051	This is only used in one place? If so it might be simpler inline, or at least closer to the use.
14105–14106	Can you move this comment next to tryCombineToEXTR whilst you are here too?
14115	This can be removed?
14230	Leave this in, I would think?
llvm/test/CodeGen/AArch64/arm64-ccmp.ll
756–757	I think this comment can be removed now. The code looks fine.
llvm/test/CodeGen/AArch64/cmp-chains.ll
2	It can be good to pre-commit the tests, so just the differences get shown in the review. It makes it easier to see what changed.

Herald added a project: Restricted Project. Mar 2 2022, 2:59 AM

dmgreen added reviewers: efriedma, sdesmalen, david-arm, fhahn. Mar 2 2022, 3:00 AM

Inline IsBool, move comment next to tryCombineToEXTR, remove obsolete comments from tests

Harbormaster completed remote builds in B152178: Diff 412435. Mar 2 2022, 10:21 AM

On a general note, I'm a bit concerned that we now have two different codepaths for generating ccmp during isel: emitConjunction, and performANDORCSELCombine. As far as I can tell, they're largely overlapping; emitConjunction is more general, and performANDORCSELCombine triggers in more cases, but they're both doing basically the same transform. Can we consolidate this code somehow?

In D120422#3355281, @efriedma wrote:

On a general note, I'm a bit concerned that we now have two different codepaths for generating ccmp during isel: emitConjunction, and performANDORCSELCombine. As far as I can tell, they're largely overlapping; emitConjunction is more general, and performANDORCSELCombine triggers in more cases, but they're both doing basically the same transform. Can we consolidate this code somehow?

Yep, I was thinking of that when I ended up using emitConjunction from branches recently. I think that this method of peephole optimizing a bit at a time is a more sensible way to go than trying to do things all at once. It's a more "ISel" way of doing things, and should capture more cases. But a few more folds might be needed before we get to the point that we don't need emitConjunction any more.

As far as this patch goes, it LGTM. Thanks for the updates.

This revision is now accepted and ready to land. Mar 3 2022, 12:17 AM

This revision was landed with ongoing or failed builds. Mar 4 2022, 7:11 AM

Closed by commit rG43a0016f3dcf: Extend `performANDCSELCombine` to `performANDORCSELCombine` (authored by Kmeakin).

This revision was automatically updated to reflect the committed changes.

Kmeakin added a commit: rG43a0016f3dcf: Extend `performANDCSELCombine` to `performANDORCSELCombine`.

Revision Contents

Path	Size
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	121 lines
llvm/test/CodeGen/AArch64/arm64-ccmp.ll	28 lines
llvm/test/CodeGen/AArch64/arm64-fp128.ll	7 lines
llvm/test/CodeGen/AArch64/cmp-chains.ll	145 lines
llvm/test/CodeGen/AArch64/select-with-and-or.ll	12 lines
llvm/test/CodeGen/AArch64/umulo-128-legalisation-lowering.ll	27 lines
llvm/test/CodeGen/AArch64/vec_umulo.ll	62 lines

Diff 413000

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

[14,028 lines not shown]	for (int j = 1; j >= 0; --j) {
if (FoundMatch)		if (FoundMatch)
return DAG.getNode(AArch64ISD::BSP, DL, VT, SDValue(BVN0, 0),		return DAG.getNode(AArch64ISD::BSP, DL, VT, SDValue(BVN0, 0),
N0->getOperand(1 - i), N1->getOperand(1 - j));		N0->getOperand(1 - i), N1->getOperand(1 - j));
}		}

return SDValue();		return SDValue();
}		}

		// Given a tree of and/or(csel(0, 1, cc0), csel(0, 1, cc1)), we may be able to
		// convert to csel(ccmp(.., cc0)), depending on cc1:

		// (AND (CSET cc0 cmp0) (CSET cc1 (CMP x1 y1)))
		// =>
		// (CSET cc1 (CCMP x1 y1 !cc1 cc0 cmp0))
		//
		// (OR (CSET cc0 cmp0) (CSET cc1 (CMP x1 y1)))
		// =>
		// (CSET cc1 (CCMP x1 y1 cc1 !cc0 cmp0))
		static SDValue performANDORCSELCombine(SDNode *N, SelectionDAG &DAG) {
		EVT VT = N->getValueType(0);
		SDValue CSel0 = N->getOperand(0);
		SDValue CSel1 = N->getOperand(1);

[Inline comment] dmgreen: This is only used in one place? If so it might be simpler inline, or at least closer to the use.
		if (CSel0.getOpcode() != AArch64ISD::CSEL ||
		CSel1.getOpcode() != AArch64ISD::CSEL)
		return SDValue();

		if (!CSel0->hasOneUse() || !CSel1->hasOneUse())
		return SDValue();

		if (!isNullConstant(CSel0.getOperand(0)) ||
		!isOneConstant(CSel0.getOperand(1)) ||
		!isNullConstant(CSel1.getOperand(0)) ||
		!isOneConstant(CSel1.getOperand(1)))
		return SDValue();

		SDValue Cmp0 = CSel0.getOperand(3);
		SDValue Cmp1 = CSel1.getOperand(3);
		AArch64CC::CondCode CC0 = (AArch64CC::CondCode)CSel0.getConstantOperandVal(2);
		AArch64CC::CondCode CC1 = (AArch64CC::CondCode)CSel1.getConstantOperandVal(2);
		if (!Cmp0->hasOneUse() || !Cmp1->hasOneUse())
		return SDValue();
		if (Cmp1.getOpcode() != AArch64ISD::SUBS &&
		Cmp0.getOpcode() == AArch64ISD::SUBS) {
		std::swap(Cmp0, Cmp1);
		std::swap(CC0, CC1);
		}

		if (Cmp1.getOpcode() != AArch64ISD::SUBS)
		return SDValue();

		SDLoc DL(N);
		SDValue CCmp;

		if (N->getOpcode() == ISD::AND) {
		AArch64CC::CondCode InvCC0 = AArch64CC::getInvertedCondCode(CC0);
		SDValue Condition = DAG.getConstant(InvCC0, DL, MVT_CC);
		unsigned NZCV = AArch64CC::getNZCVToSatisfyCondCode(CC1);
		SDValue NZCVOp = DAG.getConstant(NZCV, DL, MVT::i32);
		CCmp = DAG.getNode(AArch64ISD::CCMP, DL, MVT_CC, Cmp1.getOperand(0),
		Cmp1.getOperand(1), NZCVOp, Condition, Cmp0);
		} else {
		SDLoc DL(N);
		AArch64CC::CondCode InvCC1 = AArch64CC::getInvertedCondCode(CC1);
		SDValue Condition = DAG.getConstant(CC0, DL, MVT_CC);
		unsigned NZCV = AArch64CC::getNZCVToSatisfyCondCode(InvCC1);
		SDValue NZCVOp = DAG.getConstant(NZCV, DL, MVT::i32);
		CCmp = DAG.getNode(AArch64ISD::CCMP, DL, MVT_CC, Cmp1.getOperand(0),
		Cmp1.getOperand(1), NZCVOp, Condition, Cmp0);
		}
		return DAG.getNode(AArch64ISD::CSEL, DL, VT, CSel0.getOperand(0),
		CSel0.getOperand(1), DAG.getConstant(CC1, DL, MVT::i32),
		CCmp);
[Inline comment] dmgreen: AL and NV conditions are inverses of one another, but they shouldn't come up, as they would just always pick one of the operands without needing result of the comparison. You could probably change it to an assert, but they shouldn't be generated from anywhere in practice.
		}

static SDValue performORCombine(SDNode *N, TargetLowering::DAGCombinerInfo &DCI,		static SDValue performORCombine(SDNode *N, TargetLowering::DAGCombinerInfo &DCI,
const AArch64Subtarget *Subtarget) {		const AArch64Subtarget *Subtarget) {
// Attempt to form an EXTR from (or (shl VAL1, #N), (srl VAL2, #RegWidth-N))
SelectionDAG &DAG = DCI.DAG;		SelectionDAG &DAG = DCI.DAG;
[Inline comment] dmgreen: Can you move this comment next to tryCombineToEXTR whilst you are here too?
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);

		if (SDValue R = performANDORCSELCombine(N, DAG))
		return R;

if (!DAG.getTargetLoweringInfo().isTypeLegal(VT))		if (!DAG.getTargetLoweringInfo().isTypeLegal(VT))
return SDValue();		return SDValue();

		// Attempt to form an EXTR from (or (shl VAL1, #N), (srl VAL2, #RegWidth-N))
[Inline comment] dmgreen: This can be removed?
if (SDValue Res = tryCombineToEXTR(N, DCI))		if (SDValue Res = tryCombineToEXTR(N, DCI))
return Res;		return Res;

if (SDValue Res = tryCombineToBSL(N, DCI))		if (SDValue Res = tryCombineToBSL(N, DCI))
return Res;		return Res;

return SDValue();		return SDValue();
}		}
[112 lines not shown]	static SDValue performSVEAndCombine(SDNode *N,
}		}

if (isConstantSplatVectorMaskForType(Mask.getNode(), MemVT))		if (isConstantSplatVectorMaskForType(Mask.getNode(), MemVT))
return Src;		return Src;

return SDValue();		return SDValue();
}		}

// Given a tree of and(csel(0, 1, cc0), csel(0, 1, cc1)), we may be able to
// convert to csel(ccmp(.., cc0)), depending on cc1.
static SDValue PerformANDCSELCombine(SDNode *N, SelectionDAG &DAG) {
EVT VT = N->getValueType(0);
SDValue CSel0 = N->getOperand(0);
SDValue CSel1 = N->getOperand(1);

if (CSel0.getOpcode() != AArch64ISD::CSEL ||
CSel1.getOpcode() != AArch64ISD::CSEL)
return SDValue();

if (!CSel0->hasOneUse() || !CSel1->hasOneUse())
return SDValue();

if (!isNullConstant(CSel0.getOperand(0)) ||
!isOneConstant(CSel0.getOperand(1)) ||
!isNullConstant(CSel1.getOperand(0)) ||
!isOneConstant(CSel1.getOperand(1)))
return SDValue();

SDValue Cmp0 = CSel0.getOperand(3);
SDValue Cmp1 = CSel1.getOperand(3);
AArch64CC::CondCode CC0 = (AArch64CC::CondCode)CSel0.getConstantOperandVal(2);
AArch64CC::CondCode CC1 = (AArch64CC::CondCode)CSel1.getConstantOperandVal(2);
if (!Cmp0->hasOneUse() || !Cmp1->hasOneUse())
return SDValue();
if (Cmp1.getOpcode() != AArch64ISD::SUBS &&
Cmp0.getOpcode() == AArch64ISD::SUBS) {
std::swap(Cmp0, Cmp1);
std::swap(CC0, CC1);
}

if (Cmp1.getOpcode() != AArch64ISD::SUBS)
return SDValue();

SDLoc DL(N);
AArch64CC::CondCode InvCC0 = AArch64CC::getInvertedCondCode(CC0);
SDValue Condition = DAG.getConstant(InvCC0, DL, MVT_CC);
unsigned NZCV = AArch64CC::getNZCVToSatisfyCondCode(CC1);
SDValue NZCVOp = DAG.getConstant(NZCV, DL, MVT::i32);
SDValue CCmp = DAG.getNode(AArch64ISD::CCMP, DL, MVT_CC, Cmp1.getOperand(0),
Cmp1.getOperand(1), NZCVOp, Condition, Cmp0);
return DAG.getNode(AArch64ISD::CSEL, DL, VT, CSel0.getOperand(0),
CSel0.getOperand(1), DAG.getConstant(CC1, DL, MVT::i32),
CCmp);
}

static SDValue performANDCombine(SDNode *N,		static SDValue performANDCombine(SDNode *N,
TargetLowering::DAGCombinerInfo &DCI) {		TargetLowering::DAGCombinerInfo &DCI) {
SelectionDAG &DAG = DCI.DAG;		SelectionDAG &DAG = DCI.DAG;
SDValue LHS = N->getOperand(0);		SDValue LHS = N->getOperand(0);
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);

if (SDValue R = PerformANDCSELCombine(N, DAG))		if (SDValue R = performANDORCSELCombine(N, DAG))
return R;		return R;
[Inline comment] dmgreen: I think this code was fine before. Checking for CSEL will inherently check that the type is legal.

if (!VT.isVector() || !DAG.getTargetLoweringInfo().isTypeLegal(VT))		if (!VT.isVector() || !DAG.getTargetLoweringInfo().isTypeLegal(VT))
[Inline comment] dmgreen: Leave this in, I would think?
return SDValue();		return SDValue();

if (VT.isScalableVector())		if (VT.isScalableVector())
return performSVEAndCombine(N, DCI);		return performSVEAndCombine(N, DCI);

// The combining code below works only for NEON vectors. In particular, it		// The combining code below works only for NEON vectors. In particular, it
// does not work for SVE when dealing with vectors wider than 128 bits.		// does not work for SVE when dealing with vectors wider than 128 bits.
if (!(VT.is64BitVector() || VT.is128BitVector()))		if (!(VT.is64BitVector() || VT.is128BitVector()))
[6,291 lines not shown]

llvm/test/CodeGen/AArch64/arm64-ccmp.ll

[747 lines not shown]	; GISEL-NEXT: ret
%and0 = and i1 %c0, %c1		%and0 = and i1 %c0, %c1
%and1 = and i1 %c2, %c4		%and1 = and i1 %c2, %c4
%or = or i1 %and0, %and1		%or = or i1 %and0, %and1
%sel = select i1 %or, i64 0, i64 %r		%sel = select i1 %or, i64 0, i64 %r
ret i64 %sel		ret i64 %sel
}		}

@g = global i32 0		@g = global i32 0

; Should not use ccmp if we have to compute the or expression in an integer
; register anyway because of other users.
define i64 @select_noccmp2(i64 %v1, i64 %v2, i64 %v3, i64 %r) {		define i64 @select_noccmp2(i64 %v1, i64 %v2, i64 %v3, i64 %r) {
[Inline comment] dmgreen: I think this comment can be removed now. The code looks fine.
; CHECK-LABEL: select_noccmp2:		; CHECK-LABEL: select_noccmp2:
; CHECK: ; %bb.0:		; CHECK: ; %bb.0:
; CHECK-NEXT: cmp x0, #0		; CHECK-NEXT: cmp x0, #0
; CHECK-NEXT: cset w8, lt		; CHECK-NEXT: ccmp x0, #13, #0, ge
; CHECK-NEXT: cmp x0, #13		; CHECK-NEXT: cset w8, gt
; CHECK-NEXT: cset w9, gt
; CHECK-NEXT: orr w8, w8, w9
; CHECK-NEXT: cmp w8, #0		; CHECK-NEXT: cmp w8, #0
; CHECK-NEXT: csel x0, xzr, x3, ne		; CHECK-NEXT: csel x0, xzr, x3, ne
; CHECK-NEXT: sbfx w8, w8, #0, #1		; CHECK-NEXT: sbfx w8, w8, #0, #1
; CHECK-NEXT: adrp x9, _g@PAGE		; CHECK-NEXT: adrp x9, _g@PAGE
; CHECK-NEXT: str w8, [x9, _g@PAGEOFF]		; CHECK-NEXT: str w8, [x9, _g@PAGEOFF]
; CHECK-NEXT: ret		; CHECK-NEXT: ret
;		;
; GISEL-LABEL: select_noccmp2:		; GISEL-LABEL: select_noccmp2:
[19 lines not shown]
}		}

; The following is not possible to implement with a single cmp;ccmp;csel		; The following is not possible to implement with a single cmp;ccmp;csel
; sequence.		; sequence.
define i32 @select_noccmp3(i32 %v0, i32 %v1, i32 %v2) {		define i32 @select_noccmp3(i32 %v0, i32 %v1, i32 %v2) {
; CHECK-LABEL: select_noccmp3:		; CHECK-LABEL: select_noccmp3:
; CHECK: ; %bb.0:		; CHECK: ; %bb.0:
; CHECK-NEXT: cmp w0, #0		; CHECK-NEXT: cmp w0, #0
; CHECK-NEXT: cset w8, lt		; CHECK-NEXT: ccmp w0, #13, #0, ge
; CHECK-NEXT: cmp w0, #13		; CHECK-NEXT: cset w8, gt
; CHECK-NEXT: cset w9, gt
; CHECK-NEXT: cmp w0, #22		; CHECK-NEXT: cmp w0, #22
; CHECK-NEXT: cset w10, lt		; CHECK-NEXT: mov w9, #44
; CHECK-NEXT: cmp w0, #44		; CHECK-NEXT: ccmp w0, w9, #0, ge
; CHECK-NEXT: cset w11, gt		; CHECK-NEXT: cset w9, gt
; CHECK-NEXT: cmp w0, #99		; CHECK-NEXT: cmp w0, #99
; CHECK-NEXT: cset w12, eq
; CHECK-NEXT: cmp w0, #77
; CHECK-NEXT: cset w13, eq
; CHECK-NEXT: orr w8, w8, w9
; CHECK-NEXT: orr w9, w10, w11
; CHECK-NEXT: and w8, w8, w9		; CHECK-NEXT: and w8, w8, w9
; CHECK-NEXT: orr w9, w12, w13		; CHECK-NEXT: mov w9, #77
		; CHECK-NEXT: ccmp w0, w9, #4, ne
		; CHECK-NEXT: cset w9, eq
; CHECK-NEXT: tst w8, w9		; CHECK-NEXT: tst w8, w9
; CHECK-NEXT: csel w0, w1, w2, ne		; CHECK-NEXT: csel w0, w1, w2, ne
; CHECK-NEXT: ret		; CHECK-NEXT: ret
;		;
; GISEL-LABEL: select_noccmp3:		; GISEL-LABEL: select_noccmp3:
; GISEL: ; %bb.0:		; GISEL: ; %bb.0:
; GISEL-NEXT: cmp w0, #0		; GISEL-NEXT: cmp w0, #0
; GISEL-NEXT: cset w8, lt		; GISEL-NEXT: cset w8, lt
[612 lines not shown]

llvm/test/CodeGen/AArch64/arm64-fp128.ll

[251 lines not shown]
	; CHECK-NEXT: .cfi_offset w19, -8			; CHECK-NEXT: .cfi_offset w19, -8
	; CHECK-NEXT: .cfi_offset w30, -16			; CHECK-NEXT: .cfi_offset w30, -16
	; CHECK-NEXT: adrp x8, lhs			; CHECK-NEXT: adrp x8, lhs
	; CHECK-NEXT: ldr q0, [x8, :lo12:lhs]			; CHECK-NEXT: ldr q0, [x8, :lo12:lhs]
	; CHECK-NEXT: adrp x8, rhs			; CHECK-NEXT: adrp x8, rhs
	; CHECK-NEXT: ldr q1, [x8, :lo12:rhs]			; CHECK-NEXT: ldr q1, [x8, :lo12:rhs]
	; CHECK-NEXT: stp q1, q0, [sp] // 32-byte Folded Spill			; CHECK-NEXT: stp q1, q0, [sp] // 32-byte Folded Spill
	; CHECK-NEXT: bl __eqtf2			; CHECK-NEXT: bl __eqtf2
	; CHECK-NEXT: cmp w0, #0			; CHECK-NEXT: mov x19, x0
	; CHECK-NEXT: cset w19, eq
	; CHECK-NEXT: ldp q1, q0, [sp] // 32-byte Folded Reload			; CHECK-NEXT: ldp q1, q0, [sp] // 32-byte Folded Reload
	; CHECK-NEXT: bl __unordtf2			; CHECK-NEXT: bl __unordtf2
	; CHECK-NEXT: cmp w0, #0			; CHECK-NEXT: cmp w0, #0
	; CHECK-NEXT: cset w8, ne			; CHECK-NEXT: ccmp w19, #0, #4, eq
	; CHECK-NEXT: orr w0, w8, w19			; CHECK-NEXT: cset w0, eq
	; CHECK-NEXT: ldp x30, x19, [sp, #32] // 16-byte Folded Reload			; CHECK-NEXT: ldp x30, x19, [sp, #32] // 16-byte Folded Reload
	; CHECK-NEXT: add sp, sp, #48			; CHECK-NEXT: add sp, sp, #48
	; CHECK-NEXT: ret			; CHECK-NEXT: ret

	%lhs = load fp128, fp128* @lhs, align 16			%lhs = load fp128, fp128* @lhs, align 16
	%rhs = load fp128, fp128* @rhs, align 16			%rhs = load fp128, fp128* @rhs, align 16

	%val = fcmp ueq fp128 %lhs, %rhs			%val = fcmp ueq fp128 %lhs, %rhs
[157 lines not shown]

llvm/test/CodeGen/AArch64/cmp-chains.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s -mtriple=aarch64-- | FileCheck %s
[Inline comment] dmgreen: It can be good to pre-commit the tests, so just the differences get shown in the review. It makes it easier to see what changed.

				; Ensure chains of comparisons produce chains of `ccmp`

				; (x0 < x1) && (x2 > x3)
				define i32 @cmp_and2(i32 %0, i32 %1, i32 %2, i32 %3) {
				; CHECK-LABEL: cmp_and2:
				; CHECK: // %bb.0:
				; CHECK-NEXT: cmp w0, w1
				; CHECK-NEXT: ccmp w2, w3, #0, lo
				; CHECK-NEXT: cset w0, hi
				; CHECK-NEXT: ret
				%5 = icmp ult i32 %0, %1
				%6 = icmp ugt i32 %2, %3
				%7 = select i1 %5, i1 %6, i1 false
				%8 = zext i1 %7 to i32
				ret i32 %8
				}

				; (x0 < x1) && (x2 > x3) && (x4 != x5)
				define i32 @cmp_and3(i32 %0, i32 %1, i32 %2, i32 %3, i32 %4, i32 %5) {
				; CHECK-LABEL: cmp_and3:
				; CHECK: // %bb.0:
				; CHECK-NEXT: cmp w0, w1
				; CHECK-NEXT: ccmp w2, w3, #0, lo
				; CHECK-NEXT: ccmp w4, w5, #4, hi
				; CHECK-NEXT: cset w0, ne
				; CHECK-NEXT: ret
				%7 = icmp ult i32 %0, %1
				%8 = icmp ugt i32 %2, %3
				%9 = select i1 %7, i1 %8, i1 false
				%10 = icmp ne i32 %4, %5
				%11 = select i1 %9, i1 %10, i1 false
				%12 = zext i1 %11 to i32
				ret i32 %12
				}

				; (x0 < x1) && (x2 > x3) && (x4 != x5) && (x6 == x7)
				define i32 @cmp_and4(i32 %0, i32 %1, i32 %2, i32 %3, i32 %4, i32 %5, i32 %6, i32 %7) {
				; CHECK-LABEL: cmp_and4:
				; CHECK: // %bb.0:
				; CHECK-NEXT: cmp w2, w3
				; CHECK-NEXT: ccmp w0, w1, #2, hi
				; CHECK-NEXT: ccmp w4, w5, #4, lo
				; CHECK-NEXT: ccmp w6, w7, #0, ne
				; CHECK-NEXT: cset w0, eq
				; CHECK-NEXT: ret
				%9 = icmp ugt i32 %2, %3
				%10 = icmp ult i32 %0, %1
				%11 = select i1 %9, i1 %10, i1 false
				%12 = icmp ne i32 %4, %5
				%13 = select i1 %11, i1 %12, i1 false
				%14 = icmp eq i32 %6, %7
				%15 = select i1 %13, i1 %14, i1 false
				%16 = zext i1 %15 to i32
				ret i32 %16
				}

				; (x0 < x1) || (x2 > x3)
				define i32 @cmp_or2(i32 %0, i32 %1, i32 %2, i32 %3) {
				; CHECK-LABEL: cmp_or2:
				; CHECK: // %bb.0:
				; CHECK-NEXT: cmp w0, w1
				; CHECK-NEXT: ccmp w2, w3, #0, hs
				; CHECK-NEXT: cset w0, ne
				; CHECK-NEXT: ret
				%5 = icmp ult i32 %0, %1
				%6 = icmp ne i32 %2, %3
				%7 = select i1 %5, i1 true, i1 %6
				%8 = zext i1 %7 to i32
				ret i32 %8
				}

				; (x0 < x1) || (x2 > x3) || (x4 != x5)
				define i32 @cmp_or3(i32 %0, i32 %1, i32 %2, i32 %3, i32 %4, i32 %5) {
				; CHECK-LABEL: cmp_or3:
				; CHECK: // %bb.0:
				; CHECK-NEXT: cmp w0, w1
				; CHECK-NEXT: ccmp w2, w3, #2, hs
				; CHECK-NEXT: ccmp w4, w5, #0, ls
				; CHECK-NEXT: cset w0, ne
				; CHECK-NEXT: ret
				%7 = icmp ult i32 %0, %1
				%8 = icmp ugt i32 %2, %3
				%9 = select i1 %7, i1 true, i1 %8
				%10 = icmp ne i32 %4, %5
				%11 = select i1 %9, i1 true, i1 %10
				%12 = zext i1 %11 to i32
				ret i32 %12
				}

				; (x0 < x1) || (x2 > x3) || (x4 != x5) || (x6 == x7)
				define i32 @cmp_or4(i32 %0, i32 %1, i32 %2, i32 %3, i32 %4, i32 %5, i32 %6, i32 %7) {
				; CHECK-LABEL: cmp_or4:
				; CHECK: // %bb.0:
				; CHECK-NEXT: cmp w0, w1
				; CHECK-NEXT: ccmp w2, w3, #2, hs
				; CHECK-NEXT: ccmp w4, w5, #0, ls
				; CHECK-NEXT: ccmp w6, w7, #4, eq
				; CHECK-NEXT: cset w0, eq
				; CHECK-NEXT: ret
				%9 = icmp ult i32 %0, %1
				%10 = icmp ugt i32 %2, %3
				%11 = select i1 %9, i1 true, i1 %10
				%12 = icmp ne i32 %4, %5
				%13 = select i1 %11, i1 true, i1 %12
				%14 = icmp eq i32 %6, %7
				%15 = select i1 %13, i1 true, i1 %14
				%16 = zext i1 %15 to i32
				ret i32 %16
				}

				; (x0 != 0) || (x1 != 0)
				define i32 @true_or2(i32 %0, i32 %1) {
				; CHECK-LABEL: true_or2:
				; CHECK: // %bb.0:
				; CHECK-NEXT: orr w8, w0, w1
				; CHECK-NEXT: cmp w8, #0
				; CHECK-NEXT: cset w0, ne
				; CHECK-NEXT: ret
				%3 = icmp ne i32 %0, 0
				%4 = icmp ne i32 %1, 0
				%5 = select i1 %3, i1 true, i1 %4
				%6 = zext i1 %5 to i32
				ret i32 %6
				}

				; (x0 != 0) || (x1 != 0) || (x2 != 0)
				define i32 @true_or3(i32 %0, i32 %1, i32 %2) {
				; CHECK-LABEL: true_or3:
				; CHECK: // %bb.0:
				; CHECK-NEXT: orr w8, w0, w1
				; CHECK-NEXT: orr w8, w8, w2
				; CHECK-NEXT: cmp w8, #0
				; CHECK-NEXT: cset w0, ne
				; CHECK-NEXT: ret
				%4 = icmp ne i32 %0, 0
				%5 = icmp ne i32 %1, 0
				%6 = select i1 %4, i1 true, i1 %5
				%7 = icmp ne i32 %2, 0
				%8 = select i1 %6, i1 true, i1 %7
				%9 = zext i1 %8 to i32
				ret i32 %9
				}

llvm/test/CodeGen/AArch64/select-with-and-or.ll

[12 lines not shown]	; CHECK-NEXT: ret
%s = select i1 %a, i1 %b, i1 false		%s = select i1 %a, i1 %b, i1 false
ret i1 %s		ret i1 %s
}		}

define i1 @or(i32 %x, i32 %y, i32 %z, i32 %w) {		define i1 @or(i32 %x, i32 %y, i32 %z, i32 %w) {
; CHECK-LABEL: or:		; CHECK-LABEL: or:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: cmp w0, w1		; CHECK-NEXT: cmp w0, w1
; CHECK-NEXT: cset w8, eq		; CHECK-NEXT: ccmp w2, w3, #0, ne
; CHECK-NEXT: cmp w2, w3		; CHECK-NEXT: cset w0, gt
; CHECK-NEXT: cset w9, gt
; CHECK-NEXT: orr w0, w8, w9
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%a = icmp eq i32 %x, %y		%a = icmp eq i32 %x, %y
%b = icmp sgt i32 %z, %w		%b = icmp sgt i32 %z, %w
%s = select i1 %a, i1 true, i1 %b		%s = select i1 %a, i1 true, i1 %b
ret i1 %s		ret i1 %s
}		}

define i1 @and_not(i32 %x, i32 %y, i32 %z, i32 %w) {		define i1 @and_not(i32 %x, i32 %y, i32 %z, i32 %w) {
; CHECK-LABEL: and_not:		; CHECK-LABEL: and_not:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: cmp w0, w1		; CHECK-NEXT: cmp w0, w1
; CHECK-NEXT: ccmp w2, w3, #4, ne		; CHECK-NEXT: ccmp w2, w3, #4, ne
; CHECK-NEXT: cset w0, gt		; CHECK-NEXT: cset w0, gt
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%a = icmp eq i32 %x, %y		%a = icmp eq i32 %x, %y
%b = icmp sgt i32 %z, %w		%b = icmp sgt i32 %z, %w
%s = select i1 %a, i1 false, i1 %b		%s = select i1 %a, i1 false, i1 %b
ret i1 %s		ret i1 %s
}		}

define i1 @or_not(i32 %x, i32 %y, i32 %z, i32 %w) {		define i1 @or_not(i32 %x, i32 %y, i32 %z, i32 %w) {
; CHECK-LABEL: or_not:		; CHECK-LABEL: or_not:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: cmp w0, w1		; CHECK-NEXT: cmp w0, w1
; CHECK-NEXT: cset w8, ne		; CHECK-NEXT: ccmp w2, w3, #0, eq
; CHECK-NEXT: cmp w2, w3		; CHECK-NEXT: cset w0, gt
; CHECK-NEXT: cset w9, gt
; CHECK-NEXT: orr w0, w8, w9
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%a = icmp eq i32 %x, %y		%a = icmp eq i32 %x, %y
%b = icmp sgt i32 %z, %w		%b = icmp sgt i32 %z, %w
%s = select i1 %a, i1 %b, i1 true		%s = select i1 %a, i1 %b, i1 true
ret i1 %s		ret i1 %s
}		}

define <4 x i1> @and_vec(<4 x i32> %x, <4 x i32> %y, <4 x i32> %z, <4 x i32> %w) {		define <4 x i1> @and_vec(<4 x i32> %x, <4 x i32> %y, <4 x i32> %z, <4 x i32> %w) {
[110 lines not shown]

llvm/test/CodeGen/AArch64/umulo-128-legalisation-lowering.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=aarch64-unknown-linux-gnu | FileCheck %s --check-prefixes=AARCH			; RUN: llc < %s -mtriple=aarch64-unknown-linux-gnu | FileCheck %s --check-prefixes=AARCH

	define { i128, i8 } @muloti_test(i128 %l, i128 %r) unnamed_addr #0 {			define { i128, i8 } @muloti_test(i128 %l, i128 %r) unnamed_addr #0 {
	; AARCH-LABEL: muloti_test:			; AARCH-LABEL: muloti_test:
	; AARCH: // %bb.0: // %start			; AARCH: // %bb.0: // %start
	; AARCH-NEXT: umulh x8, x1, x2			; AARCH-NEXT: mul x8, x3, x0
	; AARCH-NEXT: mul x9, x3, x0			; AARCH-NEXT: umulh x9, x0, x2
	; AARCH-NEXT: cmp xzr, x8			; AARCH-NEXT: madd x8, x1, x2, x8
	; AARCH-NEXT: umulh x10, x3, x0			; AARCH-NEXT: umulh x10, x1, x2
	; AARCH-NEXT: cset w8, ne			; AARCH-NEXT: adds x8, x9, x8
				; AARCH-NEXT: cset w9, hs
	; AARCH-NEXT: cmp x1, #0			; AARCH-NEXT: cmp x1, #0
	; AARCH-NEXT: ccmp x3, #0, #4, ne			; AARCH-NEXT: ccmp x3, #0, #4, ne
	; AARCH-NEXT: madd x9, x1, x2, x9			; AARCH-NEXT: mov x1, x8
	; AARCH-NEXT: cset w11, ne			; AARCH-NEXT: ccmp xzr, x10, #0, eq
	; AARCH-NEXT: cmp xzr, x10			; AARCH-NEXT: umulh x10, x3, x0
	; AARCH-NEXT: umulh x10, x0, x2
	; AARCH-NEXT: orr w8, w11, w8
	; AARCH-NEXT: cset w11, ne
	; AARCH-NEXT: mul x0, x0, x2			; AARCH-NEXT: mul x0, x0, x2
	; AARCH-NEXT: adds x1, x10, x9			; AARCH-NEXT: ccmp xzr, x10, #0, eq
	; AARCH-NEXT: orr w8, w8, w11			; AARCH-NEXT: cset w10, ne
	; AARCH-NEXT: cset w9, hs			; AARCH-NEXT: orr w2, w10, w9
	; AARCH-NEXT: orr w2, w8, w9
	; AARCH-NEXT: ret			; AARCH-NEXT: ret
	start:			start:
	%0 = tail call { i128, i1 } @llvm.umul.with.overflow.i128(i128 %l, i128 %r) #2			%0 = tail call { i128, i1 } @llvm.umul.with.overflow.i128(i128 %l, i128 %r) #2
	%1 = extractvalue { i128, i1 } %0, 0			%1 = extractvalue { i128, i1 } %0, 0
	%2 = extractvalue { i128, i1 } %0, 1			%2 = extractvalue { i128, i1 } %0, 1
	%3 = zext i1 %2 to i8			%3 = zext i1 %2 to i8
	%4 = insertvalue { i128, i8 } undef, i128 %1, 0			%4 = insertvalue { i128, i8 } undef, i128 %1, 0
	%5 = insertvalue { i128, i8 } %4, i8 %3, 1			%5 = insertvalue { i128, i8 } %4, i8 %3, 1
[9 lines not shown]

llvm/test/CodeGen/AArch64/vec_umulo.ll

[316 lines not shown]	; CHECK-NEXT: ret
%res = sext <4 x i1> %obit to <4 x i32>		%res = sext <4 x i1> %obit to <4 x i32>
store <4 x i1> %val, <4 x i1>* %p2		store <4 x i1> %val, <4 x i1>* %p2
ret <4 x i32> %res		ret <4 x i32> %res
}		}

define <2 x i32> @umulo_v2i128(<2 x i128> %a0, <2 x i128> %a1, <2 x i128>* %p2) nounwind {		define <2 x i32> @umulo_v2i128(<2 x i128> %a0, <2 x i128> %a1, <2 x i128>* %p2) nounwind {
; CHECK-LABEL: umulo_v2i128:		; CHECK-LABEL: umulo_v2i128:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: umulh x8, x3, x6		; CHECK-NEXT: mul x8, x7, x2
; CHECK-NEXT: mul x10, x7, x2		; CHECK-NEXT: umulh x9, x2, x6
; CHECK-NEXT: cmp xzr, x8		; CHECK-NEXT: madd x8, x3, x6, x8
; CHECK-NEXT: umulh x8, x7, x2		; CHECK-NEXT: umulh x10, x3, x6
; CHECK-NEXT: cset w9, ne		; CHECK-NEXT: adds x8, x9, x8
		; CHECK-NEXT: umulh x11, x7, x2
		; CHECK-NEXT: cset w9, hs
; CHECK-NEXT: cmp x3, #0		; CHECK-NEXT: cmp x3, #0
; CHECK-NEXT: ccmp x7, #0, #4, ne		; CHECK-NEXT: ccmp x7, #0, #4, ne
; CHECK-NEXT: umulh x11, x2, x6		; CHECK-NEXT: umulh x13, x1, x4
; CHECK-NEXT: madd x10, x3, x6, x10		; CHECK-NEXT: ccmp xzr, x10, #0, eq
; CHECK-NEXT: umulh x12, x1, x4		; CHECK-NEXT: mul x10, x5, x0
; CHECK-NEXT: cset w13, ne		; CHECK-NEXT: madd x10, x1, x4, x10
; CHECK-NEXT: cmp xzr, x8		; CHECK-NEXT: ccmp xzr, x11, #0, eq
; CHECK-NEXT: cset w8, ne		; CHECK-NEXT: umulh x11, x0, x4
		; CHECK-NEXT: cset w12, ne
; CHECK-NEXT: adds x10, x11, x10		; CHECK-NEXT: adds x10, x11, x10
; CHECK-NEXT: cset w11, hs		; CHECK-NEXT: cset w11, hs
; CHECK-NEXT: cmp xzr, x12
; CHECK-NEXT: cset w12, ne
; CHECK-NEXT: cmp x1, #0		; CHECK-NEXT: cmp x1, #0
; CHECK-NEXT: ccmp x5, #0, #4, ne		; CHECK-NEXT: ccmp x5, #0, #4, ne
; CHECK-NEXT: mul x15, x5, x0		; CHECK-NEXT: orr w9, w12, w9
; CHECK-NEXT: umulh x14, x5, x0		; CHECK-NEXT: mul x12, x0, x4
; CHECK-NEXT: orr w9, w13, w9		; CHECK-NEXT: ccmp xzr, x13, #0, eq
; CHECK-NEXT: umulh x16, x0, x4		; CHECK-NEXT: umulh x13, x5, x0
; CHECK-NEXT: orr w8, w9, w8		; CHECK-NEXT: ccmp xzr, x13, #0, eq
; CHECK-NEXT: madd x15, x1, x4, x15		; CHECK-NEXT: cset w13, ne
; CHECK-NEXT: cset w17, ne		; CHECK-NEXT: orr w11, w13, w11
; CHECK-NEXT: cmp xzr, x14		; CHECK-NEXT: fmov s0, w11
; CHECK-NEXT: orr w12, w17, w12		; CHECK-NEXT: ldr x11, [sp]
; CHECK-NEXT: cset w14, ne		; CHECK-NEXT: mov v0.s[1], w9
; CHECK-NEXT: adds x15, x16, x15		; CHECK-NEXT: mul x9, x2, x6
; CHECK-NEXT: orr w12, w12, w14		; CHECK-NEXT: stp x12, x10, [x11]
; CHECK-NEXT: cset w14, hs
; CHECK-NEXT: orr w12, w12, w14
; CHECK-NEXT: orr w8, w8, w11
; CHECK-NEXT: mul x11, x0, x4
; CHECK-NEXT: ldr x9, [sp]
; CHECK-NEXT: fmov s0, w12
; CHECK-NEXT: stp x11, x15, [x9]
; CHECK-NEXT: mov v0.s[1], w8
; CHECK-NEXT: mul x8, x2, x6
; CHECK-NEXT: shl v0.2s, v0.2s, #31		; CHECK-NEXT: shl v0.2s, v0.2s, #31
; CHECK-NEXT: stp x8, x10, [x9, #16]		; CHECK-NEXT: stp x9, x8, [x11, #16]
; CHECK-NEXT: cmlt v0.2s, v0.2s, #0		; CHECK-NEXT: cmlt v0.2s, v0.2s, #0
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%t = call {<2 x i128>, <2 x i1>} @llvm.umul.with.overflow.v2i128(<2 x i128> %a0, <2 x i128> %a1)		%t = call {<2 x i128>, <2 x i1>} @llvm.umul.with.overflow.v2i128(<2 x i128> %a0, <2 x i128> %a1)
%val = extractvalue {<2 x i128>, <2 x i1>} %t, 0		%val = extractvalue {<2 x i128>, <2 x i1>} %t, 0
%obit = extractvalue {<2 x i128>, <2 x i1>} %t, 1		%obit = extractvalue {<2 x i128>, <2 x i1>} %t, 1
%res = sext <2 x i1> %obit to <2 x i32>		%res = sext <2 x i1> %obit to <2 x i32>
store <2 x i128> %val, <2 x i128>* %p2		store <2 x i128> %val, <2 x i128>* %p2
ret <2 x i32> %res		ret <2 x i32> %res
}		}