This is an archive of the discontinued LLVM Phabricator instance.

Differential D124976

[AArch64] Fix sub with carry
ClosedPublic

Authored by kazu on May 4 2022, 9:16 PM.

Details

Reviewers

efriedma
paulwalker-arm
t.p.northover
Kmeakin

Commits

rGfffb6e6afdba: [AArch64] Fix sub with carry

Summary

[AArch64] Fix sub with carry

13403a70e45b2d22878ba59fc211f8dba3a8deba introduced a bug where we
generate the outgoing carry inverted, which in turn breaks the
lowering of @llvm.usub.sat.i128, returning the normal difference on
saturation and zero otherwise.

Note that AArch64 has peculiar semantics where the subtraction
instructions generate borrow inverted.  The problem is that we mix the
two forms of semantics -- the normal carry and inverted carry -- in
the area of extended precision subtractions.  Specifically, we have
three problems:

- lowerADDSUBCARRY takes the non-inverted incoming carry from a
  subtraction and feeds it to SBCS without inverting it first.

- lowerADDSUBCARRY makes available the outgoing carry from SBCS
  without inverting it.

- foldOverflowCheck folds:

  (SBC{S} l r (CMP (CSET LO carry) 1)) => (SBC{S} l r carry)

  When the incoming carry flag is set, CSET LO results in zero.  CMP
  in turn generates a borrow, *clearing* the carry flag.  Instead, we
  should fold:

  (SBC{S} l r (CMP 0 (CSET LO carry))) => (SBC{S} l r carry)

  When the incoming carry flag is set, CSET LO results in zero.  CMP
  does not generate a borrow, *setting* the carry flag.
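As a rough check of the third item, the two CMP patterns can be modeled on a single carry bit. This is only a sketch: `csetLO`, `cmpCarry`, `oldPattern`, `newPattern` and `checkFoldPatterns` are hypothetical helper names, not LLVM APIs, and the model assumes only that CMP sets the 'C' flag when no borrow occurs.

```cpp
#include <cassert>

// Toy model of the AArch64 'C' flag; the function names are illustrative and
// are not LLVM APIs.

// CSET LO: produces 1 when the carry flag is clear and 0 when it is set.
constexpr unsigned csetLO(bool C) { return C ? 0u : 1u; }

// CMP A, B behaves like SUBS with the result discarded: the carry flag is set
// when no borrow occurs, that is, when A >= B as unsigned values.
constexpr bool cmpCarry(unsigned A, unsigned B) { return A >= B; }

// Old pattern: CMP (CSET LO carry), 1
constexpr bool oldPattern(bool C) { return cmpCarry(csetLO(C), 1u); }

// New pattern: CMP 0, (CSET LO carry)
constexpr bool newPattern(bool C) { return cmpCarry(0u, csetLO(C)); }

// Checks both flag states: the old pattern flips the flag, so replacing it
// with `carry` changes the meaning; the new pattern reproduces the flag.
inline void checkFoldPatterns() {
  assert(oldPattern(false) == true);
  assert(oldPattern(true) == false);
  assert(newPattern(false) == false);
  assert(newPattern(true) == true);
}
```

Under this model, only the new pattern is equivalent to passing `carry` through unchanged, which is the property the fold relies on.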

IIUC, we should use the normal (that is, non-inverted) semantics for
carry everywhere.

This patch fixes the three problems above.

This patch does not add any new testcases because we have plenty of
them covering the instruction in question.  In particular,
@u128_saturating_sub is identical to the testcase in the motivating
issue.

Fixes: #55253

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

kazu created this revision. May 4 2022, 9:16 PM

Herald added a project: Restricted Project. May 4 2022, 9:16 PM

Herald added subscribers: hiraditya, kristof.beyls.

kazu requested review of this revision. May 4 2022, 9:16 PM

Herald added a project: Restricted Project. May 4 2022, 9:16 PM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B162825: Diff 427190. May 4 2022, 9:50 PM

kazu mentioned this in D123322: [AArch64] Add lowerings for {ADD,SUB}CARRY and S{ADD,SUB}O_CARRY. May 4 2022, 11:09 PM

efriedma added inline comments. May 5 2022, 10:52 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
3324–3327	The comment here could be improved. Maybe spell out "if Invert is true, value is 0 if 'C' bit is set, and 1 if it is not set".
3371	It's a bit concerning to me that you're changing OutFlag, but not OpCarryIn. If we're inverting the output flag, don't we need to invert the input flag too? Or is it already getting inverted correctly? It might make sense to also add tests for i256 addition and subtraction.

efriedma added inline comments. May 5 2022, 10:56 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
3371	Just checked; this is what we currently generate for i256 subtraction:

    subs x0, x0, x4
    sbcs x1, x1, x5
    cset w8, hs
    cmp w8, #1
    sbcs x2, x2, x6
    cset w8, hs
    cmp w8, #1
    sbcs x3, x3, x7
    ret

I guess maybe the input is already handled correctly.
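To see why this sequence behaves correctly at the machine level, it can be simulated limb by limb. The following is a minimal sketch under the assumption that SUBS and SBCS set the 'C' flag when no borrow occurs; `subs`, `sbcs`, `csetHsThenCmpOne`, `U256`, `sub256` and `checkSub256` are hypothetical names used only for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy model of the AArch64 flag-setting subtract instructions. The names
// below are illustrative and are not part of LLVM.

// SUBS: returns N - M and sets C when no borrow occurs (N >= M).
inline uint64_t subs(uint64_t N, uint64_t M, bool &C) {
  C = N >= M;
  return N - M;
}

// SBCS: returns N - M - (1 - C) and sets C when no borrow occurs.
inline uint64_t sbcs(uint64_t N, uint64_t M, bool &C) {
  const uint64_t BorrowIn = C ? 0 : 1;
  const uint64_t D = N - M - BorrowIn;
  C = BorrowIn ? (N > M) : (N >= M);
  return D;
}

// CSET w8, HS followed by CMP w8, #1: w8 = C, then C = (w8 >= 1).
// The pair reproduces the flag it started with.
inline void csetHsThenCmpOne(bool &C) {
  const unsigned W8 = C ? 1u : 0u;
  C = W8 >= 1u;
}

using U256 = std::array<uint64_t, 4>; // little-endian 64-bit limbs

// Mirrors the instruction sequence quoted in the comment above.
inline U256 sub256(const U256 &A, const U256 &B) {
  bool C = false;
  U256 R{};
  R[0] = subs(A[0], B[0], C);
  R[1] = sbcs(A[1], B[1], C);
  csetHsThenCmpOne(C);
  R[2] = sbcs(A[2], B[2], C);
  csetHsThenCmpOne(C);
  R[3] = sbcs(A[3], B[3], C);
  return R;
}

// 0 - 1 wraps around to all ones in every limb.
inline void checkSub256() {
  const U256 Zero = {{0, 0, 0, 0}};
  const U256 One = {{1, 0, 0, 0}};
  const U256 R = sub256(Zero, One);
  assert(R[0] == UINT64_MAX && R[1] == UINT64_MAX);
  assert(R[2] == UINT64_MAX && R[3] == UINT64_MAX);
}
```

In this model the `cset`/`cmp` pair is an identity on the flag, so the emitted instructions compute the right difference even though, as discussed later in the thread, the DAG-level carry values are not what the generic nodes expect.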

Update the comment.

LGTM

This revision is now accepted and ready to land. May 5 2022, 2:06 PM

Harbormaster completed remote builds in B163001: Diff 427442. May 5 2022, 2:25 PM

Fix foldOverflowCheck.

kazu edited the summary of this revision. May 6 2022, 12:56 AM

I've fixed the treatment of the incoming carry as well as foldOverflowCheck. Please take a look. Thanks!

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
3371	It turns out that we have to invert the incoming carry, too. We appear to generate correct code -- `subs` immediately followed by `sbcs` -- but that's because one piece of wrong semantics is canceled by another. `lowerADDSUBCARRY` generates wrong code, taking the incoming carry and feeding it to `SBCS` without inverting it. Meanwhile, `foldOverflowCheck` expects the wrong code from `lowerADDSUBCARRY` and folds it away. That is, if you disable `foldOverflowCheck` by teaching it to always return `SDValue()`, then you would get wrong code between `subs` and `sbcs`. In the latest revision of the patch, I've corrected both -- the treatment of the incoming carry and the folding done in `foldOverflowCheck`.
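The cancellation can be illustrated with a one-bit model of the two conversion helpers touched by this patch. This is a sketch, not the LLVM code: `valueToFlag` and `flagToValue` are hypothetical stand-ins for `valueToCarryFlag` and `carryFlagToValue`, and the model assumes that the DAG-level value 1 means "a borrow occurred" while the AArch64 'C' flag set means "no borrow".

```cpp
#include <cassert>

// CMP A, B sets the carry flag when no borrow occurs (A >= B, unsigned).
constexpr bool cmpCarry(unsigned A, unsigned B) { return A >= B; }

// Stand-in for valueToCarryFlag: CMP Value, 1 normally, CMP 0, Value when
// Invert is true.
constexpr bool valueToFlag(unsigned Value, bool Invert) {
  return Invert ? cmpCarry(0u, Value) : cmpCarry(Value, 1u);
}

// Stand-in for carryFlagToValue: CSEL 1, 0 on HS normally, on LO when Invert
// is true.
constexpr unsigned flagToValue(bool C, bool Invert) {
  return Invert ? (C ? 0u : 1u) : (C ? 1u : 0u);
}

inline void checkCarryConversions() {
  // Without inversion, "no borrow" (C set) is reported as the value 1,
  // which a consumer of the generic node reads as "borrow".
  assert(flagToValue(true, false) == 1u);
  // Feeding a value back in without inversion undoes the mistake, so a
  // chain of SBCS instructions still sees the flag it expects.
  assert(valueToFlag(flagToValue(true, false), false) == true);
  assert(valueToFlag(flagToValue(false, false), false) == false);
  // A correct borrow value of 1 fed in without inversion sets C, i.e. it
  // claims "no borrow".
  assert(valueToFlag(1u, false) == true);
  // With inversion on both sides, each step is right on its own.
  assert(flagToValue(true, true) == 0u);
  assert(flagToValue(false, true) == 1u);
  assert(valueToFlag(1u, true) == false);
  assert(valueToFlag(0u, true) == true);
}
```

The round trip is an identity on the flag in both configurations; only the inverted configuration also gives the intermediate value the borrow meaning that other consumers expect.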

Harbormaster completed remote builds in B163080: Diff 427547. May 6 2022, 2:07 AM

Is it worth updating the ISD node description in ISDOpcodes.h, given that this is how the bug happened in the first place, with a literal interpretation of "incoming carry" and "and the output carry"?

kazu edited the summary of this revision. May 6 2022, 9:35 AM

In D124976#3496267, @paulwalker-arm wrote:

Is it worth updating the ISD node description in ISDOpcodes.h, given that this is how the bug happened in the first place, with a literal interpretation of "incoming carry" and "and the output carry"?

Sure, I am happy to post a separate patch for that.

@efriedma, I understand that you've LGTMed an earlier revision, but I'd appreciate a review on the latest one. As you suspected, the incoming carry also needs fixing. Thanks in advance!

LGTM

I agree it makes sense to update the comment in ISDOpcodes.h; might as well do it here. And I still want to see an i256 sub testcase. But you can do that in a followup.

This revision is now accepted and ready to land. May 6 2022, 10:13 AM

jgorbe added a subscriber: jgorbe. May 6 2022, 10:16 AM

This revision was landed with ongoing or failed builds. May 6 2022, 11:04 AM

Closed by commit rGfffb6e6afdba: [AArch64] Fix sub with carry (authored by kazu).

This revision was automatically updated to reflect the committed changes.

kazu added a commit: rGfffb6e6afdba: [AArch64] Fix sub with carry.

Revision Contents

Path	Size
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	44 lines
llvm/test/CodeGen/AArch64/i128-math.ll	6 lines
llvm/test/CodeGen/AArch64/usub_sat_vec.ll	4 lines

Diff 427682

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

... (3,302 lines hidden) ...	static SDValue LowerADDC_ADDE_SUBC_SUBE(SDValue Op, SelectionDAG &DAG) {
}		}

if (!ExtraOp)		if (!ExtraOp)
return DAG.getNode(Opc, SDLoc(Op), VTs, Op.getOperand(0), Op.getOperand(1));		return DAG.getNode(Opc, SDLoc(Op), VTs, Op.getOperand(0), Op.getOperand(1));
return DAG.getNode(Opc, SDLoc(Op), VTs, Op.getOperand(0), Op.getOperand(1),		return DAG.getNode(Opc, SDLoc(Op), VTs, Op.getOperand(0), Op.getOperand(1),
Op.getOperand(2));		Op.getOperand(2));
}		}

// Sets 'C' bit of NZCV to 0 if value is 0, else sets 'C' bit to 1		// If Invert is false, sets 'C' bit of NZCV to 0 if value is 0, else sets 'C'
static SDValue valueToCarryFlag(SDValue Value, SelectionDAG &DAG) {		// bit to 1. If Invert is true, sets 'C' bit of NZCV to 1 if value is 0, else
		// sets 'C' bit to 0.
		static SDValue valueToCarryFlag(SDValue Value, SelectionDAG &DAG, bool Invert) {
SDLoc DL(Value);		SDLoc DL(Value);
SDValue One = DAG.getConstant(1, DL, Value.getValueType());		EVT VT = Value.getValueType();
		SDValue Op0 = Invert ? DAG.getConstant(0, DL, VT) : Value;
		SDValue Op1 = Invert ? Value : DAG.getConstant(1, DL, VT);
SDValue Cmp =		SDValue Cmp =
DAG.getNode(AArch64ISD::SUBS, DL,		DAG.getNode(AArch64ISD::SUBS, DL, DAG.getVTList(VT, MVT::Glue), Op0, Op1);
DAG.getVTList(Value.getValueType(), MVT::Glue), Value, One);
return Cmp.getValue(1);		return Cmp.getValue(1);
}		}

// Value is 1 if 'C' bit of NZCV is 1, else 0		// If Invert is false, value is 1 if 'C' bit of NZCV is 1, else 0.
static SDValue carryFlagToValue(SDValue Flag, EVT VT, SelectionDAG &DAG) {		// If Invert is true, value is 0 if 'C' bit of NZCV is 1, else 1.
		static SDValue carryFlagToValue(SDValue Flag, EVT VT, SelectionDAG &DAG,
		bool Invert) {
assert(Flag.getResNo() == 1);		assert(Flag.getResNo() == 1);
SDLoc DL(Flag);		SDLoc DL(Flag);
SDValue Zero = DAG.getConstant(0, DL, VT);		SDValue Zero = DAG.getConstant(0, DL, VT);
SDValue One = DAG.getConstant(1, DL, VT);		SDValue One = DAG.getConstant(1, DL, VT);
SDValue CC = DAG.getConstant(AArch64CC::HS, DL, MVT::i32);		unsigned Cond = Invert ? AArch64CC::LO : AArch64CC::HS;
		SDValue CC = DAG.getConstant(Cond, DL, MVT::i32);
return DAG.getNode(AArch64ISD::CSEL, DL, VT, One, Zero, CC, Flag);		return DAG.getNode(AArch64ISD::CSEL, DL, VT, One, Zero, CC, Flag);
}		}

// Value is 1 if 'V' bit of NZCV is 1, else 0		// Value is 1 if 'V' bit of NZCV is 1, else 0
static SDValue overflowFlagToValue(SDValue Flag, EVT VT, SelectionDAG &DAG) {		static SDValue overflowFlagToValue(SDValue Flag, EVT VT, SelectionDAG &DAG) {
assert(Flag.getResNo() == 1);		assert(Flag.getResNo() == 1);
SDLoc DL(Flag);		SDLoc DL(Flag);
SDValue Zero = DAG.getConstant(0, DL, VT);		SDValue Zero = DAG.getConstant(0, DL, VT);
SDValue One = DAG.getConstant(1, DL, VT);		SDValue One = DAG.getConstant(1, DL, VT);
SDValue CC = DAG.getConstant(AArch64CC::VS, DL, MVT::i32);		SDValue CC = DAG.getConstant(AArch64CC::VS, DL, MVT::i32);
return DAG.getNode(AArch64ISD::CSEL, DL, VT, One, Zero, CC, Flag);		return DAG.getNode(AArch64ISD::CSEL, DL, VT, One, Zero, CC, Flag);
}		}

// This lowering is inefficient, but it will get cleaned up by		// This lowering is inefficient, but it will get cleaned up by
// `foldOverflowCheck`		// `foldOverflowCheck`
static SDValue lowerADDSUBCARRY(SDValue Op, SelectionDAG &DAG, unsigned Opcode,		static SDValue lowerADDSUBCARRY(SDValue Op, SelectionDAG &DAG, unsigned Opcode,
bool IsSigned) {		bool IsSigned) {
EVT VT0 = Op.getValue(0).getValueType();		EVT VT0 = Op.getValue(0).getValueType();
EVT VT1 = Op.getValue(1).getValueType();		EVT VT1 = Op.getValue(1).getValueType();

if (VT0 != MVT::i32 && VT0 != MVT::i64)		if (VT0 != MVT::i32 && VT0 != MVT::i64)
return SDValue();		return SDValue();

		bool InvertCarry = Opcode == AArch64ISD::SBCS;
SDValue OpLHS = Op.getOperand(0);		SDValue OpLHS = Op.getOperand(0);
SDValue OpRHS = Op.getOperand(1);		SDValue OpRHS = Op.getOperand(1);
SDValue OpCarryIn = valueToCarryFlag(Op.getOperand(2), DAG);		SDValue OpCarryIn = valueToCarryFlag(Op.getOperand(2), DAG, InvertCarry);

SDLoc DL(Op);		SDLoc DL(Op);
SDVTList VTs = DAG.getVTList(VT0, VT1);		SDVTList VTs = DAG.getVTList(VT0, VT1);

SDValue Sum = DAG.getNode(Opcode, DL, DAG.getVTList(VT0, MVT::Glue), OpLHS,		SDValue Sum = DAG.getNode(Opcode, DL, DAG.getVTList(VT0, MVT::Glue), OpLHS,
OpRHS, OpCarryIn);		OpRHS, OpCarryIn);

SDValue OutFlag = IsSigned ? overflowFlagToValue(Sum.getValue(1), VT1, DAG)		SDValue OutFlag =
: carryFlagToValue(Sum.getValue(1), VT1, DAG);		IsSigned ? overflowFlagToValue(Sum.getValue(1), VT1, DAG)
		: carryFlagToValue(Sum.getValue(1), VT1, DAG, InvertCarry);

return DAG.getNode(ISD::MERGE_VALUES, DL, VTs, Sum, OutFlag);		return DAG.getNode(ISD::MERGE_VALUES, DL, VTs, Sum, OutFlag);
}		}

static SDValue LowerXALUO(SDValue Op, SelectionDAG &DAG) {		static SDValue LowerXALUO(SDValue Op, SelectionDAG &DAG) {
// Let legalize expand this if it isn't a legal type yet.		// Let legalize expand this if it isn't a legal type yet.
if (!DAG.getTargetLoweringInfo().isTypeLegal(Op.getValueType()))		if (!DAG.getTargetLoweringInfo().isTypeLegal(Op.getValueType()))
return SDValue();		return SDValue();

... (12,140 lines hidden) ...	if (isOneConstant(OpLHS) && isNullConstant(OpRHS))
return CC;		return CC;
if (isNullConstant(OpLHS) && isOneConstant(OpRHS))		if (isNullConstant(OpLHS) && isOneConstant(OpRHS))
return getInvertedCondCode(CC);		return getInvertedCondCode(CC);

return None;		return None;
}		}

// (ADC{S} l r (CMP (CSET HS carry) 1)) => (ADC{S} l r carry)		// (ADC{S} l r (CMP (CSET HS carry) 1)) => (ADC{S} l r carry)
// (SBC{S} l r (CMP (CSET LO carry) 1)) => (SBC{S} l r carry)		// (SBC{S} l r (CMP 0 (CSET LO carry))) => (SBC{S} l r carry)
static SDValue foldOverflowCheck(SDNode *Op, SelectionDAG &DAG, bool IsAdd) {		static SDValue foldOverflowCheck(SDNode *Op, SelectionDAG &DAG, bool IsAdd) {
SDValue CmpOp = Op->getOperand(2);		SDValue CmpOp = Op->getOperand(2);
if (!(isCMP(CmpOp) && isOneConstant(CmpOp.getOperand(1))))		if (!isCMP(CmpOp))
		return SDValue();

		if (IsAdd) {
		if (!isOneConstant(CmpOp.getOperand(1)))
return SDValue();		return SDValue();
		} else {
		if (!isNullConstant(CmpOp.getOperand(0)))
		return SDValue();
		}

SDValue CsetOp = CmpOp->getOperand(0);		SDValue CsetOp = CmpOp->getOperand(IsAdd ? 0 : 1);
auto CC = getCSETCondCode(CsetOp);		auto CC = getCSETCondCode(CsetOp);
if (CC != (IsAdd ? AArch64CC::HS : AArch64CC::LO))		if (CC != (IsAdd ? AArch64CC::HS : AArch64CC::LO))
return SDValue();		return SDValue();

return DAG.getNode(Op->getOpcode(), SDLoc(Op), Op->getVTList(),		return DAG.getNode(Op->getOpcode(), SDLoc(Op), Op->getVTList(),
Op->getOperand(0), Op->getOperand(1),		Op->getOperand(0), Op->getOperand(1),
CsetOp.getOperand(3));		CsetOp.getOperand(3));
}		}

llvm/test/CodeGen/AArch64/i128-math.ll

... (86 lines hidden) ...	; CHECK-NEXT: ret
ret i128 %1		ret i128 %1
}		}

define { i128, i8 } @u128_checked_sub(i128 %x, i128 %y) {		define { i128, i8 } @u128_checked_sub(i128 %x, i128 %y) {
; CHECK-LABEL: u128_checked_sub:		; CHECK-LABEL: u128_checked_sub:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: subs x0, x0, x2		; CHECK-NEXT: subs x0, x0, x2
; CHECK-NEXT: sbcs x1, x1, x3		; CHECK-NEXT: sbcs x1, x1, x3
; CHECK-NEXT: cset w8, hs		; CHECK-NEXT: cset w8, lo
; CHECK-NEXT: eor w2, w8, #0x1		; CHECK-NEXT: eor w2, w8, #0x1
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%1 = tail call { i128, i1 } @llvm.usub.with.overflow.i128(i128 %x, i128 %y)		%1 = tail call { i128, i1 } @llvm.usub.with.overflow.i128(i128 %x, i128 %y)
%2 = extractvalue { i128, i1 } %1, 0		%2 = extractvalue { i128, i1 } %1, 0
%3 = extractvalue { i128, i1 } %1, 1		%3 = extractvalue { i128, i1 } %1, 1
%4 = xor i1 %3, true		%4 = xor i1 %3, true
%5 = zext i1 %4 to i8		%5 = zext i1 %4 to i8
%6 = insertvalue { i128, i8 } undef, i128 %2, 0		%6 = insertvalue { i128, i8 } undef, i128 %2, 0
%7 = insertvalue { i128, i8 } %6, i8 %5, 1		%7 = insertvalue { i128, i8 } %6, i8 %5, 1
ret { i128, i8 } %7		ret { i128, i8 } %7
}		}

define { i128, i8 } @u128_overflowing_sub(i128 %x, i128 %y) {		define { i128, i8 } @u128_overflowing_sub(i128 %x, i128 %y) {
; CHECK-LABEL: u128_overflowing_sub:		; CHECK-LABEL: u128_overflowing_sub:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: subs x0, x0, x2		; CHECK-NEXT: subs x0, x0, x2
; CHECK-NEXT: sbcs x1, x1, x3		; CHECK-NEXT: sbcs x1, x1, x3
; CHECK-NEXT: cset w2, hs		; CHECK-NEXT: cset w2, lo
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%1 = tail call { i128, i1 } @llvm.usub.with.overflow.i128(i128 %x, i128 %y)		%1 = tail call { i128, i1 } @llvm.usub.with.overflow.i128(i128 %x, i128 %y)
%2 = extractvalue { i128, i1 } %1, 0		%2 = extractvalue { i128, i1 } %1, 0
%3 = extractvalue { i128, i1 } %1, 1		%3 = extractvalue { i128, i1 } %1, 1
%4 = zext i1 %3 to i8		%4 = zext i1 %3 to i8
%5 = insertvalue { i128, i8 } undef, i128 %2, 0		%5 = insertvalue { i128, i8 } undef, i128 %2, 0
%6 = insertvalue { i128, i8 } %5, i8 %4, 1		%6 = insertvalue { i128, i8 } %5, i8 %4, 1
ret { i128, i8 } %6		ret { i128, i8 } %6
}		}

define i128 @u128_saturating_sub(i128 %x, i128 %y) {		define i128 @u128_saturating_sub(i128 %x, i128 %y) {
; CHECK-LABEL: u128_saturating_sub:		; CHECK-LABEL: u128_saturating_sub:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: subs x8, x0, x2		; CHECK-NEXT: subs x8, x0, x2
; CHECK-NEXT: sbcs x9, x1, x3		; CHECK-NEXT: sbcs x9, x1, x3
; CHECK-NEXT: cset w10, hs		; CHECK-NEXT: cset w10, lo
; CHECK-NEXT: cmp w10, #0		; CHECK-NEXT: cmp w10, #0
; CHECK-NEXT: csel x0, xzr, x8, ne		; CHECK-NEXT: csel x0, xzr, x8, ne
; CHECK-NEXT: csel x1, xzr, x9, ne		; CHECK-NEXT: csel x1, xzr, x9, ne
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%1 = tail call i128 @llvm.usub.sat.i128(i128 %x, i128 %y)		%1 = tail call i128 @llvm.usub.sat.i128(i128 %x, i128 %y)
ret i128 %1		ret i128 %1
}		}
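The changed CHECK lines for `@u128_saturating_sub` can be read against a small simulation of the instruction sequence. This is a sketch under the same assumption as the summary (SUBS and SBCS set the 'C' flag when no borrow occurs); `U128`, `saturatingSubModel` and `checkSaturatingSubModel` are hypothetical names, and the `UseLO` parameter selects between the old `cset w10, hs` and the new `cset w10, lo`.

```cpp
#include <cassert>
#include <cstdint>

// Two 64-bit halves of an unsigned 128-bit value (illustrative type).
struct U128 {
  uint64_t Lo;
  uint64_t Hi;
};

// Models: subs / sbcs / cset w10, <cond> / cmp w10, #0 / csel ..., xzr, ..., ne
inline U128 saturatingSubModel(U128 X, U128 Y, bool UseLO) {
  // subs: low halves; C set means no borrow.
  bool C = X.Lo >= Y.Lo;
  const uint64_t Lo = X.Lo - Y.Lo;
  // sbcs: high halves minus the incoming borrow (1 - C).
  const uint64_t BorrowIn = C ? 0 : 1;
  const uint64_t Hi = X.Hi - Y.Hi - BorrowIn;
  C = BorrowIn ? (X.Hi > Y.Hi) : (X.Hi >= Y.Hi);
  // cset w10, lo yields 1 on borrow; cset w10, hs yields 1 on no borrow.
  const unsigned W10 = UseLO ? (C ? 0u : 1u) : (C ? 1u : 0u);
  // cmp w10, #0 ; csel xzr, <diff>, ne: select zero when w10 is nonzero.
  if (W10 != 0)
    return {0, 0};
  return {Lo, Hi};
}

inline void checkSaturatingSubModel() {
  // 5 - 7 underflows: the new sequence saturates to zero.
  const U128 Under = saturatingSubModel({5, 0}, {7, 0}, /*UseLO=*/true);
  assert(Under.Lo == 0 && Under.Hi == 0);
  // 7 - 5 does not underflow: the new sequence returns the difference.
  const U128 Plain = saturatingSubModel({7, 0}, {5, 0}, /*UseLO=*/true);
  assert(Plain.Lo == 2 && Plain.Hi == 0);
}
```

With `UseLO` set to false the model reproduces the behavior described in the summary: the wrapped difference is returned on underflow and zero otherwise.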


llvm/test/CodeGen/AArch64/usub_sat_vec.ll

... (340 lines hidden) ...	; CHECK-NEXT: ret
ret <8 x i64> %z		ret <8 x i64> %z
}		}

define <2 x i128> @v2i128(<2 x i128> %x, <2 x i128> %y) nounwind {		define <2 x i128> @v2i128(<2 x i128> %x, <2 x i128> %y) nounwind {
; CHECK-LABEL: v2i128:		; CHECK-LABEL: v2i128:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: subs x8, x2, x6		; CHECK-NEXT: subs x8, x2, x6
; CHECK-NEXT: sbcs x9, x3, x7		; CHECK-NEXT: sbcs x9, x3, x7
; CHECK-NEXT: cset w10, hs		; CHECK-NEXT: cset w10, lo
; CHECK-NEXT: cmp w10, #0		; CHECK-NEXT: cmp w10, #0
; CHECK-NEXT: csel x2, xzr, x8, ne		; CHECK-NEXT: csel x2, xzr, x8, ne
; CHECK-NEXT: csel x3, xzr, x9, ne		; CHECK-NEXT: csel x3, xzr, x9, ne
; CHECK-NEXT: subs x8, x0, x4		; CHECK-NEXT: subs x8, x0, x4
; CHECK-NEXT: sbcs x9, x1, x5		; CHECK-NEXT: sbcs x9, x1, x5
; CHECK-NEXT: cset w10, hs		; CHECK-NEXT: cset w10, lo
; CHECK-NEXT: cmp w10, #0		; CHECK-NEXT: cmp w10, #0
; CHECK-NEXT: csel x8, xzr, x8, ne		; CHECK-NEXT: csel x8, xzr, x8, ne
; CHECK-NEXT: csel x1, xzr, x9, ne		; CHECK-NEXT: csel x1, xzr, x9, ne
; CHECK-NEXT: fmov d0, x8		; CHECK-NEXT: fmov d0, x8
; CHECK-NEXT: mov v0.d[1], x1		; CHECK-NEXT: mov v0.d[1], x1
; CHECK-NEXT: fmov x0, d0		; CHECK-NEXT: fmov x0, d0
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%z = call <2 x i128> @llvm.usub.sat.v2i128(<2 x i128> %x, <2 x i128> %y)		%z = call <2 x i128> @llvm.usub.sat.v2i128(<2 x i128> %x, <2 x i128> %y)
ret <2 x i128> %z		ret <2 x i128> %z
}		}