This is an archive of the discontinued LLVM Phabricator instance.

Differential D121449

[AArch64] Combine ISD::SETCC into AArch64ISD::ANDS
Closed, Public

Authored by bcl5980 on Mar 11 2022, 1:14 AM.

Details

Reviewers

SjoerdMeijer
samtebbs
jaykang10
david-arm
paulwalker-arm
t.p.northover
sdesmalen
efriedma

Commits

rGdd3b90e4d77b: [AArch64] Combine ISD::SETCC into AArch64ISD::ANDS

Summary

When N > 12, (2^N - 1) is not a legal add immediate (isLegalAddImmediate returns false).
If a SetCC input uses such a number, the DAG combiner generates one more SRL instruction.
So combine [setcc (srl x, imm), 0, ne] into [setcc (and x, (-1 << imm)), 0, ne] so that emitComparison can produce better code.
Fixes https://github.com/llvm/llvm-project/issues/54283
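
The reasoning in the summary can be checked with a small standalone C++ sketch, independent of LLVM. Everything here is illustrative: `isLegalAddImm` is a hypothetical model that assumes the usual AArch64 rule (a 12-bit unsigned immediate, optionally shifted left by 12) and is not the real `isLegalAddImmediate`; `viaShift`, `viaMask` and `formsAgree` are made-up names for the two forms of the comparison.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical model of the AArch64 add/sub immediate rule (an assumption,
// not LLVM's code): 12 unsigned bits, optionally shifted left by 12.
bool isLegalAddImm(int64_t imm) {
  if (imm == std::numeric_limits<int64_t>::min())
    return false; // negating this value would overflow
  uint64_t v = static_cast<uint64_t>(imm < 0 ? -imm : imm);
  return (v >> 12) == 0 || ((v & 0xfff) == 0 && (v >> 24) == 0);
}

// Shift form: setcc (srl x, imm), 0, ne
bool viaShift(uint64_t x, unsigned imm) { return (x >> imm) != 0; }

// Mask form: setcc (and x, (-1 << imm)), 0, ne
bool viaMask(uint64_t x, unsigned imm) { return (x & (~0ULL << imm)) != 0; }

// Compares both forms on a few boundary values for one shift amount.
bool formsAgree(unsigned imm) {
  assert(imm < 64 && "shift amount must be smaller than the bit width");
  const uint64_t samples[] = {0, 1, (1ULL << imm) - 1, 1ULL << imm, ~0ULL};
  for (uint64_t x : samples)
    if (viaShift(x, imm) != viaMask(x, imm))
      return false;
  return true;
}
```

Under this model, 2^12 - 1 is accepted and 2^13 - 1 is rejected, which matches the N > 12 condition in the summary. Both forms ask whether any bit at position imm or above is set, so they agree for every shift amount below the bit width.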

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

bcl5980 created this revision. · Mar 11 2022, 1:14 AM

Herald added a project: Restricted Project. · Mar 11 2022, 1:14 AM

Herald added subscribers: hiraditya, kristof.beyls.

bcl5980 requested review of this revision. · Mar 11 2022, 1:14 AM

Herald added a subscriber: llvm-commits.

bcl5980 added a reviewer: efriedma. · Mar 11 2022, 1:28 AM

Harbormaster completed remote builds in B153737: Diff 414607. · Mar 11 2022, 1:56 AM

Can someone help review this? This is the first time I have tried to contribute to LLVM, so I would be grateful for any suggestions.

I've added some comments on the patch itself but these are likely redundant based on my top level question, which is whether the optimisation would be better placed in emitComparison which is used in several places including the lowering code for ISD::SETCC. It's responsible for deciding how best to set the flags and already includes emitting AArch64ISD::ANDS so your work feels like a natural extension.

As a side comment the diff itself lacks context and so, for example, things like the name of the changed function is unknown without searching a local checkout. When manually uploading a patch to phabricator if you use git show HEAD -U999999 the patch will include the context, which will be visible on phabricator. Alternatively you can use the arcanist tool to upload patches. See https://llvm.org/docs/Phabricator.html

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17286	Personally I wouldn't bother with factoring out this condition because it makes the diff look bigger than it actually is and causes more indentation. I'd just add your new combine test in isolation. i.e. if (Cond == ISD::SETNE && isOneConstant(RHS) && LHS->getOpcode() == AArch64ISD::CSEL && isNullConstant(LHS->getOperand(0)) && isOneConstant(LHS->getOperand(1)) && LHS->hasOneUse()) {
17307–17308	It's worth restricting this combine until later in the code generation pipeline because introducing target-specific nodes (i.e. `AArch64ISD::`) early can result in losing useful optimisations and/or sometimes introduces legalisation issues. The above block gets away with it because it requires one of its inputs to be a target-specific node and so will naturally only trigger late. I doubt it's necessary to restrict the combine until after legalisation, so it's likely just a case of adding if (DCI.isBeforeLegalize()) return SDValue(); at the top of this function so that you can be sure the types are legal and DAGCombine has had enough chance to do all the target-independent combines.
17310–17311	Do you need to descend into the operands of `ISD::SRL`? Does `LHS->getValueType(0)` work, given that the input and output types should match?

An alternative suggested is to perhaps keep the DAGCombine but have it just canonicalise the shift pattern to use ISD::AND [1], that way I believe the existing code in emitComparison will do what you need. If that works then that's my preferred option as it keeps things simple and relatively target agnostic.

[1]
setcc (srl x, imm), 0, ne ==> setcc (and x, (-1 << imm)), 0, ne

It looks like when I do this in the combine, the and is replaced by a shift again soon afterwards.

Combining: t31: i32 = setcc t14, Constant:i64<0>, setne:ch
Is 0 legal add imm: yes
Creating constant: t34: i64 = Constant<-8192>
Creating new node: t35: i64 = and t2, Constant:i64<-8192>

Combining: t25: ch = setne

Combining: t20: i64 = Constant<0>

Combining: t14: i64 = srl t2, Constant:i64<13>

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17307–17308	It looks like ISD::SETCC is lowered to AArch64ISD::CSEL during custom legalization. Do we need to add a new function like performCSELCombine to do the combine on AArch64ISD::CSEL?

bcl5980 marked an inline comment as not done. · Mar 14 2022, 7:55 AM

bcl5980 added inline comments. · Mar 14 2022, 8:04 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17307–17308	Sorry, I have now found the function performCSELCombine. Another way is to combine on AArch64ISD::SUBS: SUBS (srl x, imm), 0 --> ANDS x, (-1 << imm). Which way do you think is better?

Not sure I understand. The "Combining: ..." text just means it's visiting a node. There's no following text to suggest it has created anything new like there is when the AND is created. My guess is that DAGCombine continues to visit the operands of the original node despite it being replaced.

I just tried

// setcc (srl x, imm), 0, ne ==> setcc (and x, (-1 << imm)), 0, ne
if (Cond == ISD::SETNE && isNullConstant(RHS) &&
    LHS->getOpcode() == ISD::SRL && isa<ConstantSDNode>(LHS->getOperand(1)) &&
    LHS->hasOneUse()) {
  EVT TstVT = LHS->getValueType(0);
  if (TstVT == MVT::i32 || TstVT == MVT::i64) {
    uint64_t TstImm = -1ULL << LHS->getConstantOperandVal(1);
    SDValue TST = DAG.getNode(ISD::AND, DL, TstVT, LHS->getOperand(0),
                              DAG.getConstant(TstImm, DL, TstVT));
    return DAG.getNode(ISD::SETCC, DL, VT, TST, RHS, N->getOperand(2));
  }
}

using origin/main from a few minutes ago and I get the same new output for llvm/test/CodeGen/AArch64/arm64-xaluo.ll as you do. Are you seeing something different?
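
As a rough standalone check of the mask computed by `-1ULL << imm` in the snippet above, the sketch below reproduces the arithmetic outside of LLVM. `testMask` is a hypothetical helper, and it assumes the constant is truncated to the width of the tested type when it is materialised.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the AND mask for "setcc (srl x, shiftAmount), 0, ne"
// on an integer that is widthInBits wide. Truncation to the type width is an
// assumption of this sketch.
uint64_t testMask(unsigned shiftAmount, unsigned widthInBits) {
  assert(widthInBits >= 1 && widthInBits <= 64 && shiftAmount < widthInBits);
  uint64_t mask = ~0ULL << shiftAmount; // same bits as -1ULL << shiftAmount
  if (widthInBits < 64)
    mask &= (1ULL << widthInBits) - 1;  // keep only the low widthInBits bits
  return mask;
}
```

With these assumptions, a 64-bit shift by 13 gives the value printed in the debug log earlier (-8192 when read as a signed number), and a 32-bit shift by 16 gives 0xffff0000.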

Yeah, it works. The test failed because this code generates tst w8, #0xff00 (not #0xffffff00). That is also correct, since the known bits of the i8 mul result fit in 16 bits.
So the question is: which way do you think is better?

performSUBSCombine: subs (srl x, imm), 0 ==> ands x, (-1 << imm)
or
performSETCCCombine: setcc (srl x, imm), 0, ne ==> setcc (and x, (-1 << imm)), 0, ne

Both of them keep generic ISD nodes before legalization.
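
The claim that #0xff00 is as good as #0xffffff00 for the i8 multiply can be checked exhaustively with a short standalone C++ sketch. The function names are illustrative only; the sketch assumes both operands are zero-extended 8-bit values, as the `and ..., #0xff` instructions in the test output suggest.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if the wide and the narrow mask give the same "any high bit
// set" answer for every product of two zero-extended 8-bit values.
bool narrowMaskIsEquivalent() {
  for (uint32_t a = 0; a <= 0xff; ++a) {
    for (uint32_t b = 0; b <= 0xff; ++b) {
      uint32_t product = a * b; // at most 255 * 255 = 65025, below 2^16
      bool wide = (product & 0xffffff00u) != 0;
      bool narrow = (product & 0x0000ff00u) != 0;
      if (wide != narrow)
        return false;
    }
  }
  return true;
}

// Small driver that asserts the equivalence holds.
void checkNarrowMask() { assert(narrowMaskIsEquivalent()); }
```

Because the product never reaches bit 16, the upper 16 bits of the wide mask can never match anything, so dropping them does not change the result of the test.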

efriedma added subscribers: craig.topper, Allen. · Mar 14 2022, 10:35 AM

Thanks for the investigation. I think the options are:

1. DAG combine for setcc (srl x, imm), 0, ne ==> setcc (and x, (-1 << imm)), 0, ne
2. Update emitComparison to replace a single-use srl with ands.

I don't see "performSUBSCombine: subs (srl x, imm), 0 ==> ands x, (-1 << imm)" as an option as the only reason we would see this idiom is when emitComparison emits it.

Of the two approaches (1) is my favourite purely because it tries to canonicalise the DAG, which is always beneficial. I say this because I guess if we're seeing srl as something to support then how far away are we from wanting to support shl and thus I'd rather canonicalise the DAG rather than make emitComparison progressively more complex.

Older pattern: srl x, imm ==> ands x, (-1 << imm)
New pattern: srl x, imm ==> and x, (-1 << imm)

bcl5980 marked 3 inline comments as done. · Mar 14 2022, 9:25 PM

Harbormaster completed remote builds in B154244: Diff 415313. · Mar 14 2022, 10:48 PM

Allen mentioned this in D121355: [WIP][SelectionDAG] Fold shift constants into cmp. · Mar 15 2022, 6:50 PM

bcl5980 edited the summary of this revision. · Mar 15 2022, 10:47 PM

paulwalker-arm added inline comments. · Mar 16 2022, 10:00 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17392	This test is fragile because, for example, it will assert when `TstVT` is a scalable vector, whilst also returning true for `v2i32`, which I doubt you want. I'm guessing it's only the `isa<ConstantSDNode>(LHS->getOperand(1))` condition that's saving you here. Given you're emitting general ISD nodes, do you really care what the type is? If not then perhaps you just need `if (TstVT.isScalarInteger())`?

bcl5980 added inline comments. · Mar 16 2022, 7:11 PM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17392	Thanks for pointing this out. The scalar integer check is necessary. We benefit from keeping the SRL pattern when the data type is i128, so the size check is only there to exclude i128. How about TstVT.getFixedSizeInBits() <= 64 && TstVT.isScalarInteger()?

paulwalker-arm added inline comments. · Mar 17 2022, 11:38 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17392	Works for me but please reverse the tests so that we only call `getFixedSizeInBits` once we know `TstVT` is a scalar integer.

Disable the combination when op type is vector

bcl5980 marked 2 inline comments as done. · Mar 17 2022, 6:33 PM

Harbormaster completed remote builds in B154964: Diff 416372. · Mar 17 2022, 7:34 PM

paulwalker-arm accepted this revision. · Mar 18 2022, 4:22 AM

This revision is now accepted and ready to land. · Mar 18 2022, 4:22 AM

Hi paulwalker-arm, thanks for your patience. Since it is my first change, I don't have commit access yet. Could you help to commit it please? Thank you.
user.name: chenglin.bi
user.email: chenglin.bi@cixcomputing.com

Closed by commit rGdd3b90e4d77b: [AArch64] Combine ISD::SETCC into AArch64ISD::ANDS (authored by bcl5980, committed by paulwalker-arm). · Mar 19 2022, 6:07 AM

This revision was automatically updated to reflect the committed changes.

paulwalker-arm added a commit: rGdd3b90e4d77b: [AArch64] Combine ISD::SETCC into AArch64ISD::ANDS.

Revision Contents

Path                                                Size
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp     20 lines
llvm/test/CodeGen/AArch64/arm64-xaluo.ll            12 lines

Diff 416692

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

[... first 17,277 lines not shown ...]
   SDVTList VTs = DAG.getVTList(SubsNode->getValueType(0),
                                SubsNode->getValueType(1));
   SDValue Ops[] = { AddValue, SubsNode->getOperand(1) };

   SDValue NewValue = DAG.getNode(CondOpcode, SDLoc(SubsNode), VTs, Ops);
   DAG.ReplaceAllUsesWith(SubsNode, NewValue.getNode());

   return SDValue(N, 0);
 }

 // Optimize compare with zero and branch.
 static SDValue performBRCONDCombine(SDNode *N,
                                     TargetLowering::DAGCombinerInfo &DCI,
                                     SelectionDAG &DAG) {
   MachineFunction &MF = DAG.getMachineFunction();
   // Speculation tracking/SLH assumes that optimized TB(N)Z/CB(N)Z instructions
   // will not be produced, as they are conditional branch instructions that do
   // not set flags.
   if (MF.getFunction().hasFnAttribute(Attribute::SpeculativeLoadHardening))
     return SDValue();

   if (SDValue NV = performCONDCombine(N, DCI, DAG, 2, 3))
     N = NV.getNode();
   SDValue Chain = N->getOperand(0);
   SDValue Dest = N->getOperand(1);
   SDValue CCVal = N->getOperand(2);
   SDValue Cmp = N->getOperand(3);

   assert(isa<ConstantSDNode>(CCVal) && "Expected a ConstantSDNode here!");
   unsigned CC = cast<ConstantSDNode>(CCVal)->getZExtValue();
   if (CC != AArch64CC::EQ && CC != AArch64CC::NE)
     return SDValue();

   unsigned CmpOpc = Cmp.getOpcode();
   if (CmpOpc != AArch64ISD::ADDS && CmpOpc != AArch64ISD::SUBS)
     return SDValue();

   // Only attempt folding if there is only one use of the flag and no use of the
   // value.
   if (!Cmp->hasNUsesOfValue(0, 0) || !Cmp->hasNUsesOfValue(1, 1))
     return SDValue();

   SDValue LHS = Cmp.getOperand(0);
[... 38 lines not shown; collapsed region is labelled "static SDValue performCSELCombine(SDNode *N," ...]
   return performCONDCombine(N, DCI, DAG, 2, 3);
 }

 static SDValue performSETCCCombine(SDNode *N, SelectionDAG &DAG) {
   assert(N->getOpcode() == ISD::SETCC && "Unexpected opcode!");
   SDValue LHS = N->getOperand(0);
   SDValue RHS = N->getOperand(1);
   ISD::CondCode Cond = cast<CondCodeSDNode>(N->getOperand(2))->get();
+  SDLoc DL(N);
+  EVT VT = N->getValueType(0);

   // setcc (csel 0, 1, cond, X), 1, ne ==> csel 0, 1, !cond, X
   if (Cond == ISD::SETNE && isOneConstant(RHS) &&
       LHS->getOpcode() == AArch64ISD::CSEL &&
       isNullConstant(LHS->getOperand(0)) && isOneConstant(LHS->getOperand(1)) &&
       LHS->hasOneUse()) {
-    SDLoc DL(N);
-
     // Invert CSEL's condition.
     auto *OpCC = cast<ConstantSDNode>(LHS.getOperand(2));
     auto OldCond = static_cast<AArch64CC::CondCode>(OpCC->getZExtValue());
     auto NewCond = getInvertedCondCode(OldCond);

     // csel 0, 1, !cond, X
     SDValue CSEL =
         DAG.getNode(AArch64ISD::CSEL, DL, LHS.getValueType(), LHS.getOperand(0),
                     LHS.getOperand(1), DAG.getConstant(NewCond, DL, MVT::i32),
                     LHS.getOperand(3));
-    return DAG.getZExtOrTrunc(CSEL, DL, N->getValueType(0));
+    return DAG.getZExtOrTrunc(CSEL, DL, VT);
+  }
+
+  // setcc (srl x, imm), 0, ne ==> setcc (and x, (-1 << imm)), 0, ne
+  if (Cond == ISD::SETNE && isNullConstant(RHS) &&
+      LHS->getOpcode() == ISD::SRL && isa<ConstantSDNode>(LHS->getOperand(1)) &&
+      LHS->hasOneUse()) {
+    EVT TstVT = LHS->getValueType(0);
+    if (TstVT.isScalarInteger() && TstVT.getFixedSizeInBits() <= 64) {
+      // this pattern will get better opt in emitComparison
+      uint64_t TstImm = -1ULL << LHS->getConstantOperandVal(1);
+      SDValue TST = DAG.getNode(ISD::AND, DL, TstVT, LHS->getOperand(0),
+                                DAG.getConstant(TstImm, DL, TstVT));
+      return DAG.getNode(ISD::SETCC, DL, VT, TST, RHS, N->getOperand(2));
+    }
   }

   return SDValue();
 }

 // Combines for S forms of generic opcodes (AArch64ISD::ANDS into ISD::AND for
 // example). NOTE: This could be used for ADDS and SUBS too, if we can find test
 // cases.
[... 3,388 more lines not shown ...]

llvm/test/CodeGen/AArch64/arm64-xaluo.ll

[... first 1,841 lines not shown ...]

 define i8 @umulo.selectboth.i8(i8 %a, i8 %b) {
 ; SDAG-LABEL: umulo.selectboth.i8:
 ; SDAG: // %bb.0: // %entry
 ; SDAG-NEXT: and w8, w1, #0xff
 ; SDAG-NEXT: and w9, w0, #0xff
 ; SDAG-NEXT: mul w8, w9, w8
-; SDAG-NEXT: lsr w9, w8, #8
-; SDAG-NEXT: cmp w9, #0
 ; SDAG-NEXT: mov w9, #10
+; SDAG-NEXT: tst w8, #0xff00
 ; SDAG-NEXT: csel w0, w8, w9, ne
 ; SDAG-NEXT: ret
 ;
 ; FAST-LABEL: umulo.selectboth.i8:
 ; FAST: // %bb.0: // %entry
 ; FAST-NEXT: and w8, w1, #0xff
 ; FAST-NEXT: and w9, w0, #0xff
 ; FAST-NEXT: mul w8, w9, w8
-; FAST-NEXT: lsr w9, w8, #8
-; FAST-NEXT: cmp w9, #0
 ; FAST-NEXT: mov w9, #10
+; FAST-NEXT: tst w8, #0xff00
 ; FAST-NEXT: csel w0, w8, w9, ne
 ; FAST-NEXT: ret
 ;
 ; GISEL-LABEL: umulo.selectboth.i8:
 ; GISEL: // %bb.0: // %entry
 ; GISEL-NEXT: and w8, w0, #0xff
 ; GISEL-NEXT: and w9, w1, #0xff
 ; GISEL-NEXT: mul w8, w8, w9
[... 48 lines not shown ...]
 }

 define i16 @umulo.selectboth.i16(i16 %a, i16 %b) {
 ; SDAG-LABEL: umulo.selectboth.i16:
 ; SDAG: // %bb.0: // %entry
 ; SDAG-NEXT: and w8, w1, #0xffff
 ; SDAG-NEXT: and w9, w0, #0xffff
 ; SDAG-NEXT: mul w8, w9, w8
-; SDAG-NEXT: lsr w9, w8, #16
-; SDAG-NEXT: cmp w9, #0
 ; SDAG-NEXT: mov w9, #10
+; SDAG-NEXT: tst w8, #0xffff0000
 ; SDAG-NEXT: csel w0, w8, w9, ne
 ; SDAG-NEXT: ret
 ;
 ; FAST-LABEL: umulo.selectboth.i16:
 ; FAST: // %bb.0: // %entry
 ; FAST-NEXT: and w8, w1, #0xffff
 ; FAST-NEXT: and w9, w0, #0xffff
 ; FAST-NEXT: mul w8, w9, w8
-; FAST-NEXT: lsr w9, w8, #16
-; FAST-NEXT: cmp w9, #0
 ; FAST-NEXT: mov w9, #10
+; FAST-NEXT: tst w8, #0xffff0000
 ; FAST-NEXT: csel w0, w8, w9, ne
 ; FAST-NEXT: ret
 ;
 ; GISEL-LABEL: umulo.selectboth.i16:
 ; GISEL: // %bb.0: // %entry
 ; GISEL-NEXT: and w8, w0, #0xffff
 ; GISEL-NEXT: and w9, w1, #0xffff
 ; GISEL-NEXT: mul w8, w8, w9
[... 735 more lines not shown ...]