This is an archive of the discontinued LLVM Phabricator instance.

Differential D119075

[DAGCombine][ARM] Custom lower smaller-than-legal MULH/AVG/ABD
Needs Review · Public

Authored by dmgreen on Feb 6 2022, 2:06 AM.


Details

Reviewers

RKSimon
craig.topper
efriedma
samtebbs

Summary

MVE only has 128-bit legal vectors; 64-bit vectors are not legal. Several combines (MULH/AVG/ABD) would be beneficial for these smaller-than-legal vectors, which the vectorizers often create, but the nodes are not currently transformed. There is no way to tell the target-independent DAG combiner that it should form them and leave it to the ARM backend to legalize them.

This changes the legality check in those nodes from TLI.isOperationLegalOrCustom(Opc, VT) (which inherently checks isTypeLegal(VT)) to TLI.isOperationLegal(Opc, VT) || TLI.isOperationCustom(Opc, VT), which allows the backend to mark the nodes as Custom for illegal types, legalising the nodes as it requires. The actual legalisation on MVE uses any_extends and vector casts to perform the operation on a legal vector, truncating the result back to the original type.
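The difference between the two checks can be illustrated with a toy model. This is only a sketch: `ToyTLI`, its string-keyed types and opcodes, and its default action are assumptions made for illustration, not the real `TargetLowering` interface. It relies only on what the summary states, namely that `isOperationLegalOrCustom` implies a legal type while `isOperationCustom` does not.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>

// Toy stand-in for TargetLowering. All names here are illustrative
// assumptions, not the real LLVM API.
enum class Action { Legal, Custom, Expand };

struct ToyTLI {
  std::set<std::string> LegalVTs; // types that have a register class
  std::map<std::pair<std::string, std::string>, Action> Actions;

  void setOperationAction(const std::string &Opc, const std::string &VT,
                          Action A) {
    Actions[std::make_pair(Opc, VT)] = A;
  }
  bool isTypeLegal(const std::string &VT) const {
    return LegalVTs.count(VT) != 0;
  }
  Action getOperationAction(const std::string &Opc,
                            const std::string &VT) const {
    auto It = Actions.find(std::make_pair(Opc, VT));
    // Assumption of this toy: unlisted (opcode, type) pairs expand.
    return It == Actions.end() ? Action::Expand : It->second;
  }
  // These two checks also require the type itself to be legal.
  bool isOperationLegal(const std::string &Opc, const std::string &VT) const {
    return isTypeLegal(VT) && getOperationAction(Opc, VT) == Action::Legal;
  }
  bool isOperationLegalOrCustom(const std::string &Opc,
                                const std::string &VT) const {
    Action A = getOperationAction(Opc, VT);
    return isTypeLegal(VT) && (A == Action::Legal || A == Action::Custom);
  }
  // This check looks only at the recorded action, not at the type.
  bool isOperationCustom(const std::string &Opc, const std::string &VT) const {
    return getOperationAction(Opc, VT) == Action::Custom;
  }
};

// An MVE-like setup: v16i8 is legal, v8i8 is not, but the target has marked
// ABDU on v8i8 as Custom.
ToyTLI makeMveLikeTLI() {
  ToyTLI TLI;
  TLI.LegalVTs = {"v16i8", "v8i16", "v4i32"};
  TLI.setOperationAction("ABDU", "v16i8", Action::Legal);
  TLI.setOperationAction("ABDU", "v8i8", Action::Custom);
  return TLI;
}

void demo() {
  ToyTLI TLI = makeMveLikeTLI();
  // The old check rejects the illegal type even though it is marked Custom.
  assert(!TLI.isOperationLegalOrCustom("ABDU", "v8i8"));
  // The relaxed check accepts it.
  assert(TLI.isOperationLegal("ABDU", "v8i8") ||
         TLI.isOperationCustom("ABDU", "v8i8"));
}
```

In this model the relaxed check is what lets a target opt in per (opcode, type) pair without making the type itself legal.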

There may be other ways to do this - let me know if there is a better way. We can do the same for AArch64 for small types but I've not done that in this same patch.
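The MVE legalisation described in the summary (any_extend, vector cast, operate on the legal vector, cast back, truncate) can be modelled at the byte level. The sketch below does this for ABDU on v8i8. It is a simplified model, not the backend code: the register is represented as 16 raw bytes, 16-bit lane I is assumed to occupy bytes 2*I and 2*I+1, and all function names are hypothetical. It shows why the unspecified high bytes produced by any_extend do not affect the truncated result of a lane-wise operation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V8i8 = std::array<std::uint8_t, 8>;
// Raw bytes of one 128-bit register. Simplifying assumption of this model:
// 16-bit lane I occupies bytes 2*I (low half) and 2*I+1 (high half).
using Reg128 = std::array<std::uint8_t, 16>;

// any_extend v8i8 -> v8i16: the low byte of each lane is the input element;
// the high byte is unspecified, modelled here by a caller-chosen Junk value.
Reg128 anyExtendToV8i16(const V8i8 &In, std::uint8_t Junk) {
  Reg128 R{};
  for (std::size_t I = 0; I < 8; ++I) {
    R[2 * I] = In[I];
    R[2 * I + 1] = Junk;
  }
  return R;
}

// ABDU performed on all sixteen 8-bit lanes of the legal v16i8 type.
Reg128 abduV16i8(const Reg128 &A, const Reg128 &B) {
  Reg128 R{};
  for (std::size_t I = 0; I < 16; ++I)
    R[I] = static_cast<std::uint8_t>(A[I] > B[I] ? A[I] - B[I] : B[I] - A[I]);
  return R;
}

// truncate v8i16 -> v8i8: keep the low byte of each 16-bit lane.
V8i8 truncateToV8i8(const Reg128 &In) {
  V8i8 R{};
  for (std::size_t I = 0; I < 8; ++I)
    R[I] = In[2 * I];
  return R;
}

// The whole sequence. The register casts between v8i16 and v16i8 do not
// change any bits, so in this byte-level model they are no-ops.
V8i8 lowerAbduV8i8(const V8i8 &A, const V8i8 &B, std::uint8_t JunkA,
                   std::uint8_t JunkB) {
  return truncateToV8i8(
      abduV16i8(anyExtendToV8i16(A, JunkA), anyExtendToV8i16(B, JunkB)));
}

// Reference: ABDU computed directly on the eight narrow elements.
V8i8 referenceAbdu(const V8i8 &A, const V8i8 &B) {
  V8i8 R{};
  for (std::size_t I = 0; I < 8; ++I)
    R[I] = static_cast<std::uint8_t>(A[I] > B[I] ? A[I] - B[I] : B[I] - A[I]);
  return R;
}

void demo() {
  V8i8 A = {0, 255, 7, 200, 1, 128, 64, 3};
  V8i8 B = {255, 0, 9, 100, 1, 127, 65, 250};
  // The junk in the high bytes never reaches the truncated result.
  assert(lowerAbduV8i8(A, B, 0xAA, 0x55) == referenceAbdu(A, B));
}
```

Half of the byte lanes compute on junk and are discarded by the truncate; the same reasoning would apply to any operation that works independently on each 8-bit lane.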

Diff Detail

Unit Tests: Failed

	Time	Test
	60,060 ms	x64 debian > ThreadSanitizer-x86_64.ThreadSanitizer-x86_64::restore_stack.cpp
	60,050 ms	x64 debian > libFuzzer.libFuzzer::minimize_crash.test

Event Timeline

dmgreen created this revision. · Feb 6 2022, 2:06 AM

Herald added subscribers: ecnelises, hiraditya, kristof.beyls. · Feb 6 2022, 2:06 AM

dmgreen requested review of this revision. · Feb 6 2022, 2:06 AM

Herald added a project: Restricted Project. · Feb 6 2022, 2:06 AM

Harbormaster completed remote builds in B147812: Diff 406238. · Feb 6 2022, 2:06 AM

dmgreen added a parent revision: D106238: [ARM] MVE hadd and rhadd. · Feb 6 2022, 2:07 AM

Why not handle this generically - accept AVG patterns of any type pre-legalization and expand back during legalization if necessary?

In D119075#3299246, @RKSimon wrote:

Why not handle this generically - accept AVG patterns of any type pre-legalization and expand back during legalization if necessary?

Hmm. Won't that be quite inefficient? Creating nodes for architectures that don't even expect to use them? A mulh for example - I wouldn't expect the generic DAG combiner to create i8 mulh on Arm targets - it's not something it will ever need, and I feel it would just get in the way of the optimisations it is trying to make.

Fair enough - I was mainly thinking in terms of the AVG opcodes, but I can understand that it might cause other issues.

efriedma added inline comments. · Feb 7 2022, 10:26 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9454–9457	I think the predicate you actually need here is something like `!TLI.isOperationLegalOrCustom(MulhOpcode, NarrowVT) && (LegalTypes || !TLI.isOperationCustom(MulhOpcode, NarrowVT))`. i.e. before type legalization, allow custom lowering. After type legalization, only allow custom lowering if the type is legal. Constructing an operator with an illegal type after type legalization is likely to crash the compiler.
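The diff uses this predicate in two shapes: a bail-out condition in combineShiftToMULH and combineShiftToAVG, and a proceed condition in combineABSToABD. By De Morgan's laws the two are exact complements. The sketch below checks that over all eight input combinations; the function names are hypothetical and the three booleans stand in for the results of `isOperationLegalOrCustom`, `isOperationCustom` and the `LegalTypes` flag.

```cpp
#include <cassert>

// Bail-out condition, in the shape used by the MULH and AVG combines.
constexpr bool bailsOut(bool LegalOrCustom, bool Custom, bool LegalTypes) {
  return !LegalOrCustom && (LegalTypes || !Custom);
}

// Proceed condition, in the shape used by the ABD combine.
constexpr bool proceeds(bool LegalOrCustom, bool Custom, bool LegalTypes) {
  return LegalOrCustom || (!LegalTypes && Custom);
}

// Returns true if the two forms disagree on every one of the eight inputs,
// i.e. one is the logical negation of the other.
bool formsAreComplementary() {
  for (int Bits = 0; Bits < 8; ++Bits) {
    bool LegalOrCustom = (Bits & 1) != 0;
    bool Custom = (Bits & 2) != 0;
    bool LegalTypes = (Bits & 4) != 0;
    if (bailsOut(LegalOrCustom, Custom, LegalTypes) ==
        proceeds(LegalOrCustom, Custom, LegalTypes))
      return false;
  }
  return true;
}

void demo() {
  assert(formsAreComplementary());
  // Custom on an illegal type: allowed before type legalization only.
  assert(!bailsOut(false, true, /*LegalTypes=*/false));
  assert(bailsOut(false, true, /*LegalTypes=*/true));
}
```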

dmgreen mentioned this in D115739: [SVE][DAGCombiner] Enable ISD::ABDS and ISD::ABDU for SVE. · Feb 8 2022, 6:43 AM

In D119075#3301157, @RKSimon wrote:

Fair enough - I was mainly thinking in terms of the AVG opcodes, but I can understand that it might cause other issues.

I often find myself fighting against DAG combines more than I would like. I am a fan of more control from the target.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9454–9457	Thanks for the suggestion.

Harbormaster completed remote builds in B149007: Diff 407913. · Feb 11 2022, 9:27 AM

dmgreen mentioned this in D119556: [AArch64] Custom lower smaller than legal abd/hadd. · Feb 11 2022, 9:34 AM

dmgreen added a child revision: D119556: [AArch64] Custom lower smaller than legal abd/hadd.

I'd like to avoid target-specific custom lowering here, i.e. copy-pasting LowerBinopWithBitcast into every target. A few possible alternatives.

Make the combine detect whatever pattern shows up after type legalization, and use the legal nodes. Not sure how hard this is; I guess type legalization splits the nodes in ways that make it hard to detect the relevant pattern?
Shove something equivalent to LowerBinopWithBitcast directly into the relevant DAGCombines. I guess the difficulty here is mostly that there isn't any target-independent equivalent to VECTOR_REG_CAST?
Teach type legalization to legalize these nodes, then make DAGCombine produce illegal nodes if it predicts the legalized result is cheap. (Not sure this approach is best, though; "predicting" is sort of hard, and we don't really want to produce these nodes if they aren't going to lower to a single instruction.)

Rebase the current patch. I would like to try and get something like this in if we can.

Unfortunately you seem to have given some pretty good reasons why the suggestions you made will not easily work. The lowering is target specific because it uses the NVCast. I was attempting to make use of larger (potentially legal) hadds in combineShiftToAVG, but x86 wants to widen, not promote, and really wants to use the existing custom lowering it has. It would be better for Arm to use extend(hadd) than hadd(extend, extend), but as far as I understand that needs NVCast to be correct for BE.

Any suggestions on how to proceed?

Herald added a project: Restricted Project. · Dec 15 2022, 2:41 AM

Herald added a subscriber: StephenFan.

Harbormaster completed remote builds in B203296: Diff 483097. · Dec 15 2022, 4:45 AM

dmgreen mentioned this in D142288: [X86] Add basic vector handling for ISD::ABDS/ABDU (absolute difference) nodes. · Jan 29 2023, 3:38 AM

dmgreen mentioned this in D148229: [DAGCombine][AArch64][CodeGen] Allow tranformable vectors to a legal for MULH lowering and use SVE's MULH for fixed vector types. · Apr 17 2023, 8:27 AM

Revision Contents

Path	Size
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp	16 lines
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp	14 lines
llvm/lib/Target/ARM/ARMISelLowering.cpp	45 lines
llvm/test/CodeGen/Thumb2/mve-vabdus.ll	33 lines
llvm/test/CodeGen/Thumb2/mve-vhadd.ll	38 lines
llvm/test/CodeGen/Thumb2/mve-vmulh.ll	29 lines

Diff 483097

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp


@@ 9,353 unchanged lines hidden @@ SDValue DAGCombiner::visitSHL(SDNode *N) {
   return SDValue();
 }

 // Transform a right shift of a multiply into a multiply-high.
 // Examples:
 // (srl (mul (zext i32:$a to i64), (zext i32:$a to i64)), 32) -> (mulhu $a, $b)
 // (sra (mul (sext i32:$a to i64), (sext i32:$a to i64)), 32) -> (mulhs $a, $b)
 static SDValue combineShiftToMULH(SDNode *N, SelectionDAG &DAG,
-                                  const TargetLowering &TLI) {
+                                  const TargetLowering &TLI, bool LegalTypes) {
   assert((N->getOpcode() == ISD::SRL || N->getOpcode() == ISD::SRA) &&
          "SRL or SRA node is required here!");

   // Check the shift amount. Proceed with the transformation if the shift
   // amount is constant.
   ConstantSDNode *ShiftAmtSrc = isConstOrConstSplat(N->getOperand(1));
   if (!ShiftAmtSrc)
     return SDValue();
@@ 75 unchanged lines hidden @@ static SDValue combineShiftToMULH(SDNode *N, SelectionDAG &DAG,
   unsigned ShiftAmt = ShiftAmtSrc->getZExtValue();
   if (ShiftAmt != NarrowVTSize)
     return SDValue();

   // If the operation feeding into the MUL is a sign extend (sext),
   // we use mulhs. Othewise, zero extends (zext) use mulhu.
   unsigned MulhOpcode = IsSignExt ? ISD::MULHS : ISD::MULHU;

   // Combine to mulh if mulh is legal/custom for the narrow type on the target.
-  if (!TLI.isOperationLegalOrCustom(MulhOpcode, NarrowVT))
+  if (!TLI.isOperationLegalOrCustom(MulhOpcode, NarrowVT) &&
+      (LegalTypes || !TLI.isOperationCustom(MulhOpcode, NarrowVT)))
     return SDValue();

Inline comment (efriedma, Not Done): I think the predicate you actually need here is something like `!TLI.isOperationLegalOrCustom(MulhOpcode, NarrowVT) && (LegalTypes || !TLI.isOperationCustom(MulhOpcode, NarrowVT))`. i.e. before type legalization, allow custom lowering. After type legalization, only allow custom lowering if the type is legal. Constructing an operator with an illegal type after type legalization is likely to crash the compiler.
Inline comment (dmgreen, Author, Done): Thanks for the suggestion.

   SDValue Result =
       DAG.getNode(MulhOpcode, DL, NarrowVT, LeftOp.getOperand(0), MulhRightOp);
   return (N->getOpcode() == ISD::SRA ? DAG.getSExtOrTrunc(Result, DL, WideVT)
                                      : DAG.getZExtOrTrunc(Result, DL, WideVT));
 }

 SDValue DAGCombiner::visitSRA(SDNode *N) {
@@ 203 unchanged lines hidden @@ if (DAG.SignBitIsZero(N0))
     return DAG.getNode(ISD::SRL, SDLoc(N), VT, N0, N1);

   if (N1C && !N1C->isOpaque())
     if (SDValue NewSRA = visitShiftByConstant(N))
       return NewSRA;

   // Try to transform this shift into a multiply-high if
   // it matches the appropriate pattern detected in combineShiftToMULH.
-  if (SDValue MULH = combineShiftToMULH(N, DAG, TLI))
+  if (SDValue MULH = combineShiftToMULH(N, DAG, TLI, LegalTypes))
     return MULH;

   // Attempt to convert a sra of a load into a narrower sign-extending load.
   if (SDValue NarrowLoad = reduceLoadWidth(N))
     return NarrowLoad;

   return SDValue();
 }
@@ 242 unchanged lines hidden @@ else if (Use->getOpcode() == ISD::TRUNCATE && Use->hasOneUse()) {
       Use = *Use->use_begin();
       if (Use->getOpcode() == ISD::BRCOND)
         AddToWorklist(Use);
     }
   }

   // Try to transform this shift into a multiply-high if
   // it matches the appropriate pattern detected in combineShiftToMULH.
-  if (SDValue MULH = combineShiftToMULH(N, DAG, TLI))
+  if (SDValue MULH = combineShiftToMULH(N, DAG, TLI, LegalTypes))
     return MULH;

   return SDValue();
 }

 SDValue DAGCombiner::visitFunnelShift(SDNode *N) {
   EVT VT = N->getValueType(0);
   SDValue N0 = N->getOperand(0);
@@ 138 unchanged lines hidden @@ SDValue DAGCombiner::visitSHLSAT(SDNode *N) {

   return SDValue();
 }

 // Given a ABS node, detect the following pattern:
 // (ABS (SUB (EXTEND a), (EXTEND b))).
 // Generates UABD/SABD instruction.
 static SDValue combineABSToABD(SDNode *N, SelectionDAG &DAG,
-                               const TargetLowering &TLI) {
+                               const TargetLowering &TLI, bool LegalTypes) {
   SDValue AbsOp1 = N->getOperand(0);
   SDValue Op0, Op1;

   if (AbsOp1.getOpcode() != ISD::SUB)
     return SDValue();

   Op0 = AbsOp1.getOperand(0);
   Op1 = AbsOp1.getOperand(1);

   unsigned Opc0 = Op0.getOpcode();
   // Check if the operands of the sub are (zero|sign)-extended.
   if (Opc0 != Op1.getOpcode() ||
       (Opc0 != ISD::ZERO_EXTEND && Opc0 != ISD::SIGN_EXTEND))
     return SDValue();

   EVT VT = N->getValueType(0);
   EVT VT1 = Op0.getOperand(0).getValueType();
   EVT VT2 = Op1.getOperand(0).getValueType();
   unsigned ABDOpcode = (Opc0 == ISD::SIGN_EXTEND) ? ISD::ABDS : ISD::ABDU;

   // fold abs(sext(x) - sext(y)) -> zext(abds(x, y))
   // fold abs(zext(x) - zext(y)) -> zext(abdu(x, y))
   // NOTE: Extensions must be equivalent.
-  if (VT1 == VT2 && TLI.isOperationLegalOrCustom(ABDOpcode, VT1)) {
+  if (VT1 == VT2 && (TLI.isOperationLegalOrCustom(ABDOpcode, VT1) ||
+                     (!LegalTypes && TLI.isOperationCustom(ABDOpcode, VT1)))) {
     Op0 = Op0.getOperand(0);
     Op1 = Op1.getOperand(0);
     SDValue ABD = DAG.getNode(ABDOpcode, SDLoc(N), VT1, Op0, Op1);
     return DAG.getNode(ISD::ZERO_EXTEND, SDLoc(N), VT, ABD);
   }

   // fold abs(sext(x) - sext(y)) -> abds(sext(x), sext(y))
   // fold abs(zext(x) - zext(y)) -> abdu(zext(x), zext(y))
@@ 12 unchanged lines hidden @@ if (DAG.isConstantIntBuildVectorOrConstantInt(N0))
     return DAG.getNode(ISD::ABS, SDLoc(N), VT, N0);
   // fold (abs (abs x)) -> (abs x)
   if (N0.getOpcode() == ISD::ABS)
     return N0;
   // fold (abs x) -> x iff not-negative
   if (DAG.SignBitIsZero(N0))
     return N0;

-  if (SDValue ABD = combineABSToABD(N, DAG, TLI))
+  if (SDValue ABD = combineABSToABD(N, DAG, TLI, LegalTypes))
     return ABD;

   // fold (abs (sign_extend_inreg x)) -> (zero_extend (abs (truncate x)))
   // iff zero_extend/truncate are free.
   if (N0.getOpcode() == ISD::SIGN_EXTEND_INREG) {
     EVT ExtVT = cast<VTSDNode>(N0.getOperand(1))->getVT();
     if (TLI.isTruncateFree(VT, ExtVT) && TLI.isZExtFree(ExtVT, VT) &&
         TLI.isTypeDesirableForOp(ISD::ABS, ExtVT) &&
@@ 15,432 unchanged lines hidden @@
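The ABD folds named in the comments of combineABSToABD, `abs(sext(x) - sext(y)) -> zext(abds(x, y))` and `abs(zext(x) - zext(y)) -> zext(abdu(x, y))`, can be checked exhaustively on scalar i8 values. The sketch below is only an illustration with hypothetical helper names. It also shows why the result is zero-extended even in the signed case: the absolute difference of two i8 values can be as large as 255, which does not fit in a signed i8.

```cpp
#include <cassert>
#include <cstdint>

// Narrow ABDS on i8, returned as the raw 8-bit result pattern.
std::uint8_t abds8(std::int8_t X, std::int8_t Y) {
  int D = int(X) - int(Y);
  return static_cast<std::uint8_t>(D < 0 ? -D : D);
}

// Narrow ABDU on i8.
std::uint8_t abdu8(std::uint8_t X, std::uint8_t Y) {
  return static_cast<std::uint8_t>(X > Y ? X - Y : Y - X);
}

// abs(sext(x) - sext(y)) evaluated in a wider type. The difference lies in
// [-255, 255], so it fits in i16.
std::uint16_t wideAbsDiffS(std::int8_t X, std::int8_t Y) {
  int D = int(X) - int(Y);
  return static_cast<std::uint16_t>(D < 0 ? -D : D);
}

// abs(zext(x) - zext(y)) evaluated in a wider type.
std::uint16_t wideAbsDiffU(std::uint8_t X, std::uint8_t Y) {
  int D = int(X) - int(Y);
  return static_cast<std::uint16_t>(D < 0 ? -D : D);
}

// Sign extension of an 8-bit pattern to 16 bits, for the counterexample.
std::uint16_t sext8to16(std::uint8_t V) {
  return static_cast<std::uint16_t>((V & 0x80) ? (0xFF00 | V) : V);
}

// Checks both folds for every pair of 8-bit inputs.
bool foldsHold() {
  for (int X = -128; X <= 127; ++X)
    for (int Y = -128; Y <= 127; ++Y) {
      auto SX = static_cast<std::int8_t>(X);
      auto SY = static_cast<std::int8_t>(Y);
      auto UX = static_cast<std::uint8_t>(X);
      auto UY = static_cast<std::uint8_t>(Y);
      // Converting uint8_t to uint16_t is the zero extension.
      if (wideAbsDiffS(SX, SY) != std::uint16_t(abds8(SX, SY)))
        return false;
      if (wideAbsDiffU(UX, UY) != std::uint16_t(abdu8(UX, UY)))
        return false;
    }
  return true;
}

void demo() {
  assert(foldsHold());
  // |127 - (-128)| = 255: zero extension keeps it, sign extension would not.
  assert(sext8to16(abds8(127, -128)) != wideAbsDiffS(127, -128));
}
```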

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp


@@ 926 unchanged lines hidden @@ SDValue TargetLowering::SimplifyMultipleUseDemandedVectorElts(
     unsigned Depth) const {
   APInt DemandedBits = APInt::getAllOnes(Op.getScalarValueSizeInBits());
   return SimplifyMultipleUseDemandedBits(Op, DemandedBits, DemandedElts, DAG,
                                          Depth);
 }

 // Attempt to form ext(avgfloor(A, B)) from shr(add(ext(A), ext(B)), 1).
 // or to form ext(avgceil(A, B)) from shr(add(ext(A), ext(B), 1), 1).
-static SDValue combineShiftToAVG(SDValue Op, SelectionDAG &DAG,
+static SDValue combineShiftToAVG(SDValue Op,
+                                 TargetLowering::TargetLoweringOpt &TLO,
                                  const TargetLowering &TLI,
                                  const APInt &DemandedBits,
-                                 const APInt &DemandedElts,
-                                 unsigned Depth) {
+                                 const APInt &DemandedElts, unsigned Depth) {
+  SelectionDAG &DAG = TLO.DAG;
   assert((Op.getOpcode() == ISD::SRL || Op.getOpcode() == ISD::SRA) &&
          "SRL or SRA node is required here!");
   // Is the right shift using an immediate value of 1?
   ConstantSDNode *N1C = isConstOrConstSplat(Op.getOperand(1), DemandedElts);
   if (!N1C || !N1C->isOne())
     return SDValue();

   // We are looking for an avgfloor
@@ 92 unchanged lines hidden @@ static SDValue combineShiftToAVG(SDValue Op,
   // operation, given the original type size and the number of known sign/zero
   // bits.
   EVT VT = Op.getValueType();
   unsigned MinWidth =
       std::max<unsigned>(VT.getScalarSizeInBits() - KnownBits, 8);
   EVT NVT = EVT::getIntegerVT(*DAG.getContext(), PowerOf2Ceil(MinWidth));
   if (VT.isVector())
     NVT = EVT::getVectorVT(*DAG.getContext(), NVT, VT.getVectorElementCount());
-  if (!TLI.isOperationLegalOrCustom(AVGOpc, NVT))
+  if (!TLI.isOperationLegalOrCustom(AVGOpc, NVT) &&
+      (TLO.LegalTypes() || !TLI.isOperationCustom(AVGOpc, NVT)))
     return SDValue();

   SDLoc DL(Op);
   SDValue ResultAVG =
       DAG.getNode(AVGOpc, DL, NVT, DAG.getNode(ISD::TRUNCATE, DL, NVT, ExtOpA),
                   DAG.getNode(ISD::TRUNCATE, DL, NVT, ExtOpB));
   return DAG.getNode(IsSigned ? ISD::SIGN_EXTEND : ISD::ZERO_EXTEND, DL, VT,
                      ResultAVG);
@@ 767 unchanged lines hidden @@ case ISD::SHL: {
     break;
   }
   case ISD::SRL: {
     SDValue Op0 = Op.getOperand(0);
     SDValue Op1 = Op.getOperand(1);
     EVT ShiftVT = Op1.getValueType();

     // Try to match AVG patterns.
-    if (SDValue AVG = combineShiftToAVG(Op, TLO.DAG, *this, DemandedBits,
+    if (SDValue AVG = combineShiftToAVG(Op, TLO, *this, DemandedBits,
                                         DemandedElts, Depth + 1))
       return TLO.CombineTo(Op, AVG);

     if (const APInt *SA =
             TLO.DAG.getValidShiftAmountConstant(Op, DemandedElts)) {
       unsigned ShAmt = SA->getZExtValue();
       if (ShAmt == 0)
         return TLO.CombineTo(Op, Op0);
@@ 64 unchanged lines hidden @@ case ISD::SRA: {
     // If this is an arithmetic shift right and only the low-bit is set, we can
     // always convert this into a logical shr, even if the shift amount is
     // variable. The low bit of the shift cannot be an input sign bit unless
     // the shift amount is >= the size of the datatype, which is undefined.
     if (DemandedBits.isOne())
       return TLO.CombineTo(Op, TLO.DAG.getNode(ISD::SRL, dl, VT, Op0, Op1));

     // Try to match AVG patterns.
-    if (SDValue AVG = combineShiftToAVG(Op, TLO.DAG, *this, DemandedBits,
+    if (SDValue AVG = combineShiftToAVG(Op, TLO, *this, DemandedBits,
                                         DemandedElts, Depth + 1))
       return TLO.CombineTo(Op, AVG);

     if (const APInt *SA =
             TLO.DAG.getValidShiftAmountConstant(Op, DemandedElts)) {
       unsigned ShAmt = SA->getZExtValue();
       if (ShAmt == 0)
         return TLO.CombineTo(Op, Op0);
@@ 8,439 unchanged lines hidden @@
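The patterns that combineShiftToAVG looks for, `shr(add(ext(A), ext(B)), 1)` and `shr(add(ext(A), ext(B), 1), 1)`, compute an average in a wider type so that the addition cannot overflow. The sketch below checks, for unsigned i8 only, that a narrow average computed without widening gives the same value. The bit-trick formulas and the function names are assumptions chosen for illustration; they are one well-known way to evaluate a floor or ceiling average in the narrow type, not a statement of how any target lowers the AVG nodes.

```cpp
#include <cassert>
#include <cstdint>

// Narrow floor average without widening: a + b == 2*(a & b) + (a ^ b).
std::uint8_t avgFloorU8(std::uint8_t A, std::uint8_t B) {
  return static_cast<std::uint8_t>((A & B) + ((A ^ B) >> 1));
}

// Narrow ceiling average without widening: (a | b) == (a & b) + (a ^ b).
std::uint8_t avgCeilU8(std::uint8_t A, std::uint8_t B) {
  return static_cast<std::uint8_t>((A | B) - ((A ^ B) >> 1));
}

// shr(add(zext(A), zext(B)), 1) evaluated in 16 bits.
std::uint16_t wideFloor(std::uint8_t A, std::uint8_t B) {
  return static_cast<std::uint16_t>((std::uint16_t(A) + std::uint16_t(B)) >> 1);
}

// shr(add(zext(A), zext(B), 1), 1) evaluated in 16 bits.
std::uint16_t wideCeil(std::uint8_t A, std::uint8_t B) {
  return static_cast<std::uint16_t>(
      (std::uint16_t(A) + std::uint16_t(B) + 1) >> 1);
}

// Checks every pair of unsigned 8-bit inputs against the widened pattern.
bool avgPatternsMatch() {
  for (int A = 0; A <= 255; ++A)
    for (int B = 0; B <= 255; ++B) {
      auto UA = static_cast<std::uint8_t>(A);
      auto UB = static_cast<std::uint8_t>(B);
      if (wideFloor(UA, UB) != std::uint16_t(avgFloorU8(UA, UB)))
        return false;
      if (wideCeil(UA, UB) != std::uint16_t(avgCeilU8(UA, UB)))
        return false;
    }
  return true;
}

void demo() {
  assert(avgPatternsMatch());
  assert(avgFloorU8(1, 2) == 1);
  assert(avgCeilU8(1, 2) == 2);
}
```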

llvm/lib/Target/ARM/ARMISelLowering.cpp


@@ 428 unchanged lines hidden @@ for (unsigned im = (unsigned)ISD::PRE_INC;
     for (auto VT : {MVT::v8i8, MVT::v4i8, MVT::v4i16}) {
       setIndexedLoadAction(im, VT, Legal);
       setIndexedStoreAction(im, VT, Legal);
       setIndexedMaskedLoadAction(im, VT, Legal);
       setIndexedMaskedStoreAction(im, VT, Legal);
     }
   }

+  // Custom extend some nodes so that the generic combines fire on smaller than
+  // legal types.
+  for (auto VT : {MVT::v8i8, MVT::v4i8, MVT::v4i16}) {
+    setOperationAction(ISD::MULHS, VT, Custom);
+    setOperationAction(ISD::MULHU, VT, Custom);
+    setOperationAction(ISD::AVGFLOORS, VT, Custom);
+    setOperationAction(ISD::AVGFLOORU, VT, Custom);
+    setOperationAction(ISD::AVGCEILS, VT, Custom);
+    setOperationAction(ISD::AVGCEILU, VT, Custom);
+    setOperationAction(ISD::ABDS, VT, Custom);
+    setOperationAction(ISD::ABDU, VT, Custom);
+  }
+
   // Predicate types
   const MVT pTypes[] = {MVT::v16i1, MVT::v8i1, MVT::v4i1, MVT::v2i1};
   for (auto VT : pTypes) {
     addRegisterClass(VT, &ARM::VCCRRegClass);
     setOperationAction(ISD::BUILD_VECTOR, VT, Custom);
     setOperationAction(ISD::VECTOR_SHUFFLE, VT, Custom);
     setOperationAction(ISD::EXTRACT_SUBVECTOR, VT, Custom);
     setOperationAction(ISD::CONCAT_VECTORS, VT, Custom);
@@ 4,650 unchanged lines hidden @@ static SDValue LowerADDSUBSAT(SDValue Op, SelectionDAG &DAG,
   SDLoc dl(Op);
   SDValue Add =
       DAG.getNode(NewOpcode, dl, MVT::i32,
                   DAG.getSExtOrTrunc(Op->getOperand(0), dl, MVT::i32),
                   DAG.getSExtOrTrunc(Op->getOperand(1), dl, MVT::i32));
   return DAG.getNode(ISD::TRUNCATE, dl, VT, Add);
 }

+// Custom lower MULH, ABD, HADD and RHADD nodes that are smaller than legal,
+// using a bitcast to a larger legal type upon which we perform the bitcast.
+// This allow DAG combine to recognize the nodes where it usually would not.
+static SDValue LowerBinopWithBitcast(SDNode *N, SelectionDAG &DAG) {
+  EVT VT = N->getValueType(0);
+
+  assert((VT == MVT::v4i8 || VT == MVT::v8i8 || VT == MVT::v4i16) &&
+         "Expected smaller than legal type!");
+
+  EVT ExtVT = VT.getVectorNumElements() == 4 ? MVT::v4i32 : MVT::v8i16;
+  EVT BinOpVT = VT.getScalarType() == MVT::i8 ? MVT::v16i8 : MVT::v8i16;
+
+  SDLoc DL(N);
+  SDValue Ext0 = DAG.getNode(ISD::ANY_EXTEND, DL, ExtVT, N->getOperand(0));
+  SDValue Ext1 = DAG.getNode(ISD::ANY_EXTEND, DL, ExtVT, N->getOperand(1));
+  SDValue BC0 = DAG.getNode(ARMISD::VECTOR_REG_CAST, DL, BinOpVT, Ext0);
+  SDValue BC1 = DAG.getNode(ARMISD::VECTOR_REG_CAST, DL, BinOpVT, Ext1);
+  SDValue BinOp = DAG.getNode(N->getOpcode(), DL, BinOpVT, BC0, BC1);
+  SDValue BC2 = DAG.getNode(ARMISD::VECTOR_REG_CAST, DL, ExtVT, BinOp);
+  return DAG.getNode(ISD::TRUNCATE, DL, VT, BC2);
+}
+
 SDValue ARMTargetLowering::LowerSELECT(SDValue Op, SelectionDAG &DAG) const {
   SDValue Cond = Op.getOperand(0);
   SDValue SelectTrue = Op.getOperand(1);
   SDValue SelectFalse = Op.getOperand(2);
   SDLoc dl(Op);
   unsigned Opc = Cond.getOpcode();

   if (Cond.getResNo() == 1 &&
@@ 5,429 unchanged lines hidden @@ case ISD::UDIVREM:
     Results.push_back(Res.getValue(1));
     return;
   case ISD::SADDSAT:
   case ISD::SSUBSAT:
   case ISD::UADDSAT:
   case ISD::USUBSAT:
     Res = LowerADDSUBSAT(SDValue(N, 0), DAG, Subtarget);
     break;
+  case ISD::MULHS:
+  case ISD::MULHU:
+  case ISD::ABDS:
+  case ISD::ABDU:
+  case ISD::AVGFLOORS:
+  case ISD::AVGFLOORU:
+  case ISD::AVGCEILS:
+  case ISD::AVGCEILU:
+    Res = LowerBinopWithBitcast(N, DAG);
+    break;
   case ISD::READCYCLECOUNTER:
     ReplaceREADCYCLECOUNTER(N, Results, DAG, Subtarget);
     return;
   case ISD::UDIV:
   case ISD::SDIV:
     assert(Subtarget->isTargetWindows() && "can only expand DIV on Windows");
     return ExpandDIV_Windows(SDValue(N, 0), DAG, N->getOpcode() == ISD::SDIV,
                              Results);
@@ 11,386 unchanged lines hidden @@

llvm/test/CodeGen/Thumb2/mve-vabdus.ll

@@ 13 unchanged lines hidden @@ ; CHECK-NEXT: bx lr
   %s = select <16 x i1> %c, <16 x i16> %add1, <16 x i16> %add2
   %result = trunc <16 x i16> %s to <16 x i8>
   ret <16 x i8> %result
 }

 define arm_aapcs_vfpcc <8 x i8> @vabd_v8s8(<8 x i8> %src1, <8 x i8> %src2) {
 ; CHECK-LABEL: vabd_v8s8:
 ; CHECK: @ %bb.0:
-; CHECK-NEXT: vmovlb.s8 q1, q1
-; CHECK-NEXT: vmovlb.s8 q0, q0
-; CHECK-NEXT: vabd.s16 q0, q0, q1
+; CHECK-NEXT: vabd.s8 q0, q0, q1
+; CHECK-NEXT: vmovlb.u8 q0, q0
 ; CHECK-NEXT: bx lr
   %sextsrc1 = sext <8 x i8> %src1 to <8 x i16>
   %sextsrc2 = sext <8 x i8> %src2 to <8 x i16>
   %add1 = sub <8 x i16> %sextsrc1, %sextsrc2
   %add2 = sub <8 x i16> zeroinitializer, %add1
   %c = icmp sge <8 x i16> %add1, zeroinitializer
   %s = select <8 x i1> %c, <8 x i16> %add1, <8 x i16> %add2
   %result = trunc <8 x i16> %s to <8 x i8>
   ret <8 x i8> %result
 }

 define arm_aapcs_vfpcc <4 x i8> @vabd_v4s8(<4 x i8> %src1, <4 x i8> %src2) {
 ; CHECK-LABEL: vabd_v4s8:
 ; CHECK: @ %bb.0:
-; CHECK-NEXT: vmovlb.s8 q1, q1
-; CHECK-NEXT: vmovlb.s8 q0, q0
-; CHECK-NEXT: vmovlb.s16 q1, q1
-; CHECK-NEXT: vmovlb.s16 q0, q0
-; CHECK-NEXT: vsub.i32 q0, q0, q1
-; CHECK-NEXT: vabs.s32 q0, q0
+; CHECK-NEXT: vabd.s8 q0, q0, q1
+; CHECK-NEXT: vmov.i32 q1, #0xff
+; CHECK-NEXT: vand q0, q0, q1
 ; CHECK-NEXT: bx lr
   %sextsrc1 = sext <4 x i8> %src1 to <4 x i16>
   %sextsrc2 = sext <4 x i8> %src2 to <4 x i16>
   %add1 = sub <4 x i16> %sextsrc1, %sextsrc2
   %add2 = sub <4 x i16> zeroinitializer, %add1
   %c = icmp sge <4 x i16> %add1, zeroinitializer
   %s = select <4 x i1> %c, <4 x i16> %add1, <4 x i16> %add2
   %result = trunc <4 x i16> %s to <4 x i8>
@@ 13 unchanged lines hidden @@ ; CHECK-NEXT: bx lr
   %s = select <8 x i1> %c, <8 x i32> %add1, <8 x i32> %add2
   %result = trunc <8 x i32> %s to <8 x i16>
   ret <8 x i16> %result
 }

 define arm_aapcs_vfpcc <4 x i16> @vabd_v4s16(<4 x i16> %src1, <4 x i16> %src2) {
 ; CHECK-LABEL: vabd_v4s16:
 ; CHECK: @ %bb.0:
-; CHECK-NEXT: vmovlb.s16 q1, q1
-; CHECK-NEXT: vmovlb.s16 q0, q0
-; CHECK-NEXT: vabd.s32 q0, q0, q1
+; CHECK-NEXT: vabd.s16 q0, q0, q1
+; CHECK-NEXT: vmovlb.u16 q0, q0
 ; CHECK-NEXT: bx lr
   %sextsrc1 = sext <4 x i16> %src1 to <4 x i32>
   %sextsrc2 = sext <4 x i16> %src2 to <4 x i32>
   %add1 = sub <4 x i32> %sextsrc1, %sextsrc2
   %add2 = sub <4 x i32> zeroinitializer, %add1
   %c = icmp sge <4 x i32> %add1, zeroinitializer
   %s = select <4 x i1> %c, <4 x i32> %add1, <4 x i32> %add2
   %result = trunc <4 x i32> %s to <4 x i16>
[62 lines not shown]
%s = select <16 x i1> %c, <16 x i16> %add1, <16 x i16> %add2		%s = select <16 x i1> %c, <16 x i16> %add1, <16 x i16> %add2
%result = trunc <16 x i16> %s to <16 x i8>		%result = trunc <16 x i16> %s to <16 x i8>
ret <16 x i8> %result		ret <16 x i8> %result
}		}

define arm_aapcs_vfpcc <8 x i8> @vabd_v8u8(<8 x i8> %src1, <8 x i8> %src2) {		define arm_aapcs_vfpcc <8 x i8> @vabd_v8u8(<8 x i8> %src1, <8 x i8> %src2) {
; CHECK-LABEL: vabd_v8u8:		; CHECK-LABEL: vabd_v8u8:
; CHECK: @ %bb.0:		; CHECK: @ %bb.0:
; CHECK-NEXT: vmovlb.u8 q1, q1		; CHECK-NEXT: vabd.u8 q0, q0, q1
; CHECK-NEXT: vmovlb.u8 q0, q0		; CHECK-NEXT: vmovlb.u8 q0, q0
; CHECK-NEXT: vabd.u16 q0, q0, q1
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
%zextsrc1 = zext <8 x i8> %src1 to <8 x i16>		%zextsrc1 = zext <8 x i8> %src1 to <8 x i16>
%zextsrc2 = zext <8 x i8> %src2 to <8 x i16>		%zextsrc2 = zext <8 x i8> %src2 to <8 x i16>
%add1 = sub <8 x i16> %zextsrc1, %zextsrc2		%add1 = sub <8 x i16> %zextsrc1, %zextsrc2
%add2 = sub <8 x i16> zeroinitializer, %add1		%add2 = sub <8 x i16> zeroinitializer, %add1
%c = icmp sge <8 x i16> %add1, zeroinitializer		%c = icmp sge <8 x i16> %add1, zeroinitializer
%s = select <8 x i1> %c, <8 x i16> %add1, <8 x i16> %add2		%s = select <8 x i1> %c, <8 x i16> %add1, <8 x i16> %add2
%result = trunc <8 x i16> %s to <8 x i8>		%result = trunc <8 x i16> %s to <8 x i8>
ret <8 x i8> %result		ret <8 x i8> %result
}		}

define arm_aapcs_vfpcc <4 x i8> @vabd_v4u8(<4 x i8> %src1, <4 x i8> %src2) {		define arm_aapcs_vfpcc <4 x i8> @vabd_v4u8(<4 x i8> %src1, <4 x i8> %src2) {
; CHECK-LABEL: vabd_v4u8:		; CHECK-LABEL: vabd_v4u8:
; CHECK: @ %bb.0:		; CHECK: @ %bb.0:
; CHECK-NEXT: vmov.i32 q2, #0xff		; CHECK-NEXT: vabd.u8 q0, q0, q1
; CHECK-NEXT: vand q1, q1, q2		; CHECK-NEXT: vmov.i32 q1, #0xff
; CHECK-NEXT: vand q0, q0, q2		; CHECK-NEXT: vand q0, q0, q1
; CHECK-NEXT: vsub.i32 q0, q0, q1
; CHECK-NEXT: vabs.s32 q0, q0
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
%zextsrc1 = zext <4 x i8> %src1 to <4 x i16>		%zextsrc1 = zext <4 x i8> %src1 to <4 x i16>
%zextsrc2 = zext <4 x i8> %src2 to <4 x i16>		%zextsrc2 = zext <4 x i8> %src2 to <4 x i16>
%add1 = sub <4 x i16> %zextsrc1, %zextsrc2		%add1 = sub <4 x i16> %zextsrc1, %zextsrc2
%add2 = sub <4 x i16> zeroinitializer, %add1		%add2 = sub <4 x i16> zeroinitializer, %add1
%c = icmp sge <4 x i16> %add1, zeroinitializer		%c = icmp sge <4 x i16> %add1, zeroinitializer
%s = select <4 x i1> %c, <4 x i16> %add1, <4 x i16> %add2		%s = select <4 x i1> %c, <4 x i16> %add1, <4 x i16> %add2
%result = trunc <4 x i16> %s to <4 x i8>		%result = trunc <4 x i16> %s to <4 x i8>
[13 lines not shown]
%s = select <8 x i1> %c, <8 x i32> %add1, <8 x i32> %add2		%s = select <8 x i1> %c, <8 x i32> %add1, <8 x i32> %add2
%result = trunc <8 x i32> %s to <8 x i16>		%result = trunc <8 x i32> %s to <8 x i16>
ret <8 x i16> %result		ret <8 x i16> %result
}		}

define arm_aapcs_vfpcc <4 x i16> @vabd_v4u16(<4 x i16> %src1, <4 x i16> %src2) {		define arm_aapcs_vfpcc <4 x i16> @vabd_v4u16(<4 x i16> %src1, <4 x i16> %src2) {
; CHECK-LABEL: vabd_v4u16:		; CHECK-LABEL: vabd_v4u16:
; CHECK: @ %bb.0:		; CHECK: @ %bb.0:
; CHECK-NEXT: vmovlb.u16 q1, q1		; CHECK-NEXT: vabd.u16 q0, q0, q1
; CHECK-NEXT: vmovlb.u16 q0, q0		; CHECK-NEXT: vmovlb.u16 q0, q0
; CHECK-NEXT: vabd.u32 q0, q0, q1
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
%zextsrc1 = zext <4 x i16> %src1 to <4 x i32>		%zextsrc1 = zext <4 x i16> %src1 to <4 x i32>
%zextsrc2 = zext <4 x i16> %src2 to <4 x i32>		%zextsrc2 = zext <4 x i16> %src2 to <4 x i32>
%add1 = sub <4 x i32> %zextsrc1, %zextsrc2		%add1 = sub <4 x i32> %zextsrc1, %zextsrc2
%add2 = sub <4 x i32> zeroinitializer, %add1		%add2 = sub <4 x i32> zeroinitializer, %add1
%c = icmp sge <4 x i32> %add1, zeroinitializer		%c = icmp sge <4 x i32> %add1, zeroinitializer
%s = select <4 x i1> %c, <4 x i32> %add1, <4 x i32> %add2		%s = select <4 x i1> %c, <4 x i32> %add1, <4 x i32> %add2
%result = trunc <4 x i32> %s to <4 x i16>		%result = trunc <4 x i32> %s to <4 x i16>
[417 lines not shown]

llvm/test/CodeGen/Thumb2/mve-vhadd.ll

[43 lines not shown]
%s = lshr <4 x i32> %m, <i32 1, i32 1, i32 1, i32 1>		%s = lshr <4 x i32> %m, <i32 1, i32 1, i32 1, i32 1>
%s2 = trunc <4 x i32> %s to <4 x i16>		%s2 = trunc <4 x i32> %s to <4 x i16>
ret <4 x i16> %s2		ret <4 x i16> %s2
}		}

define arm_aapcs_vfpcc <4 x i16> @vhaddu_v4i16(<4 x i16> %s0, <4 x i16> %s1) {		define arm_aapcs_vfpcc <4 x i16> @vhaddu_v4i16(<4 x i16> %s0, <4 x i16> %s1) {
; CHECK-LABEL: vhaddu_v4i16:		; CHECK-LABEL: vhaddu_v4i16:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: vmovlb.u16 q1, q1		; CHECK-NEXT: vhadd.u16 q0, q0, q1
; CHECK-NEXT: vmovlb.u16 q0, q0		; CHECK-NEXT: vmovlb.u16 q0, q0
; CHECK-NEXT: vadd.i32 q0, q0, q1
; CHECK-NEXT: vshr.u32 q0, q0, #1
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%s0s = zext <4 x i16> %s0 to <4 x i32>		%s0s = zext <4 x i16> %s0 to <4 x i32>
%s1s = zext <4 x i16> %s1 to <4 x i32>		%s1s = zext <4 x i16> %s1 to <4 x i32>
%m = add <4 x i32> %s0s, %s1s		%m = add <4 x i32> %s0s, %s1s
%s = lshr <4 x i32> %m, <i32 1, i32 1, i32 1, i32 1>		%s = lshr <4 x i32> %m, <i32 1, i32 1, i32 1, i32 1>
%s2 = trunc <4 x i32> %s to <4 x i16>		%s2 = trunc <4 x i32> %s to <4 x i16>
ret <4 x i16> %s2		ret <4 x i16> %s2
[45 lines not shown]
%s = lshr <4 x i16> %m, <i16 1, i16 1, i16 1, i16 1>		%s = lshr <4 x i16> %m, <i16 1, i16 1, i16 1, i16 1>
%s2 = trunc <4 x i16> %s to <4 x i8>		%s2 = trunc <4 x i16> %s to <4 x i8>
ret <4 x i8> %s2		ret <4 x i8> %s2
}		}

define arm_aapcs_vfpcc <4 x i8> @vhaddu_v4i8(<4 x i8> %s0, <4 x i8> %s1) {		define arm_aapcs_vfpcc <4 x i8> @vhaddu_v4i8(<4 x i8> %s0, <4 x i8> %s1) {
; CHECK-LABEL: vhaddu_v4i8:		; CHECK-LABEL: vhaddu_v4i8:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: vmov.i32 q2, #0xff		; CHECK-NEXT: vhadd.u8 q0, q0, q1
; CHECK-NEXT: vand q1, q1, q2		; CHECK-NEXT: vmov.i32 q1, #0xff
; CHECK-NEXT: vand q0, q0, q2		; CHECK-NEXT: vand q0, q0, q1
; CHECK-NEXT: vadd.i32 q0, q0, q1
; CHECK-NEXT: vshr.u32 q0, q0, #1
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%s0s = zext <4 x i8> %s0 to <4 x i16>		%s0s = zext <4 x i8> %s0 to <4 x i16>
%s1s = zext <4 x i8> %s1 to <4 x i16>		%s1s = zext <4 x i8> %s1 to <4 x i16>
%m = add <4 x i16> %s0s, %s1s		%m = add <4 x i16> %s0s, %s1s
%s = lshr <4 x i16> %m, <i16 1, i16 1, i16 1, i16 1>		%s = lshr <4 x i16> %m, <i16 1, i16 1, i16 1, i16 1>
%s2 = trunc <4 x i16> %s to <4 x i8>		%s2 = trunc <4 x i16> %s to <4 x i8>
ret <4 x i8> %s2		ret <4 x i8> %s2
[14 lines not shown]
%s = lshr <8 x i16> %m, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>		%s = lshr <8 x i16> %m, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>
%s2 = trunc <8 x i16> %s to <8 x i8>		%s2 = trunc <8 x i16> %s to <8 x i8>
ret <8 x i8> %s2		ret <8 x i8> %s2
}		}

define arm_aapcs_vfpcc <8 x i8> @vhaddu_v8i8(<8 x i8> %s0, <8 x i8> %s1) {		define arm_aapcs_vfpcc <8 x i8> @vhaddu_v8i8(<8 x i8> %s0, <8 x i8> %s1) {
; CHECK-LABEL: vhaddu_v8i8:		; CHECK-LABEL: vhaddu_v8i8:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: vmovlb.u8 q1, q1		; CHECK-NEXT: vhadd.u8 q0, q0, q1
; CHECK-NEXT: vmovlb.u8 q0, q0		; CHECK-NEXT: vmovlb.u8 q0, q0
; CHECK-NEXT: vadd.i16 q0, q0, q1
; CHECK-NEXT: vshr.u16 q0, q0, #1
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%s0s = zext <8 x i8> %s0 to <8 x i16>		%s0s = zext <8 x i8> %s0 to <8 x i16>
%s1s = zext <8 x i8> %s1 to <8 x i16>		%s1s = zext <8 x i8> %s1 to <8 x i16>
%m = add <8 x i16> %s0s, %s1s		%m = add <8 x i16> %s0s, %s1s
%s = lshr <8 x i16> %m, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>		%s = lshr <8 x i16> %m, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>
%s2 = trunc <8 x i16> %s to <8 x i8>		%s2 = trunc <8 x i16> %s to <8 x i8>
ret <8 x i8> %s2		ret <8 x i8> %s2
[75 lines not shown]
%s = lshr <4 x i32> %add2, <i32 1, i32 1, i32 1, i32 1>		%s = lshr <4 x i32> %add2, <i32 1, i32 1, i32 1, i32 1>
%result = trunc <4 x i32> %s to <4 x i16>		%result = trunc <4 x i32> %s to <4 x i16>
ret <4 x i16> %result		ret <4 x i16> %result
}		}

define arm_aapcs_vfpcc <4 x i16> @vrhaddu_v4i16(<4 x i16> %s0, <4 x i16> %s1) {		define arm_aapcs_vfpcc <4 x i16> @vrhaddu_v4i16(<4 x i16> %s0, <4 x i16> %s1) {
; CHECK-LABEL: vrhaddu_v4i16:		; CHECK-LABEL: vrhaddu_v4i16:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: vmovlb.u16 q1, q1		; CHECK-NEXT: vrhadd.u16 q0, q0, q1
; CHECK-NEXT: vmovlb.u16 q0, q0		; CHECK-NEXT: vmovlb.u16 q0, q0
; CHECK-NEXT: vadd.i32 q0, q0, q1
; CHECK-NEXT: movs r0, #1
; CHECK-NEXT: vadd.i32 q0, q0, r0
; CHECK-NEXT: vshr.u32 q0, q0, #1
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%s0s = zext <4 x i16> %s0 to <4 x i32>		%s0s = zext <4 x i16> %s0 to <4 x i32>
%s1s = zext <4 x i16> %s1 to <4 x i32>		%s1s = zext <4 x i16> %s1 to <4 x i32>
%add = add <4 x i32> %s0s, %s1s		%add = add <4 x i32> %s0s, %s1s
%add2 = add <4 x i32> %add, <i32 1, i32 1, i32 1, i32 1>		%add2 = add <4 x i32> %add, <i32 1, i32 1, i32 1, i32 1>
%s = lshr <4 x i32> %add2, <i32 1, i32 1, i32 1, i32 1>		%s = lshr <4 x i32> %add2, <i32 1, i32 1, i32 1, i32 1>
%result = trunc <4 x i32> %s to <4 x i16>		%result = trunc <4 x i32> %s to <4 x i16>
[51 lines not shown]
%s = lshr <4 x i16> %add2, <i16 1, i16 1, i16 1, i16 1>		%s = lshr <4 x i16> %add2, <i16 1, i16 1, i16 1, i16 1>
%result = trunc <4 x i16> %s to <4 x i8>		%result = trunc <4 x i16> %s to <4 x i8>
ret <4 x i8> %result		ret <4 x i8> %result
}		}

define arm_aapcs_vfpcc <4 x i8> @vrhaddu_v4i8(<4 x i8> %s0, <4 x i8> %s1) {		define arm_aapcs_vfpcc <4 x i8> @vrhaddu_v4i8(<4 x i8> %s0, <4 x i8> %s1) {
; CHECK-LABEL: vrhaddu_v4i8:		; CHECK-LABEL: vrhaddu_v4i8:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: vmov.i32 q2, #0xff		; CHECK-NEXT: vrhadd.u8 q0, q0, q1
; CHECK-NEXT: movs r0, #1		; CHECK-NEXT: vmov.i32 q1, #0xff
; CHECK-NEXT: vand q1, q1, q2		; CHECK-NEXT: vand q0, q0, q1
; CHECK-NEXT: vand q0, q0, q2
; CHECK-NEXT: vadd.i32 q0, q0, q1
; CHECK-NEXT: vadd.i32 q0, q0, r0
; CHECK-NEXT: vshr.u32 q0, q0, #1
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%s0s = zext <4 x i8> %s0 to <4 x i16>		%s0s = zext <4 x i8> %s0 to <4 x i16>
%s1s = zext <4 x i8> %s1 to <4 x i16>		%s1s = zext <4 x i8> %s1 to <4 x i16>
%add = add <4 x i16> %s0s, %s1s		%add = add <4 x i16> %s0s, %s1s
%add2 = add <4 x i16> %add, <i16 1, i16 1, i16 1, i16 1>		%add2 = add <4 x i16> %add, <i16 1, i16 1, i16 1, i16 1>
%s = lshr <4 x i16> %add2, <i16 1, i16 1, i16 1, i16 1>		%s = lshr <4 x i16> %add2, <i16 1, i16 1, i16 1, i16 1>
%result = trunc <4 x i16> %s to <4 x i8>		%result = trunc <4 x i16> %s to <4 x i8>
[18 lines not shown]
%s = lshr <8 x i16> %add2, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>		%s = lshr <8 x i16> %add2, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>
%result = trunc <8 x i16> %s to <8 x i8>		%result = trunc <8 x i16> %s to <8 x i8>
ret <8 x i8> %result		ret <8 x i8> %result
}		}

define arm_aapcs_vfpcc <8 x i8> @vrhaddu_v8i8(<8 x i8> %s0, <8 x i8> %s1) {		define arm_aapcs_vfpcc <8 x i8> @vrhaddu_v8i8(<8 x i8> %s0, <8 x i8> %s1) {
; CHECK-LABEL: vrhaddu_v8i8:		; CHECK-LABEL: vrhaddu_v8i8:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: vmovlb.u8 q1, q1		; CHECK-NEXT: vrhadd.u8 q0, q0, q1
; CHECK-NEXT: vmovlb.u8 q0, q0		; CHECK-NEXT: vmovlb.u8 q0, q0
; CHECK-NEXT: vadd.i16 q0, q0, q1
; CHECK-NEXT: movs r0, #1
; CHECK-NEXT: vadd.i16 q0, q0, r0
; CHECK-NEXT: vshr.u16 q0, q0, #1
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%s0s = zext <8 x i8> %s0 to <8 x i16>		%s0s = zext <8 x i8> %s0 to <8 x i16>
%s1s = zext <8 x i8> %s1 to <8 x i16>		%s1s = zext <8 x i8> %s1 to <8 x i16>
%add = add <8 x i16> %s0s, %s1s		%add = add <8 x i16> %s0s, %s1s
%add2 = add <8 x i16> %add, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>		%add2 = add <8 x i16> %add, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>
%s = lshr <8 x i16> %add2, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>		%s = lshr <8 x i16> %add2, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>
%result = trunc <8 x i16> %s to <8 x i8>		%result = trunc <8 x i16> %s to <8 x i8>
[545 lines not shown]

llvm/test/CodeGen/Thumb2/mve-vmulh.ll

[68 lines not shown]
%s = lshr <4 x i64> %m, <i64 32, i64 32, i64 32, i64 32>		%s = lshr <4 x i64> %m, <i64 32, i64 32, i64 32, i64 32>
%s2 = trunc <4 x i64> %s to <4 x i32>		%s2 = trunc <4 x i64> %s to <4 x i32>
ret <4 x i32> %s2		ret <4 x i32> %s2
}		}

define arm_aapcs_vfpcc <4 x i16> @vmulhs_v4i16(<4 x i16> %s0, <4 x i16> %s1) {		define arm_aapcs_vfpcc <4 x i16> @vmulhs_v4i16(<4 x i16> %s0, <4 x i16> %s1) {
; CHECK-LABEL: vmulhs_v4i16:		; CHECK-LABEL: vmulhs_v4i16:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: vmullb.s16 q0, q0, q1		; CHECK-NEXT: vmulh.s16 q0, q0, q1
; CHECK-NEXT: vshr.s32 q0, q0, #16		; CHECK-NEXT: vmovlb.s16 q0, q0
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%s0s = sext <4 x i16> %s0 to <4 x i32>		%s0s = sext <4 x i16> %s0 to <4 x i32>
%s1s = sext <4 x i16> %s1 to <4 x i32>		%s1s = sext <4 x i16> %s1 to <4 x i32>
%m = mul <4 x i32> %s0s, %s1s		%m = mul <4 x i32> %s0s, %s1s
%s = ashr <4 x i32> %m, <i32 16, i32 16, i32 16, i32 16>		%s = ashr <4 x i32> %m, <i32 16, i32 16, i32 16, i32 16>
%s2 = trunc <4 x i32> %s to <4 x i16>		%s2 = trunc <4 x i32> %s to <4 x i16>
ret <4 x i16> %s2		ret <4 x i16> %s2
}		}

define arm_aapcs_vfpcc <4 x i16> @vmulhu_v4i16(<4 x i16> %s0, <4 x i16> %s1) {		define arm_aapcs_vfpcc <4 x i16> @vmulhu_v4i16(<4 x i16> %s0, <4 x i16> %s1) {
; CHECK-LABEL: vmulhu_v4i16:		; CHECK-LABEL: vmulhu_v4i16:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: vmullb.u16 q0, q0, q1		; CHECK-NEXT: vmulh.u16 q0, q0, q1
; CHECK-NEXT: vshr.u32 q0, q0, #16		; CHECK-NEXT: vmovlb.u16 q0, q0
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%s0s = zext <4 x i16> %s0 to <4 x i32>		%s0s = zext <4 x i16> %s0 to <4 x i32>
%s1s = zext <4 x i16> %s1 to <4 x i32>		%s1s = zext <4 x i16> %s1 to <4 x i32>
%m = mul <4 x i32> %s0s, %s1s		%m = mul <4 x i32> %s0s, %s1s
%s = lshr <4 x i32> %m, <i32 16, i32 16, i32 16, i32 16>		%s = lshr <4 x i32> %m, <i32 16, i32 16, i32 16, i32 16>
%s2 = trunc <4 x i32> %s to <4 x i16>		%s2 = trunc <4 x i32> %s to <4 x i16>
ret <4 x i16> %s2		ret <4 x i16> %s2
[25 lines not shown]
%s = lshr <8 x i32> %m, <i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16>		%s = lshr <8 x i32> %m, <i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16>
%s2 = trunc <8 x i32> %s to <8 x i16>		%s2 = trunc <8 x i32> %s to <8 x i16>
ret <8 x i16> %s2		ret <8 x i16> %s2
}		}

define arm_aapcs_vfpcc <4 x i8> @vmulhs_v4i8(<4 x i8> %s0, <4 x i8> %s1) {		define arm_aapcs_vfpcc <4 x i8> @vmulhs_v4i8(<4 x i8> %s0, <4 x i8> %s1) {
; CHECK-LABEL: vmulhs_v4i8:		; CHECK-LABEL: vmulhs_v4i8:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: vmovlb.s8 q1, q1		; CHECK-NEXT: vmulh.s8 q0, q0, q1
; CHECK-NEXT: vmovlb.s8 q0, q0		; CHECK-NEXT: vmovlb.s8 q0, q0
; CHECK-NEXT: vmovlb.s16 q1, q1
; CHECK-NEXT: vmovlb.s16 q0, q0		; CHECK-NEXT: vmovlb.s16 q0, q0
; CHECK-NEXT: vmul.i32 q0, q0, q1
; CHECK-NEXT: vshr.s32 q0, q0, #8
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%s0s = sext <4 x i8> %s0 to <4 x i16>		%s0s = sext <4 x i8> %s0 to <4 x i16>
%s1s = sext <4 x i8> %s1 to <4 x i16>		%s1s = sext <4 x i8> %s1 to <4 x i16>
%m = mul <4 x i16> %s0s, %s1s		%m = mul <4 x i16> %s0s, %s1s
%s = ashr <4 x i16> %m, <i16 8, i16 8, i16 8, i16 8>		%s = ashr <4 x i16> %m, <i16 8, i16 8, i16 8, i16 8>
%s2 = trunc <4 x i16> %s to <4 x i8>		%s2 = trunc <4 x i16> %s to <4 x i8>
ret <4 x i8> %s2		ret <4 x i8> %s2
}		}

define arm_aapcs_vfpcc <4 x i8> @vmulhu_v4i8(<4 x i8> %s0, <4 x i8> %s1) {		define arm_aapcs_vfpcc <4 x i8> @vmulhu_v4i8(<4 x i8> %s0, <4 x i8> %s1) {
; CHECK-LABEL: vmulhu_v4i8:		; CHECK-LABEL: vmulhu_v4i8:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: vmov.i32 q2, #0xff		; CHECK-NEXT: vmulh.u8 q0, q0, q1
; CHECK-NEXT: vand q1, q1, q2		; CHECK-NEXT: vmov.i32 q1, #0xff
; CHECK-NEXT: vand q0, q0, q2		; CHECK-NEXT: vand q0, q0, q1
; CHECK-NEXT: vmul.i32 q0, q0, q1
; CHECK-NEXT: vshr.u32 q0, q0, #8
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%s0s = zext <4 x i8> %s0 to <4 x i16>		%s0s = zext <4 x i8> %s0 to <4 x i16>
%s1s = zext <4 x i8> %s1 to <4 x i16>		%s1s = zext <4 x i8> %s1 to <4 x i16>
%m = mul <4 x i16> %s0s, %s1s		%m = mul <4 x i16> %s0s, %s1s
%s = lshr <4 x i16> %m, <i16 8, i16 8, i16 8, i16 8>		%s = lshr <4 x i16> %m, <i16 8, i16 8, i16 8, i16 8>
%s2 = trunc <4 x i16> %s to <4 x i8>		%s2 = trunc <4 x i16> %s to <4 x i8>
ret <4 x i8> %s2		ret <4 x i8> %s2
}		}

define arm_aapcs_vfpcc <8 x i8> @vmulhs_v8i8(<8 x i8> %s0, <8 x i8> %s1) {		define arm_aapcs_vfpcc <8 x i8> @vmulhs_v8i8(<8 x i8> %s0, <8 x i8> %s1) {
; CHECK-LABEL: vmulhs_v8i8:		; CHECK-LABEL: vmulhs_v8i8:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: vmullb.s8 q0, q0, q1		; CHECK-NEXT: vmulh.s8 q0, q0, q1
; CHECK-NEXT: vshr.s16 q0, q0, #8		; CHECK-NEXT: vmovlb.s8 q0, q0
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%s0s = sext <8 x i8> %s0 to <8 x i16>		%s0s = sext <8 x i8> %s0 to <8 x i16>
%s1s = sext <8 x i8> %s1 to <8 x i16>		%s1s = sext <8 x i8> %s1 to <8 x i16>
%m = mul <8 x i16> %s0s, %s1s		%m = mul <8 x i16> %s0s, %s1s
%s = ashr <8 x i16> %m, <i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8>		%s = ashr <8 x i16> %m, <i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8>
%s2 = trunc <8 x i16> %s to <8 x i8>		%s2 = trunc <8 x i16> %s to <8 x i8>
ret <8 x i8> %s2		ret <8 x i8> %s2
}		}

define arm_aapcs_vfpcc <8 x i8> @vmulhu_v8i8(<8 x i8> %s0, <8 x i8> %s1) {		define arm_aapcs_vfpcc <8 x i8> @vmulhu_v8i8(<8 x i8> %s0, <8 x i8> %s1) {
; CHECK-LABEL: vmulhu_v8i8:		; CHECK-LABEL: vmulhu_v8i8:
; CHECK: @ %bb.0: @ %entry		; CHECK: @ %bb.0: @ %entry
; CHECK-NEXT: vmullb.u8 q0, q0, q1		; CHECK-NEXT: vmulh.u8 q0, q0, q1
; CHECK-NEXT: vshr.u16 q0, q0, #8		; CHECK-NEXT: vmovlb.u8 q0, q0
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
entry:		entry:
%s0s = zext <8 x i8> %s0 to <8 x i16>		%s0s = zext <8 x i8> %s0 to <8 x i16>
%s1s = zext <8 x i8> %s1 to <8 x i16>		%s1s = zext <8 x i8> %s1 to <8 x i16>
%m = mul <8 x i16> %s0s, %s1s		%m = mul <8 x i16> %s0s, %s1s
%s = lshr <8 x i16> %m, <i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8>		%s = lshr <8 x i16> %m, <i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8>
%s2 = trunc <8 x i16> %s to <8 x i8>		%s2 = trunc <8 x i16> %s to <8 x i8>
ret <8 x i8> %s2		ret <8 x i8> %s2
[610 lines not shown]