This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Refactor lower to S[LR]I optimization
ClosedPublic

Authored by PetreTudor on May 1 2020, 4:48 AM.


Details

Reviewers

dmgreen
SjoerdMeijer
mstorsjo
efriedma

Commits

rG9682d0d5dcc5: [ARM] Refactor lower to S[LR]I optimization

Summary

The optimization has been refactored to fix certain bugs and
limitations. The condition for lowering to S[LR]I now follows the
manual's pseudocode description of the SLI and SRI operations.
The optimization also handles more operand types and operand orders.
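
For readers unfamiliar with the instructions, here is a rough scalar model of one 8-bit lane of SLI and SRI. It follows the mask conditions stated in the patch's code comment (`~(Ones(ElemSizeInBits) << C2)` for SLI, `~(Ones(ElemSizeInBits) >> C2)` for SRI). The function names `sliLane` and `sriLane` are made up for this sketch and do not appear in LLVM.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of one 8-bit lane (not LLVM code).
// SLI: shift src left and insert it into dest, keeping dest's low `shift` bits.
uint8_t sliLane(uint8_t dest, uint8_t src, unsigned shift) {
  assert(shift < 8);
  uint8_t mask = static_cast<uint8_t>(0xFFu << shift);  // Ones(8) << shift
  uint8_t shifted = static_cast<uint8_t>(static_cast<unsigned>(src) << shift);
  return static_cast<uint8_t>((dest & ~mask) | shifted);
}

// SRI: shift src right and insert it into dest, keeping dest's high `shift` bits.
uint8_t sriLane(uint8_t dest, uint8_t src, unsigned shift) {
  assert(shift >= 1 && shift <= 8);
  uint8_t mask = static_cast<uint8_t>(0xFFu >> shift);  // Ones(8) >> shift
  uint8_t shifted = static_cast<uint8_t>(static_cast<unsigned>(src) >> shift);
  return static_cast<uint8_t>((dest & ~mask) | shifted);
}
```

With a shift of 3, `sliLane(x, y, 3)` equals `(x & 7) | (y << 3)` and `sriLane(x, y, 3)` equals `(x & 224) | (y >> 3)`, which are the constants used by the "Good" 8-bit tests in this revision.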

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

PetreTudor created this revision. May 1 2020, 4:48 AM

Herald added a project: Restricted Project. May 1 2020, 4:48 AM

Herald added subscribers: llvm-commits, danielkiss, hiraditya, kristof.beyls.

PetreTudor added reviewers: dmgreen, SjoerdMeijer, mstorsjo, efriedma. May 1 2020, 4:51 AM

The problem with the original patch was that ElemMask would overflow, since it was only 32 bits wide. The patch now uses APInt to avoid this kind of overflow.
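
As a standalone illustration of the overflow, `(1 << ElemSizeInBits) - 1` on a 32-bit integer is undefined once the element size reaches 32 bits. The sketch below computes the required AND mask in 64 bits without an out-of-range shift. It only mimics what the diff does with `APInt::getLowBitsSet` and `APInt::getHighBitsSet`; the helper names are hypothetical and are not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// All-ones value of the given width (1..64), avoiding a shift by 64.
uint64_t onesMask(unsigned bits) {
  assert(bits >= 1 && bits <= 64);
  return bits == 64 ? ~uint64_t{0} : (uint64_t{1} << bits) - 1;
}

// Low `n` bits set within an element of `bits` width.
uint64_t lowBitsSet(unsigned bits, unsigned n) {
  assert(n <= bits);
  return n == 0 ? 0 : onesMask(n);
}

// High `n` bits set within an element of `bits` width.
uint64_t highBitsSet(unsigned bits, unsigned n) {
  assert(n <= bits);
  return n == 0 ? 0 : onesMask(n) << (bits - n);
}

// Mask that the AND constant C1 must equal for a shift amount c2:
// low bits for a left shift (SLI), high bits for a right shift (SRI).
uint64_t requiredAndMask(bool isShiftRight, unsigned bits, unsigned c2) {
  return isShiftRight ? highBitsSet(bits, c2) : lowBitsSet(bits, c2);
}
```

For example, `requiredAndMask(false, 16, 14)` gives 16383 and `requiredAndMask(true, 16, 14)` gives 65532, matching the constants in the `testLeftGood4x16` and `testRightGood4x16` tests.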

Harbormaster failed remote builds in B55426: Diff 261456! May 1 2020, 5:01 AM

efriedma added inline comments. May 1 2020, 11:04 AM

llvm/test/CodeGen/AArch64/arm64-sli-sri-opt.ll
1–2	Please generate the test using update-llc-test-checks.py

Thanks for the updates. Looks like it's picked up some new tricks since the last version.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
8017	Can you add a comment saying this reconstructs the immediate from the BIC
8020	I think if this used C1 = ~(C1nodeImm->getZExtValue() << C1nodeShift->getZExtValue()) here, then the IsAnd condition below can be removed.
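
To make the suggestion above concrete, here is a small sketch of rebuilding the equivalent AND mask from a BIC immediate and its shift. The truncation to the element width stands in for what constructing an `APInt` of `ElemSizeInBits` does in the diff. The function name and the example encoding (immediate 0xC0 shifted left by 8 for a 16-bit AND with 16383) are assumptions made for illustration, not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: BIC clears the bits (imm << shift), so the equivalent
// AND mask is the complement of that value, truncated to the element width.
uint64_t andMaskFromBic(uint64_t imm, unsigned shift, unsigned elemBits) {
  assert(elemBits >= 1 && elemBits <= 64 && shift < 64);
  uint64_t cleared = imm << shift;
  uint64_t ones =
      elemBits == 64 ? ~uint64_t{0} : (uint64_t{1} << elemBits) - 1;
  return ~cleared & ones;
}
```

Under that assumed encoding, `andMaskFromBic(0xC0, 8, 16)` yields 0x3FFF (16383), which can then be compared against the required mask in the same way as a constant taken from a BUILD_VECTOR.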

Addressed reviewers' comments.

Added comments, simplified some logic, and updated the llc tests via
update_llc_test_checks.py.

PetreTudor marked 3 inline comments as done. May 4 2020, 7:19 AM

Harbormaster failed remote builds in B55633: Diff 261808! May 4 2020, 8:00 AM

Thanks. This looks good to me, but I remember Eli making some comments on the other patch. Please wait for him to comment again too.

LGTM

It's not obvious to me from reading the testcase that we have test coverage code for both the BICi and the isAllConstantBuildVector case; please check before merging.

Probably it would be worth looking into pattern-matching cases that don't involve precisely the AND you're looking for, using computeKnownBits. But that doesn't need to happen here.

This revision is now accepted and ready to land. May 5 2020, 10:37 AM

Added a test for the isAllConstantBuildVector case. The BICi case happens
when the constant values' sizes are greater than 8 bits.

Harbormaster failed remote builds in B56321: Diff 263189! May 11 2020, 10:12 AM

Closed by commit rG9682d0d5dcc5: [ARM] Refactor lower to S[LR]I optimization (authored by PetreTudor). May 12 2020, 3:11 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/lib/Target/AArch64/AArch64ISelLowering.h	10 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	105 lines
llvm/lib/Target/AArch64/AArch64InstrInfo.td	14 lines
llvm/test/CodeGen/AArch64/arm64-sli-sri-opt.ll	439 lines

Diff 263390

llvm/lib/Target/AArch64/AArch64ISelLowering.h

[126 lines not shown]
enum NodeType : unsigned {

// Vector shift by scalar (again)		// Vector shift by scalar (again)
SQSHL_I,		SQSHL_I,
UQSHL_I,		UQSHL_I,
SQSHLU_I,		SQSHLU_I,
SRSHR_I,		SRSHR_I,
URSHR_I,		URSHR_I,

		// Vector shift by constant and insert
		VSLI,
		VSRI,

// Vector comparisons		// Vector comparisons
CMEQ,		CMEQ,
CMGE,		CMGE,
CMGT,		CMGT,
CMHI,		CMHI,
CMHS,		CMHS,
FCMEQ,		FCMEQ,
FCMGE,		FCMGE,
[59 lines not shown]
enum NodeType : unsigned {
/// need to re-interpret the data in SIMD vector registers in big-endian		/// need to re-interpret the data in SIMD vector registers in big-endian
/// mode without emitting such REV instructions.		/// mode without emitting such REV instructions.
NVCAST,		NVCAST,

SMULL,		SMULL,
UMULL,		UMULL,

// Reciprocal estimates and steps.		// Reciprocal estimates and steps.
FRECPE, FRECPS,		FRECPE,
FRSQRTE, FRSQRTS,		FRECPS,
		FRSQRTE,
		FRSQRTS,

SUNPKHI,		SUNPKHI,
SUNPKLO,		SUNPKLO,
UUNPKHI,		UUNPKHI,
UUNPKLO,		UUNPKLO,

CLASTA_N,		CLASTA_N,
CLASTB_N,		CLASTB_N,
[681 lines not shown]

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp


[93 lines not shown]
using namespace llvm::PatternMatch;		using namespace llvm::PatternMatch;

#define DEBUG_TYPE "aarch64-lower"		#define DEBUG_TYPE "aarch64-lower"

STATISTIC(NumTailCalls, "Number of tail calls");		STATISTIC(NumTailCalls, "Number of tail calls");
STATISTIC(NumShiftInserts, "Number of vector shift inserts");		STATISTIC(NumShiftInserts, "Number of vector shift inserts");
STATISTIC(NumOptimizedImms, "Number of times immediates were optimized");		STATISTIC(NumOptimizedImms, "Number of times immediates were optimized");

static cl::opt<bool>
EnableAArch64SlrGeneration("aarch64-shift-insert-generation", cl::Hidden,
cl::desc("Allow AArch64 SLI/SRI formation"),
cl::init(false));

// FIXME: The necessary dtprel relocations don't seem to be supported		// FIXME: The necessary dtprel relocations don't seem to be supported
// well in the GNU bfd and gold linkers at the moment. Therefore, by		// well in the GNU bfd and gold linkers at the moment. Therefore, by
// default, for now, fall back to GeneralDynamic code generation.		// default, for now, fall back to GeneralDynamic code generation.
cl::opt<bool> EnableAArch64ELFLocalDynamicTLSGeneration(		cl::opt<bool> EnableAArch64ELFLocalDynamicTLSGeneration(
"aarch64-elf-ldtls-generation", cl::Hidden,		"aarch64-elf-ldtls-generation", cl::Hidden,
cl::desc("Allow AArch64 Local Dynamic TLS code generation"),		cl::desc("Allow AArch64 Local Dynamic TLS code generation"),
cl::init(false));		cl::init(false));

[1,220 lines not shown]
const char *AArch64TargetLowering::getTargetNodeName(unsigned Opcode) const {
case AArch64ISD::TRN2: return "AArch64ISD::TRN2";		case AArch64ISD::TRN2: return "AArch64ISD::TRN2";
case AArch64ISD::REV16: return "AArch64ISD::REV16";		case AArch64ISD::REV16: return "AArch64ISD::REV16";
case AArch64ISD::REV32: return "AArch64ISD::REV32";		case AArch64ISD::REV32: return "AArch64ISD::REV32";
case AArch64ISD::REV64: return "AArch64ISD::REV64";		case AArch64ISD::REV64: return "AArch64ISD::REV64";
case AArch64ISD::EXT: return "AArch64ISD::EXT";		case AArch64ISD::EXT: return "AArch64ISD::EXT";
case AArch64ISD::VSHL: return "AArch64ISD::VSHL";		case AArch64ISD::VSHL: return "AArch64ISD::VSHL";
case AArch64ISD::VLSHR: return "AArch64ISD::VLSHR";		case AArch64ISD::VLSHR: return "AArch64ISD::VLSHR";
case AArch64ISD::VASHR: return "AArch64ISD::VASHR";		case AArch64ISD::VASHR: return "AArch64ISD::VASHR";
		case AArch64ISD::VSLI: return "AArch64ISD::VSLI";
		case AArch64ISD::VSRI: return "AArch64ISD::VSRI";
case AArch64ISD::CMEQ: return "AArch64ISD::CMEQ";		case AArch64ISD::CMEQ: return "AArch64ISD::CMEQ";
case AArch64ISD::CMGE: return "AArch64ISD::CMGE";		case AArch64ISD::CMGE: return "AArch64ISD::CMGE";
case AArch64ISD::CMGT: return "AArch64ISD::CMGT";		case AArch64ISD::CMGT: return "AArch64ISD::CMGT";
case AArch64ISD::CMHI: return "AArch64ISD::CMHI";		case AArch64ISD::CMHI: return "AArch64ISD::CMHI";
case AArch64ISD::CMHS: return "AArch64ISD::CMHS";		case AArch64ISD::CMHS: return "AArch64ISD::CMHS";
case AArch64ISD::FCMEQ: return "AArch64ISD::FCMEQ";		case AArch64ISD::FCMEQ: return "AArch64ISD::FCMEQ";
case AArch64ISD::FCMGE: return "AArch64ISD::FCMGE";		case AArch64ISD::FCMGE: return "AArch64ISD::FCMGE";
case AArch64ISD::FCMGT: return "AArch64ISD::FCMGT";		case AArch64ISD::FCMGT: return "AArch64ISD::FCMGT";
[1,817 lines not shown]
case Intrinsic::eh_recoverfp: {
SDValue IncomingFPOp = Op.getOperand(2);		SDValue IncomingFPOp = Op.getOperand(2);
GlobalAddressSDNode *GSD = dyn_cast<GlobalAddressSDNode>(FnOp);		GlobalAddressSDNode *GSD = dyn_cast<GlobalAddressSDNode>(FnOp);
auto *Fn = dyn_cast_or_null<Function>(GSD ? GSD->getGlobal() : nullptr);		auto *Fn = dyn_cast_or_null<Function>(GSD ? GSD->getGlobal() : nullptr);
if (!Fn)		if (!Fn)
report_fatal_error(		report_fatal_error(
"llvm.eh.recoverfp must take a function as the first argument");		"llvm.eh.recoverfp must take a function as the first argument");
return IncomingFPOp;		return IncomingFPOp;
}		}

		case Intrinsic::aarch64_neon_vsri:
		case Intrinsic::aarch64_neon_vsli: {
		EVT Ty = Op.getValueType();

		if (!Ty.isVector())
		report_fatal_error("Unexpected type for aarch64_neon_vsli");

		uint64_t ShiftAmount = Op.getConstantOperandVal(3);
		unsigned ElemSizeInBits = Ty.getScalarSizeInBits();
		assert(ShiftAmount <= ElemSizeInBits);

		bool IsShiftRight = IntNo == Intrinsic::aarch64_neon_vsri;
		unsigned Opcode = IsShiftRight ? AArch64ISD::VSRI : AArch64ISD::VSLI;
		return DAG.getNode(Opcode, dl, Ty, Op.getOperand(1), Op.getOperand(2),
		Op.getOperand(3));
		}
}		}
}		}

bool AArch64TargetLowering::isVectorLoadExtDesirable(SDValue ExtVal) const {		bool AArch64TargetLowering::isVectorLoadExtDesirable(SDValue ExtVal) const {
return ExtVal.getValueType().isScalableVector();		return ExtVal.getValueType().isScalableVector();
}		}

// Custom lower trunc store for v4i8 vectors, since it is promoted to v4i16.		// Custom lower trunc store for v4i8 vectors, since it is promoted to v4i16.
[4,761 lines not shown]
if (IID < Intrinsic::num_intrinsics)
return IID;		return IID;
return Intrinsic::not_intrinsic;		return Intrinsic::not_intrinsic;
}		}
}		}
}		}

// Attempt to form a vector S[LR]I from (or (and X, BvecC1), (lsl Y, C2)),		// Attempt to form a vector S[LR]I from (or (and X, BvecC1), (lsl Y, C2)),
// to (SLI X, Y, C2), where X and Y have matching vector types, BvecC1 is a		// to (SLI X, Y, C2), where X and Y have matching vector types, BvecC1 is a
// BUILD_VECTORs with constant element C1, C2 is a constant, and C1 == ~C2.		// BUILD_VECTORs with constant element C1, C2 is a constant, and:
// Also, logical shift right -> sri, with the same structure.		// - for the SLI case: C1 == ~(Ones(ElemSizeInBits) << C2)
		// - for the SRI case: C1 == ~(Ones(ElemSizeInBits) >> C2)
		// The (or (lsl Y, C2), (and X, BvecC1)) case is also handled.
static SDValue tryLowerToSLI(SDNode *N, SelectionDAG &DAG) {		static SDValue tryLowerToSLI(SDNode *N, SelectionDAG &DAG) {
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);

if (!VT.isVector())		if (!VT.isVector())
return SDValue();		return SDValue();

SDLoc DL(N);		SDLoc DL(N);

// Is the first op an AND?		SDValue And;
const SDValue And = N->getOperand(0);		SDValue Shift;
if (And.getOpcode() != ISD::AND)
return SDValue();

// Is the second op an shl or lshr?		SDValue FirstOp = N->getOperand(0);
SDValue Shift = N->getOperand(1);		unsigned FirstOpc = FirstOp.getOpcode();
// This will have been turned into: AArch64ISD::VSHL vector, #shift		SDValue SecondOp = N->getOperand(1);
// or AArch64ISD::VLSHR vector, #shift		unsigned SecondOpc = SecondOp.getOpcode();
unsigned ShiftOpc = Shift.getOpcode();
if ((ShiftOpc != AArch64ISD::VSHL && ShiftOpc != AArch64ISD::VLSHR))		// Is one of the operands an AND or a BICi? The AND may have been optimised to
		// a BICi in order to use an immediate instead of a register.
		// Is the other operand an shl or lshr? This will have been turned into:
		// AArch64ISD::VSHL vector, #shift or AArch64ISD::VLSHR vector, #shift.
		if ((FirstOpc == ISD::AND || FirstOpc == AArch64ISD::BICi) &&
		(SecondOpc == AArch64ISD::VSHL || SecondOpc == AArch64ISD::VLSHR)) {
		And = FirstOp;
		Shift = SecondOp;

		} else if ((SecondOpc == ISD::AND || SecondOpc == AArch64ISD::BICi) &&
		(FirstOpc == AArch64ISD::VSHL || FirstOpc == AArch64ISD::VLSHR)) {
		And = SecondOp;
		Shift = FirstOp;
		} else
return SDValue();		return SDValue();
bool IsShiftRight = ShiftOpc == AArch64ISD::VLSHR;
		bool IsAnd = And.getOpcode() == ISD::AND;
		bool IsShiftRight = Shift.getOpcode() == AArch64ISD::VLSHR;

// Is the shift amount constant?		// Is the shift amount constant?
ConstantSDNode *C2node = dyn_cast<ConstantSDNode>(Shift.getOperand(1));		ConstantSDNode *C2node = dyn_cast<ConstantSDNode>(Shift.getOperand(1));
if (!C2node)		if (!C2node)
return SDValue();		return SDValue();

// Is the and mask vector all constant?
uint64_t C1;		uint64_t C1;
		if (IsAnd) {
		// Is the and mask vector all constant?
if (!isAllConstantBuildVector(And.getOperand(1), C1))		if (!isAllConstantBuildVector(And.getOperand(1), C1))
return SDValue();		return SDValue();
		} else {
		// Reconstruct the corresponding AND immediate from the two BICi immediates.
		ConstantSDNode *C1nodeImm = dyn_cast<ConstantSDNode>(And.getOperand(1));
		ConstantSDNode *C1nodeShift = dyn_cast<ConstantSDNode>(And.getOperand(2));
		assert(C1nodeImm && C1nodeShift);
		C1 = ~(C1nodeImm->getZExtValue() << C1nodeShift->getZExtValue());
		}

// Is C1 == ~C2, taking into account how much one can shift elements of a		// Is C1 == ~(Ones(ElemSizeInBits) << C2) or
// particular size?		// C1 == ~(Ones(ElemSizeInBits) >> C2), taking into account
		// how much one can shift elements of a particular size?
uint64_t C2 = C2node->getZExtValue();		uint64_t C2 = C2node->getZExtValue();
unsigned ElemSizeInBits = VT.getScalarSizeInBits();		unsigned ElemSizeInBits = VT.getScalarSizeInBits();
if (C2 > ElemSizeInBits)		if (C2 > ElemSizeInBits)
return SDValue();		return SDValue();
unsigned ElemMask = (1 << ElemSizeInBits) - 1;
if ((C1 & ElemMask) != (~C2 & ElemMask))		APInt C1AsAPInt(ElemSizeInBits, C1);
		APInt RequiredC1 = IsShiftRight ? APInt::getHighBitsSet(ElemSizeInBits, C2)
		: APInt::getLowBitsSet(ElemSizeInBits, C2);
		if (C1AsAPInt != RequiredC1)
return SDValue();		return SDValue();

SDValue X = And.getOperand(0);		SDValue X = And.getOperand(0);
SDValue Y = Shift.getOperand(0);		SDValue Y = Shift.getOperand(0);

unsigned Intrin =		unsigned Inst = IsShiftRight ? AArch64ISD::VSRI : AArch64ISD::VSLI;
IsShiftRight ? Intrinsic::aarch64_neon_vsri : Intrinsic::aarch64_neon_vsli;		SDValue ResultSLI = DAG.getNode(Inst, DL, VT, X, Y, Shift.getOperand(1));
SDValue ResultSLI =
DAG.getNode(ISD::INTRINSIC_WO_CHAIN, DL, VT,
DAG.getConstant(Intrin, DL, MVT::i32), X, Y,
Shift.getOperand(1));

LLVM_DEBUG(dbgs() << "aarch64-lower: transformed: \n");		LLVM_DEBUG(dbgs() << "aarch64-lower: transformed: \n");
LLVM_DEBUG(N->dump(&DAG));		LLVM_DEBUG(N->dump(&DAG));
LLVM_DEBUG(dbgs() << "into: \n");		LLVM_DEBUG(dbgs() << "into: \n");
LLVM_DEBUG(ResultSLI->dump(&DAG));		LLVM_DEBUG(ResultSLI->dump(&DAG));

++NumShiftInserts;		++NumShiftInserts;
return ResultSLI;		return ResultSLI;
}		}

SDValue AArch64TargetLowering::LowerVectorOR(SDValue Op,		SDValue AArch64TargetLowering::LowerVectorOR(SDValue Op,
SelectionDAG &DAG) const {		SelectionDAG &DAG) const {
// Attempt to form a vector S[LR]I from (or (and X, C1), (lsl Y, C2))		// Attempt to form a vector S[LR]I from (or (and X, C1), (lsl Y, C2))
if (EnableAArch64SlrGeneration) {
if (SDValue Res = tryLowerToSLI(Op.getNode(), DAG))		if (SDValue Res = tryLowerToSLI(Op.getNode(), DAG))
return Res;		return Res;
}

EVT VT = Op.getValueType();		EVT VT = Op.getValueType();

SDValue LHS = Op.getOperand(0);		SDValue LHS = Op.getOperand(0);
BuildVectorSDNode *BVN =		BuildVectorSDNode *BVN =
dyn_cast<BuildVectorSDNode>(Op.getOperand(1).getNode());		dyn_cast<BuildVectorSDNode>(Op.getOperand(1).getNode());
if (!BVN) {		if (!BVN) {
// OR commutes, so try swapping the operands.		// OR commutes, so try swapping the operands.
[6,155 lines not shown]

llvm/lib/Target/AArch64/AArch64InstrInfo.td


[231 lines not shown]
	def SDT_AArch64MOVIshift : SDTypeProfile<1, 2, [SDTCisInt<1>, SDTCisInt<2>]>;			def SDT_AArch64MOVIshift : SDTypeProfile<1, 2, [SDTCisInt<1>, SDTCisInt<2>]>;
	def SDT_AArch64vecimm : SDTypeProfile<1, 3, [SDTCisVec<0>, SDTCisSameAs<0,1>,			def SDT_AArch64vecimm : SDTypeProfile<1, 3, [SDTCisVec<0>, SDTCisSameAs<0,1>,
	SDTCisInt<2>, SDTCisInt<3>]>;			SDTCisInt<2>, SDTCisInt<3>]>;
	def SDT_AArch64UnaryVec: SDTypeProfile<1, 1, [SDTCisVec<0>, SDTCisSameAs<0,1>]>;			def SDT_AArch64UnaryVec: SDTypeProfile<1, 1, [SDTCisVec<0>, SDTCisSameAs<0,1>]>;
	def SDT_AArch64ExtVec: SDTypeProfile<1, 3, [SDTCisVec<0>, SDTCisSameAs<0,1>,			def SDT_AArch64ExtVec: SDTypeProfile<1, 3, [SDTCisVec<0>, SDTCisSameAs<0,1>,
	SDTCisSameAs<0,2>, SDTCisInt<3>]>;			SDTCisSameAs<0,2>, SDTCisInt<3>]>;
	def SDT_AArch64vshift : SDTypeProfile<1, 2, [SDTCisSameAs<0,1>, SDTCisInt<2>]>;			def SDT_AArch64vshift : SDTypeProfile<1, 2, [SDTCisSameAs<0,1>, SDTCisInt<2>]>;

				def SDT_AArch64vshiftinsert : SDTypeProfile<1, 3, [SDTCisVec<0>, SDTCisInt<3>,
				SDTCisSameAs<0,1>,
				SDTCisSameAs<0,2>]>;

	def SDT_AArch64unvec : SDTypeProfile<1, 1, [SDTCisVec<0>, SDTCisSameAs<0,1>]>;			def SDT_AArch64unvec : SDTypeProfile<1, 1, [SDTCisVec<0>, SDTCisSameAs<0,1>]>;
	def SDT_AArch64fcmpz : SDTypeProfile<1, 1, []>;			def SDT_AArch64fcmpz : SDTypeProfile<1, 1, []>;
	def SDT_AArch64fcmp : SDTypeProfile<1, 2, [SDTCisSameAs<1,2>]>;			def SDT_AArch64fcmp : SDTypeProfile<1, 2, [SDTCisSameAs<1,2>]>;
	def SDT_AArch64binvec : SDTypeProfile<1, 2, [SDTCisVec<0>, SDTCisSameAs<0,1>,			def SDT_AArch64binvec : SDTypeProfile<1, 2, [SDTCisVec<0>, SDTCisSameAs<0,1>,
	SDTCisSameAs<0,2>]>;			SDTCisSameAs<0,2>]>;
	def SDT_AArch64trivec : SDTypeProfile<1, 3, [SDTCisVec<0>, SDTCisSameAs<0,1>,			def SDT_AArch64trivec : SDTypeProfile<1, 3, [SDTCisVec<0>, SDTCisSameAs<0,1>,
	SDTCisSameAs<0,2>,			SDTCisSameAs<0,2>,
	SDTCisSameAs<0,3>]>;			SDTCisSameAs<0,3>]>;
[222 lines not shown]
	def AArch64vashr : SDNode<"AArch64ISD::VASHR", SDT_AArch64vshift>;			def AArch64vashr : SDNode<"AArch64ISD::VASHR", SDT_AArch64vshift>;
	def AArch64vlshr : SDNode<"AArch64ISD::VLSHR", SDT_AArch64vshift>;			def AArch64vlshr : SDNode<"AArch64ISD::VLSHR", SDT_AArch64vshift>;
	def AArch64vshl : SDNode<"AArch64ISD::VSHL", SDT_AArch64vshift>;			def AArch64vshl : SDNode<"AArch64ISD::VSHL", SDT_AArch64vshift>;
	def AArch64sqshli : SDNode<"AArch64ISD::SQSHL_I", SDT_AArch64vshift>;			def AArch64sqshli : SDNode<"AArch64ISD::SQSHL_I", SDT_AArch64vshift>;
	def AArch64uqshli : SDNode<"AArch64ISD::UQSHL_I", SDT_AArch64vshift>;			def AArch64uqshli : SDNode<"AArch64ISD::UQSHL_I", SDT_AArch64vshift>;
	def AArch64sqshlui : SDNode<"AArch64ISD::SQSHLU_I", SDT_AArch64vshift>;			def AArch64sqshlui : SDNode<"AArch64ISD::SQSHLU_I", SDT_AArch64vshift>;
	def AArch64srshri : SDNode<"AArch64ISD::SRSHR_I", SDT_AArch64vshift>;			def AArch64srshri : SDNode<"AArch64ISD::SRSHR_I", SDT_AArch64vshift>;
	def AArch64urshri : SDNode<"AArch64ISD::URSHR_I", SDT_AArch64vshift>;			def AArch64urshri : SDNode<"AArch64ISD::URSHR_I", SDT_AArch64vshift>;
				def AArch64vsli : SDNode<"AArch64ISD::VSLI", SDT_AArch64vshiftinsert>;
				def AArch64vsri : SDNode<"AArch64ISD::VSRI", SDT_AArch64vshiftinsert>;

	def AArch64not: SDNode<"AArch64ISD::NOT", SDT_AArch64unvec>;			def AArch64not: SDNode<"AArch64ISD::NOT", SDT_AArch64unvec>;
	def AArch64bit: SDNode<"AArch64ISD::BIT", SDT_AArch64trivec>;			def AArch64bit: SDNode<"AArch64ISD::BIT", SDT_AArch64trivec>;
	def AArch64bsp: SDNode<"AArch64ISD::BSP", SDT_AArch64trivec>;			def AArch64bsp: SDNode<"AArch64ISD::BSP", SDT_AArch64trivec>;

	def AArch64cmeq: SDNode<"AArch64ISD::CMEQ", SDT_AArch64binvec>;			def AArch64cmeq: SDNode<"AArch64ISD::CMEQ", SDT_AArch64binvec>;
	def AArch64cmge: SDNode<"AArch64ISD::CMGE", SDT_AArch64binvec>;			def AArch64cmge: SDNode<"AArch64ISD::CMGE", SDT_AArch64binvec>;
	def AArch64cmgt: SDNode<"AArch64ISD::CMGT", SDT_AArch64binvec>;			def AArch64cmgt: SDNode<"AArch64ISD::CMGT", SDT_AArch64binvec>;
[5,395 lines not shown]
	defm FCVTZU:SIMDVectorRShiftSD<1, 0b11111, "fcvtzu", int_aarch64_neon_vcvtfp2fxu>;			defm FCVTZU:SIMDVectorRShiftSD<1, 0b11111, "fcvtzu", int_aarch64_neon_vcvtfp2fxu>;
	defm SCVTF: SIMDVectorRShiftToFP<0, 0b11100, "scvtf",			defm SCVTF: SIMDVectorRShiftToFP<0, 0b11100, "scvtf",
	int_aarch64_neon_vcvtfxs2fp>;			int_aarch64_neon_vcvtfxs2fp>;
	defm RSHRN : SIMDVectorRShiftNarrowBHS<0, 0b10001, "rshrn",			defm RSHRN : SIMDVectorRShiftNarrowBHS<0, 0b10001, "rshrn",
	int_aarch64_neon_rshrn>;			int_aarch64_neon_rshrn>;
	defm SHL : SIMDVectorLShiftBHSD<0, 0b01010, "shl", AArch64vshl>;			defm SHL : SIMDVectorLShiftBHSD<0, 0b01010, "shl", AArch64vshl>;
	defm SHRN : SIMDVectorRShiftNarrowBHS<0, 0b10000, "shrn",			defm SHRN : SIMDVectorRShiftNarrowBHS<0, 0b10000, "shrn",
	BinOpFrag<(trunc (AArch64vashr node:$LHS, node:$RHS))>>;			BinOpFrag<(trunc (AArch64vashr node:$LHS, node:$RHS))>>;
	defm SLI : SIMDVectorLShiftBHSDTied<1, 0b01010, "sli", int_aarch64_neon_vsli>;			defm SLI : SIMDVectorLShiftBHSDTied<1, 0b01010, "sli", AArch64vsli>;
	def : Pat<(v1i64 (int_aarch64_neon_vsli (v1i64 FPR64:$Rd), (v1i64 FPR64:$Rn),			def : Pat<(v1i64 (AArch64vsli (v1i64 FPR64:$Rd), (v1i64 FPR64:$Rn),
	(i32 vecshiftL64:$imm))),			(i32 vecshiftL64:$imm))),
	(SLId FPR64:$Rd, FPR64:$Rn, vecshiftL64:$imm)>;			(SLId FPR64:$Rd, FPR64:$Rn, vecshiftL64:$imm)>;
	defm SQRSHRN : SIMDVectorRShiftNarrowBHS<0, 0b10011, "sqrshrn",			defm SQRSHRN : SIMDVectorRShiftNarrowBHS<0, 0b10011, "sqrshrn",
	int_aarch64_neon_sqrshrn>;			int_aarch64_neon_sqrshrn>;
	defm SQRSHRUN: SIMDVectorRShiftNarrowBHS<1, 0b10001, "sqrshrun",			defm SQRSHRUN: SIMDVectorRShiftNarrowBHS<1, 0b10001, "sqrshrun",
	int_aarch64_neon_sqrshrun>;			int_aarch64_neon_sqrshrun>;
	defm SQSHLU : SIMDVectorLShiftBHSD<1, 0b01100, "sqshlu", AArch64sqshlui>;			defm SQSHLU : SIMDVectorLShiftBHSD<1, 0b01100, "sqshlu", AArch64sqshlui>;
	defm SQSHL : SIMDVectorLShiftBHSD<0, 0b01110, "sqshl", AArch64sqshli>;			defm SQSHL : SIMDVectorLShiftBHSD<0, 0b01110, "sqshl", AArch64sqshli>;
	defm SQSHRN : SIMDVectorRShiftNarrowBHS<0, 0b10010, "sqshrn",			defm SQSHRN : SIMDVectorRShiftNarrowBHS<0, 0b10010, "sqshrn",
	int_aarch64_neon_sqshrn>;			int_aarch64_neon_sqshrn>;
	defm SQSHRUN : SIMDVectorRShiftNarrowBHS<1, 0b10000, "sqshrun",			defm SQSHRUN : SIMDVectorRShiftNarrowBHS<1, 0b10000, "sqshrun",
	int_aarch64_neon_sqshrun>;			int_aarch64_neon_sqshrun>;
	defm SRI : SIMDVectorRShiftBHSDTied<1, 0b01000, "sri", int_aarch64_neon_vsri>;			defm SRI : SIMDVectorRShiftBHSDTied<1, 0b01000, "sri", AArch64vsri>;
	def : Pat<(v1i64 (int_aarch64_neon_vsri (v1i64 FPR64:$Rd), (v1i64 FPR64:$Rn),			def : Pat<(v1i64 (AArch64vsri (v1i64 FPR64:$Rd), (v1i64 FPR64:$Rn),
	(i32 vecshiftR64:$imm))),			(i32 vecshiftR64:$imm))),
	(SRId FPR64:$Rd, FPR64:$Rn, vecshiftR64:$imm)>;			(SRId FPR64:$Rd, FPR64:$Rn, vecshiftR64:$imm)>;
	defm SRSHR : SIMDVectorRShiftBHSD<0, 0b00100, "srshr", AArch64srshri>;			defm SRSHR : SIMDVectorRShiftBHSD<0, 0b00100, "srshr", AArch64srshri>;
	defm SRSRA : SIMDVectorRShiftBHSDTied<0, 0b00110, "srsra",			defm SRSRA : SIMDVectorRShiftBHSDTied<0, 0b00110, "srsra",
	TriOpFrag<(add node:$LHS,			TriOpFrag<(add node:$LHS,
	(AArch64srshri node:$MHS, node:$RHS))> >;			(AArch64srshri node:$MHS, node:$RHS))> >;
	defm SSHLL : SIMDVectorLShiftLongBHSD<0, 0b10100, "sshll",			defm SSHLL : SIMDVectorLShiftLongBHSD<0, 0b10100, "sshll",
	BinOpFrag<(AArch64vshl (sext node:$LHS), node:$RHS)>>;			BinOpFrag<(AArch64vshl (sext node:$LHS), node:$RHS)>>;
[1,513 lines not shown]

llvm/test/CodeGen/AArch64/arm64-sli-sri-opt.ll

	; RUN: llc < %s -aarch64-shift-insert-generation=true -mtriple=arm64-eabi -aarch64-neon-syntax=apple | FileCheck %s			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s -mtriple=arm64-eabi -aarch64-neon-syntax=apple | FileCheck %s

	define void @testLeftGood(<16 x i8> %src1, <16 x i8> %src2, <16 x i8>* %dest) nounwind {			define void @testLeftGood8x8(<8 x i8> %src1, <8 x i8> %src2, <8 x i8>* %dest) nounwind {
	; CHECK-LABEL: testLeftGood:			; CHECK-LABEL: testLeftGood8x8:
	; CHECK: sli.16b v0, v1, #3			; CHECK: // %bb.0:
	%and.i = and <16 x i8> %src1, <i8 252, i8 252, i8 252, i8 252, i8 252, i8 252, i8 252, i8 252, i8 252, i8 252, i8 252, i8 252, i8 252, i8 252, i8 252, i8 252>			; CHECK-NEXT: sli.8b v0, v1, #3
				; CHECK-NEXT: str d0, [x0]
				; CHECK-NEXT: ret
				%and.i = and <8 x i8> %src1, <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>
				%vshl_n = shl <8 x i8> %src2, <i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3>
				%result = or <8 x i8> %and.i, %vshl_n
				store <8 x i8> %result, <8 x i8>* %dest, align 8
				ret void
				}

				define void @testLeftBad8x8(<8 x i8> %src1, <8 x i8> %src2, <8 x i8>* %dest) nounwind {
				; CHECK-LABEL: testLeftBad8x8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: movi.8b v2, #165
				; CHECK-NEXT: and.8b v0, v0, v2
				; CHECK-NEXT: shl.8b v1, v1, #1
				; CHECK-NEXT: orr.8b v0, v0, v1
				; CHECK-NEXT: str d0, [x0]
				; CHECK-NEXT: ret
				%and.i = and <8 x i8> %src1, <i8 165, i8 165, i8 165, i8 165, i8 165, i8 165, i8 165, i8 165>
				%vshl_n = shl <8 x i8> %src2, <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
				%result = or <8 x i8> %and.i, %vshl_n
				store <8 x i8> %result, <8 x i8>* %dest, align 8
				ret void
				}

				define void @testRightGood8x8(<8 x i8> %src1, <8 x i8> %src2, <8 x i8>* %dest) nounwind {
				; CHECK-LABEL: testRightGood8x8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: sri.8b v0, v1, #3
				; CHECK-NEXT: str d0, [x0]
				; CHECK-NEXT: ret
				%and.i = and <8 x i8> %src1, <i8 224, i8 224, i8 224, i8 224, i8 224, i8 224, i8 224, i8 224>
				%vshl_n = lshr <8 x i8> %src2, <i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3>
				%result = or <8 x i8> %and.i, %vshl_n
				store <8 x i8> %result, <8 x i8>* %dest, align 8
				ret void
				}

				define void @testRightBad8x8(<8 x i8> %src1, <8 x i8> %src2, <8 x i8>* %dest) nounwind {
				; CHECK-LABEL: testRightBad8x8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: movi.8b v2, #165
				; CHECK-NEXT: and.8b v0, v0, v2
				; CHECK-NEXT: ushr.8b v1, v1, #1
				; CHECK-NEXT: orr.8b v0, v0, v1
				; CHECK-NEXT: str d0, [x0]
				; CHECK-NEXT: ret
				%and.i = and <8 x i8> %src1, <i8 165, i8 165, i8 165, i8 165, i8 165, i8 165, i8 165, i8 165>
				%vshl_n = lshr <8 x i8> %src2, <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
				%result = or <8 x i8> %and.i, %vshl_n
				store <8 x i8> %result, <8 x i8>* %dest, align 8
				ret void
				}

				define void @testLeftGood16x8(<16 x i8> %src1, <16 x i8> %src2, <16 x i8>* %dest) nounwind {
				; CHECK-LABEL: testLeftGood16x8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: sli.16b v0, v1, #3
				; CHECK-NEXT: str q0, [x0]
				; CHECK-NEXT: ret
				%and.i = and <16 x i8> %src1, <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>
	%vshl_n = shl <16 x i8> %src2, <i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3>			%vshl_n = shl <16 x i8> %src2, <i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3>
	%result = or <16 x i8> %and.i, %vshl_n			%result = or <16 x i8> %and.i, %vshl_n
	store <16 x i8> %result, <16 x i8>* %dest, align 16			store <16 x i8> %result, <16 x i8>* %dest, align 16
	ret void			ret void
	}			}

	define void @testLeftBad(<16 x i8> %src1, <16 x i8> %src2, <16 x i8>* %dest) nounwind {			define void @testLeftBad16x8(<16 x i8> %src1, <16 x i8> %src2, <16 x i8>* %dest) nounwind {
	; CHECK-LABEL: testLeftBad:			; CHECK-LABEL: testLeftBad16x8:
	; CHECK-NOT: sli			; CHECK: // %bb.0:
				; CHECK-NEXT: movi.16b v2, #165
				; CHECK-NEXT: and.16b v0, v0, v2
				; CHECK-NEXT: shl.16b v1, v1, #1
				; CHECK-NEXT: orr.16b v0, v0, v1
				; CHECK-NEXT: str q0, [x0]
				; CHECK-NEXT: ret
	%and.i = and <16 x i8> %src1, <i8 165, i8 165, i8 165, i8 165, i8 165, i8 165, i8 165, i8 165, i8 165, i8 165, i8 165, i8 165, i8 165, i8 165, i8 165, i8 165>			%and.i = and <16 x i8> %src1, <i8 165, i8 165, i8 165, i8 165, i8 165, i8 165, i8 165, i8 165, i8 165, i8 165, i8 165, i8 165, i8 165, i8 165, i8 165, i8 165>
	%vshl_n = shl <16 x i8> %src2, <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>			%vshl_n = shl <16 x i8> %src2, <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
	%result = or <16 x i8> %and.i, %vshl_n			%result = or <16 x i8> %and.i, %vshl_n
	store <16 x i8> %result, <16 x i8>* %dest, align 16			store <16 x i8> %result, <16 x i8>* %dest, align 16
	ret void			ret void
	}			}

	define void @testRightGood(<16 x i8> %src1, <16 x i8> %src2, <16 x i8>* %dest) nounwind {			define void @testRightGood16x8(<16 x i8> %src1, <16 x i8> %src2, <16 x i8>* %dest) nounwind {
	; CHECK-LABEL: testRightGood:			; CHECK-LABEL: testRightGood16x8:
	; CHECK: sri.16b v0, v1, #3			; CHECK: // %bb.0:
	%and.i = and <16 x i8> %src1, <i8 252, i8 252, i8 252, i8 252, i8 252, i8 252, i8 252, i8 252, i8 252, i8 252, i8 252, i8 252, i8 252, i8 252, i8 252, i8 252>			; CHECK-NEXT: sri.16b v0, v1, #3
				; CHECK-NEXT: str q0, [x0]
				; CHECK-NEXT: ret
				%and.i = and <16 x i8> %src1, <i8 224, i8 224, i8 224, i8 224, i8 224, i8 224, i8 224, i8 224, i8 224, i8 224, i8 224, i8 224, i8 224, i8 224, i8 224, i8 224>
	%vshl_n = lshr <16 x i8> %src2, <i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3>			%vshl_n = lshr <16 x i8> %src2, <i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3>
	%result = or <16 x i8> %and.i, %vshl_n			%result = or <16 x i8> %and.i, %vshl_n
	store <16 x i8> %result, <16 x i8>* %dest, align 16			store <16 x i8> %result, <16 x i8>* %dest, align 16
	ret void			ret void
	}			}

	define void @testRightBad(<16 x i8> %src1, <16 x i8> %src2, <16 x i8>* %dest) nounwind {			define void @testRightBad16x8(<16 x i8> %src1, <16 x i8> %src2, <16 x i8>* %dest) nounwind {
	; CHECK-LABEL: testRightBad:			; CHECK-LABEL: testRightBad16x8:
	; CHECK-NOT: sri			; CHECK: // %bb.0:
				; CHECK-NEXT: movi.16b v2, #165
				; CHECK-NEXT: and.16b v0, v0, v2
				; CHECK-NEXT: ushr.16b v1, v1, #1
				; CHECK-NEXT: orr.16b v0, v0, v1
				; CHECK-NEXT: str q0, [x0]
				; CHECK-NEXT: ret
	%and.i = and <16 x i8> %src1, <i8 165, i8 165, i8 165, i8 165, i8 165, i8 165, i8 165, i8 165, i8 165, i8 165, i8 165, i8 165, i8 165, i8 165, i8 165, i8 165>			%and.i = and <16 x i8> %src1, <i8 165, i8 165, i8 165, i8 165, i8 165, i8 165, i8 165, i8 165, i8 165, i8 165, i8 165, i8 165, i8 165, i8 165, i8 165, i8 165>
	%vshl_n = lshr <16 x i8> %src2, <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>			%vshl_n = lshr <16 x i8> %src2, <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
	%result = or <16 x i8> %and.i, %vshl_n			%result = or <16 x i8> %and.i, %vshl_n
	store <16 x i8> %result, <16 x i8>* %dest, align 16			store <16 x i8> %result, <16 x i8>* %dest, align 16
	ret void			ret void
	}			}

				define void @testLeftGood4x16(<4 x i16> %src1, <4 x i16> %src2, <4 x i16>* %dest) nounwind {
				; CHECK-LABEL: testLeftGood4x16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: sli.4h v0, v1, #14
				; CHECK-NEXT: str d0, [x0]
				; CHECK-NEXT: ret
				%and.i = and <4 x i16> %src1, <i16 16383, i16 16383, i16 16383, i16 16383>
				%vshl_n = shl <4 x i16> %src2, <i16 14, i16 14, i16 14, i16 14>
				%result = or <4 x i16> %and.i, %vshl_n
				store <4 x i16> %result, <4 x i16>* %dest, align 8
				ret void
				}

				define void @testLeftBad4x16(<4 x i16> %src1, <4 x i16> %src2, <4 x i16>* %dest) nounwind {
				; CHECK-LABEL: testLeftBad4x16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: mov w8, #16500
				; CHECK-NEXT: dup.4h v2, w8
				; CHECK-NEXT: and.8b v0, v0, v2
				; CHECK-NEXT: shl.4h v1, v1, #14
				; CHECK-NEXT: orr.8b v0, v0, v1
				; CHECK-NEXT: str d0, [x0]
				; CHECK-NEXT: ret
				%and.i = and <4 x i16> %src1, <i16 16500, i16 16500, i16 16500, i16 16500>
				%vshl_n = shl <4 x i16> %src2, <i16 14, i16 14, i16 14, i16 14>
				%result = or <4 x i16> %and.i, %vshl_n
				store <4 x i16> %result, <4 x i16>* %dest, align 8
				ret void
				}

				define void @testRightGood4x16(<4 x i16> %src1, <4 x i16> %src2, <4 x i16>* %dest) nounwind {
				; CHECK-LABEL: testRightGood4x16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: sri.4h v0, v1, #14
				; CHECK-NEXT: str d0, [x0]
				; CHECK-NEXT: ret
				%and.i = and <4 x i16> %src1, <i16 65532, i16 65532, i16 65532, i16 65532>
				%vshl_n = lshr <4 x i16> %src2, <i16 14, i16 14, i16 14, i16 14>
				%result = or <4 x i16> %and.i, %vshl_n
				store <4 x i16> %result, <4 x i16>* %dest, align 8
				ret void
				}

				define void @testRightBad4x16(<4 x i16> %src1, <4 x i16> %src2, <4 x i16>* %dest) nounwind {
				; CHECK-LABEL: testRightBad4x16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: mov w8, #16500
				; CHECK-NEXT: dup.4h v2, w8
				; CHECK-NEXT: and.8b v0, v0, v2
				; CHECK-NEXT: ushr.4h v1, v1, #14
				; CHECK-NEXT: orr.8b v0, v0, v1
				; CHECK-NEXT: str d0, [x0]
				; CHECK-NEXT: ret
				%and.i = and <4 x i16> %src1, <i16 16500, i16 16500, i16 16500, i16 16500>
				%vshl_n = lshr <4 x i16> %src2, <i16 14, i16 14, i16 14, i16 14>
				%result = or <4 x i16> %and.i, %vshl_n
				store <4 x i16> %result, <4 x i16>* %dest, align 8
				ret void
				}

				define void @testLeftGood8x16(<8 x i16> %src1, <8 x i16> %src2, <8 x i16>* %dest) nounwind {
				; CHECK-LABEL: testLeftGood8x16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: sli.8h v0, v1, #14
				; CHECK-NEXT: str q0, [x0]
				; CHECK-NEXT: ret
				%and.i = and <8 x i16> %src1, <i16 16383, i16 16383, i16 16383, i16 16383, i16 16383, i16 16383, i16 16383, i16 16383>
				%vshl_n = shl <8 x i16> %src2, <i16 14, i16 14, i16 14, i16 14, i16 14, i16 14, i16 14, i16 14>
				%result = or <8 x i16> %and.i, %vshl_n
				store <8 x i16> %result, <8 x i16>* %dest, align 16
				ret void
				}

				define void @testLeftBad8x16(<8 x i16> %src1, <8 x i16> %src2, <8 x i16>* %dest) nounwind {
				; CHECK-LABEL: testLeftBad8x16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: mov w8, #16500
				; CHECK-NEXT: dup.8h v2, w8
				; CHECK-NEXT: and.16b v0, v0, v2
				; CHECK-NEXT: shl.8h v1, v1, #14
				; CHECK-NEXT: orr.16b v0, v0, v1
				; CHECK-NEXT: str q0, [x0]
				; CHECK-NEXT: ret
				%and.i = and <8 x i16> %src1, <i16 16500, i16 16500, i16 16500, i16 16500, i16 16500, i16 16500, i16 16500, i16 16500>
				%vshl_n = shl <8 x i16> %src2, <i16 14, i16 14, i16 14, i16 14, i16 14, i16 14, i16 14, i16 14>
				%result = or <8 x i16> %and.i, %vshl_n
				store <8 x i16> %result, <8 x i16>* %dest, align 16
				ret void
				}

				define void @testRightGood8x16(<8 x i16> %src1, <8 x i16> %src2, <8 x i16>* %dest) nounwind {
				; CHECK-LABEL: testRightGood8x16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: sri.8h v0, v1, #14
				; CHECK-NEXT: str q0, [x0]
				; CHECK-NEXT: ret
				%and.i = and <8 x i16> %src1, <i16 65532, i16 65532, i16 65532, i16 65532, i16 65532, i16 65532, i16 65532, i16 65532>
				%vshl_n = lshr <8 x i16> %src2, <i16 14, i16 14, i16 14, i16 14, i16 14, i16 14, i16 14, i16 14>
				%result = or <8 x i16> %and.i, %vshl_n
				store <8 x i16> %result, <8 x i16>* %dest, align 16
				ret void
				}

				define void @testRightBad8x16(<8 x i16> %src1, <8 x i16> %src2, <8 x i16>* %dest) nounwind {
				; CHECK-LABEL: testRightBad8x16:
				; CHECK: // %bb.0:
				; CHECK-NEXT: mov w8, #16500
				; CHECK-NEXT: dup.8h v2, w8
				; CHECK-NEXT: and.16b v0, v0, v2
				; CHECK-NEXT: ushr.8h v1, v1, #14
				; CHECK-NEXT: orr.16b v0, v0, v1
				; CHECK-NEXT: str q0, [x0]
				; CHECK-NEXT: ret
				%and.i = and <8 x i16> %src1, <i16 16500, i16 16500, i16 16500, i16 16500, i16 16500, i16 16500, i16 16500, i16 16500>
				%vshl_n = lshr <8 x i16> %src2, <i16 14, i16 14, i16 14, i16 14, i16 14, i16 14, i16 14, i16 14>
				%result = or <8 x i16> %and.i, %vshl_n
				store <8 x i16> %result, <8 x i16>* %dest, align 16
				ret void
				}

				define void @testLeftGood2x32(<2 x i32> %src1, <2 x i32> %src2, <2 x i32>* %dest) nounwind {
				; CHECK-LABEL: testLeftGood2x32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: sli.2s v0, v1, #22
				; CHECK-NEXT: str d0, [x0]
				; CHECK-NEXT: ret
				%and.i = and <2 x i32> %src1, <i32 4194303, i32 4194303>
				%vshl_n = shl <2 x i32> %src2, <i32 22, i32 22>
				%result = or <2 x i32> %and.i, %vshl_n
				store <2 x i32> %result, <2 x i32>* %dest, align 8
				ret void
				}

				define void @testLeftBad2x32(<2 x i32> %src1, <2 x i32> %src2, <2 x i32>* %dest) nounwind {
				; CHECK-LABEL: testLeftBad2x32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: mov w8, #4194300
				; CHECK-NEXT: dup.2s v2, w8
				; CHECK-NEXT: and.8b v0, v0, v2
				; CHECK-NEXT: shl.2s v1, v1, #22
				; CHECK-NEXT: orr.8b v0, v0, v1
				; CHECK-NEXT: str d0, [x0]
				; CHECK-NEXT: ret
				%and.i = and <2 x i32> %src1, <i32 4194300, i32 4194300>
				%vshl_n = shl <2 x i32> %src2, <i32 22, i32 22>
				%result = or <2 x i32> %and.i, %vshl_n
				store <2 x i32> %result, <2 x i32>* %dest, align 8
				ret void
				}

				define void @testRightGood2x32(<2 x i32> %src1, <2 x i32> %src2, <2 x i32>* %dest) nounwind {
				; CHECK-LABEL: testRightGood2x32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: sri.2s v0, v1, #22
				; CHECK-NEXT: str d0, [x0]
				; CHECK-NEXT: ret
				%and.i = and <2 x i32> %src1, <i32 4294966272, i32 4294966272>
				%vshl_n = lshr <2 x i32> %src2, <i32 22, i32 22>
				%result = or <2 x i32> %and.i, %vshl_n
				store <2 x i32> %result, <2 x i32>* %dest, align 8
				ret void
				}

				define void @testRightBad2x32(<2 x i32> %src1, <2 x i32> %src2, <2 x i32>* %dest) nounwind {
				; CHECK-LABEL: testRightBad2x32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: mov w8, #4194300
				; CHECK-NEXT: dup.2s v2, w8
				; CHECK-NEXT: and.8b v0, v0, v2
				; CHECK-NEXT: ushr.2s v1, v1, #22
				; CHECK-NEXT: orr.8b v0, v0, v1
				; CHECK-NEXT: str d0, [x0]
				; CHECK-NEXT: ret
				%and.i = and <2 x i32> %src1, <i32 4194300, i32 4194300>
				%vshl_n = lshr <2 x i32> %src2, <i32 22, i32 22>
				%result = or <2 x i32> %and.i, %vshl_n
				store <2 x i32> %result, <2 x i32>* %dest, align 8
				ret void
				}

				define void @testLeftGood4x32(<4 x i32> %src1, <4 x i32> %src2, <4 x i32>* %dest) nounwind {
				; CHECK-LABEL: testLeftGood4x32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: sli.4s v0, v1, #22
				; CHECK-NEXT: str q0, [x0]
				; CHECK-NEXT: ret
				%and.i = and <4 x i32> %src1, <i32 4194303, i32 4194303, i32 4194303, i32 4194303>
				%vshl_n = shl <4 x i32> %src2, <i32 22, i32 22, i32 22, i32 22>
				%result = or <4 x i32> %and.i, %vshl_n
				store <4 x i32> %result, <4 x i32>* %dest, align 16
				ret void
				}

				define void @testLeftBad4x32(<4 x i32> %src1, <4 x i32> %src2, <4 x i32>* %dest) nounwind {
				; CHECK-LABEL: testLeftBad4x32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: mov w8, #4194300
				; CHECK-NEXT: dup.4s v2, w8
				; CHECK-NEXT: and.16b v0, v0, v2
				; CHECK-NEXT: shl.4s v1, v1, #22
				; CHECK-NEXT: orr.16b v0, v0, v1
				; CHECK-NEXT: str q0, [x0]
				; CHECK-NEXT: ret
				%and.i = and <4 x i32> %src1, <i32 4194300, i32 4194300, i32 4194300, i32 4194300>
				%vshl_n = shl <4 x i32> %src2, <i32 22, i32 22, i32 22, i32 22>
				%result = or <4 x i32> %and.i, %vshl_n
				store <4 x i32> %result, <4 x i32>* %dest, align 16
				ret void
				}

				define void @testRightGood4x32(<4 x i32> %src1, <4 x i32> %src2, <4 x i32>* %dest) nounwind {
				; CHECK-LABEL: testRightGood4x32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: sri.4s v0, v1, #22
				; CHECK-NEXT: str q0, [x0]
				; CHECK-NEXT: ret
				%and.i = and <4 x i32> %src1, <i32 4294966272, i32 4294966272, i32 4294966272, i32 4294966272>
				%vshl_n = lshr <4 x i32> %src2, <i32 22, i32 22, i32 22, i32 22>
				%result = or <4 x i32> %and.i, %vshl_n
				store <4 x i32> %result, <4 x i32>* %dest, align 16
				ret void
				}

				define void @testRightBad4x32(<4 x i32> %src1, <4 x i32> %src2, <4 x i32>* %dest) nounwind {
				; CHECK-LABEL: testRightBad4x32:
				; CHECK: // %bb.0:
				; CHECK-NEXT: mov w8, #4194300
				; CHECK-NEXT: dup.4s v2, w8
				; CHECK-NEXT: and.16b v0, v0, v2
				; CHECK-NEXT: ushr.4s v1, v1, #22
				; CHECK-NEXT: orr.16b v0, v0, v1
				; CHECK-NEXT: str q0, [x0]
				; CHECK-NEXT: ret
				%and.i = and <4 x i32> %src1, <i32 4194300, i32 4194300, i32 4194300, i32 4194300>
				%vshl_n = lshr <4 x i32> %src2, <i32 22, i32 22, i32 22, i32 22>
				%result = or <4 x i32> %and.i, %vshl_n
				store <4 x i32> %result, <4 x i32>* %dest, align 16
				ret void
				}

				define void @testLeftGood2x64(<2 x i64> %src1, <2 x i64> %src2, <2 x i64>* %dest) nounwind {
				; CHECK-LABEL: testLeftGood2x64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: sli.2d v0, v1, #48
				; CHECK-NEXT: str q0, [x0]
				; CHECK-NEXT: ret
				%and.i = and <2 x i64> %src1, <i64 281474976710655, i64 281474976710655>
				%vshl_n = shl <2 x i64> %src2, <i64 48, i64 48>
				%result = or <2 x i64> %and.i, %vshl_n
				store <2 x i64> %result, <2 x i64>* %dest, align 16
				ret void
				}

				define void @testLeftBad2x64(<2 x i64> %src1, <2 x i64> %src2, <2 x i64>* %dest) nounwind {
				; CHECK-LABEL: testLeftBad2x64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: mov x8, #10
				; CHECK-NEXT: movk x8, #1, lsl #48
				; CHECK-NEXT: dup.2d v2, x8
				; CHECK-NEXT: and.16b v0, v0, v2
				; CHECK-NEXT: shl.2d v1, v1, #48
				; CHECK-NEXT: orr.16b v0, v0, v1
				; CHECK-NEXT: str q0, [x0]
				; CHECK-NEXT: ret
				%and.i = and <2 x i64> %src1, <i64 281474976710666, i64 281474976710666>
				%vshl_n = shl <2 x i64> %src2, <i64 48, i64 48>
				%result = or <2 x i64> %and.i, %vshl_n
				store <2 x i64> %result, <2 x i64>* %dest, align 16
				ret void
				}

				define void @testRightGood2x64(<2 x i64> %src1, <2 x i64> %src2, <2 x i64>* %dest) nounwind {
				; CHECK-LABEL: testRightGood2x64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: sri.2d v0, v1, #48
				; CHECK-NEXT: str q0, [x0]
				; CHECK-NEXT: ret
				%and.i = and <2 x i64> %src1, <i64 18446744073709486080, i64 18446744073709486080>
				%vshl_n = lshr <2 x i64> %src2, <i64 48, i64 48>
				%result = or <2 x i64> %and.i, %vshl_n
				store <2 x i64> %result, <2 x i64>* %dest, align 16
				ret void
				}

				define void @testRightBad2x64(<2 x i64> %src1, <2 x i64> %src2, <2 x i64>* %dest) nounwind {
				; CHECK-LABEL: testRightBad2x64:
				; CHECK: // %bb.0:
				; CHECK-NEXT: mov x8, #10
				; CHECK-NEXT: movk x8, #1, lsl #48
				; CHECK-NEXT: dup.2d v2, x8
				; CHECK-NEXT: and.16b v0, v0, v2
				; CHECK-NEXT: ushr.2d v1, v1, #48
				; CHECK-NEXT: orr.16b v0, v0, v1
				; CHECK-NEXT: str q0, [x0]
				; CHECK-NEXT: ret
				%and.i = and <2 x i64> %src1, <i64 281474976710666, i64 281474976710666>
				%vshl_n = lshr <2 x i64> %src2, <i64 48, i64 48>
				%result = or <2 x i64> %and.i, %vshl_n
				store <2 x i64> %result, <2 x i64>* %dest, align 16
				ret void
				}

				define void @testLeftShouldNotCreateSLI1x128(<1 x i128> %src1, <1 x i128> %src2, <1 x i128>* %dest) nounwind {
				; CHECK-LABEL: testLeftShouldNotCreateSLI1x128:
				; CHECK: // %bb.0:
				; CHECK-NEXT: bfi x1, x2, #6, #58
				; CHECK-NEXT: stp x0, x1, [x4]
				; CHECK-NEXT: ret
				%and.i = and <1 x i128> %src1, <i128 1180591620717411303423>
				%vshl_n = shl <1 x i128> %src2, <i128 70>
				%result = or <1 x i128> %and.i, %vshl_n
				store <1 x i128> %result, <1 x i128>* %dest, align 16
				ret void
				}

				define void @testLeftNotAllConstantBuildVec8x8(<8 x i8> %src1, <8 x i8> %src2, <8 x i8>* %dest) nounwind {
				; CHECK-LABEL: testLeftNotAllConstantBuildVec8x8:
				; CHECK: // %bb.0:
				; CHECK-NEXT: adrp x8, .LCPI29_0
				; CHECK-NEXT: ldr d2, [x8, :lo12:.LCPI29_0]
				; CHECK-NEXT: shl.8b v1, v1, #3
				; CHECK-NEXT: and.8b v0, v0, v2
				; CHECK-NEXT: orr.8b v0, v0, v1
				; CHECK-NEXT: str d0, [x0]
				; CHECK-NEXT: ret
				%and.i = and <8 x i8> %src1, <i8 7, i8 7, i8 255, i8 7, i8 7, i8 7, i8 255, i8 7>
				%vshl_n = shl <8 x i8> %src2, <i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3, i8 3>
				%result = or <8 x i8> %and.i, %vshl_n
				store <8 x i8> %result, <8 x i8>* %dest, align 8
				ret void
				}