
[AArch64] LSLFast to fold onto base address by default
Needs Review · Public

Authored by harviniriawan on Jul 17 2023, 8:14 AM.

Details

Summary
  • Most CPUs have a dedicated adder and shifter for computing the base address of loads and stores, so folding the shift into the address computation is effectively free.

  • Older CPUs incur an extra cycle on loads whose index is left-shifted by 2; don't fold the LSL into the base address in those cases, and add a new target feature to model this (a minimal example of the fold is sketched below).
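
As an illustration of the fold in question (the C++ source and the assembly in the comments are illustrative assumptions, not taken from this patch):

```cpp
// Minimal sketch of the addressing-mode fold being discussed. The assembly in
// the comments is what a typical AArch64 compiler might emit; it is an
// assumption for illustration, not output from this patch.
#include <cstdint>

int32_t load_scaled(const int32_t *base, int64_t index) {
  // Folded (shift absorbed into the load's addressing mode):
  //   ldr w0, [x0, x1, lsl #2]
  // Unfolded (address computed with a separate add+lsl):
  //   add x8, x0, x1, lsl #2
  //   ldr w0, [x8]
  return base[index]; // address = base + (index << 2)
}
```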


Event Timeline

harviniriawan created this revision. Jul 17 2023, 8:14 AM
Herald added a project: Restricted Project. Jul 17 2023, 8:14 AM
harviniriawan requested review of this revision. Jul 17 2023, 8:14 AM

I agree that it makes sense to do this more aggressively, from looking at some of the optimisation manuals. (lsl-fast has taken an odd route: it was originally added to represent when shifts folded into addressing operands were quick, but it has now started to just mean that add-with-shift is cheap.) Some of the older optimization manuals mention shifts of 2 being slower; is that something we need to take into account? I'm also not sure about non-Arm cores. Presumably there was a reason why people originally believed that operands with shifts would be better as separate instructions.

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
664

Subtarget->hasLSLFast() || FoldToBaseAddr

llvm/test/CodeGen/AArch64/aarch64-fold-lslfast.ll
189–190

These CHECK prefixes can be removed now, if they are all expected to be the same.

llvm/test/CodeGen/AArch64/arm64-addr-mode-folding.ll
127–128

Make sure to remove the old CHECK lines. It is probably worth updating the check lines in a separate patch, so that just the differences can be shown here.

llvm/test/CodeGen/AArch64/arm64-fold-address.ll
62 (On Diff #541046)

This looks like it hasn't been generated properly. It might mean the update script doesn't like the triple.

Allen added a comment. Jul 19 2023, 4:11 AM

Agree with @dmgreen. Folding a small shift into the addressing operand doesn't add any instruction latency and reduces register usage; GCC already does this folding: https://gcc.godbolt.org/z/68zEq8x81
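
To make the multiple-uses point concrete, here is a rough example in the same spirit (not necessarily the exact code behind the godbolt link; the assembly in the comments is an assumption):

```cpp
// A shifted index with two uses: if both loads fold the "lsl #2" into their
// addressing mode, no extra register is needed to hold the shifted value.
// The assembly in the comments is illustrative, not generated output.
#include <cstdint>

int32_t sum_two(const int32_t *a, const int32_t *b, int64_t i) {
  // Folded:                         Unfolded (shift kept in a register):
  //   ldr w8, [x0, x2, lsl #2]        lsl x9, x2, #2
  //   ldr w9, [x1, x2, lsl #2]        ldr w8, [x0, x9]
  //   add w0, w8, w9                  ldr w9, [x1, x9]
  //                                   add w0, w8, w9
  return a[i] + b[i];
}
```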

harviniriawan marked 4 inline comments as done.
harviniriawan edited the summary of this revision.

Sorry for the delay. I've been looking at something very related recently (folding extends into address operands), so I may have become overly opinionated.

I think that it makes sense to split LSLFast into an address part and an ALU part, but not as part of this patch. From the optimization guides it then looks like we have 4 cases where it is or isn't better to fold into the addressing operands (with multiple uses):

  • None. The current default. If it has multiple uses don't fold it.
  • LSLFast. The current LSLFast which folds LSL (not extends), and is used for the original cases of LSLFast on Kryo and Falkor.
  • Ext23Fast. Shifts of 2 and 3 (Scales of 4 and 8) are cheap, the other two are not. All extends are cheap. Used for all Arm OoO cores.
  • AllFast. Everything is cheap: scales of 2 and 16 along with those above. Used for in-order cores, I think.

It looks like Ext23Fast should be the default for -mcpu=generic, as a conservative compromise between None and AllFast. (There is a chance that AllFast is better on all cores, but it looks like it takes the load/store pipe for an extra cycle, and the ones I tested had an extra cycle of latency.) This patch seems to essentially switch the default to AllFast, with a target feature that makes LSL1 slow again (but not LSL4)? I like changing the default, but I'm not sure we can change it to AllFast for all CPUs.
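
For illustration, the four tiers could be sketched as a per-scale predicate; the names below are hypothetical and do not correspond to the actual target features in this patch or in D157982, and extends are left out to keep it short:

```cpp
// Hypothetical sketch of the four tiers as a predicate over the load/store
// scale (bytes per element). Names are invented for illustration and do not
// match the real AArch64 target features; extends are not modelled.
enum class AddrFoldTier { None, LSLFast, Ext23Fast, AllFast };

bool isCheapAddrScale(unsigned Scale, AddrFoldTier Tier) {
  switch (Tier) {
  case AddrFoldTier::None:
    return false; // current default: don't fold a multi-use shift
  case AddrFoldTier::LSLFast:
    return true; // original Kryo/Falkor behaviour: shifted LSLs are cheap
  case AddrFoldTier::Ext23Fast:
    // Shifts of 2 and 3 (scales 4 and 8) are cheap; scales 2 and 16 cost an
    // extra cycle. Scale 1 (no shift) is trivially fine.
    return Scale == 1 || Scale == 4 || Scale == 8;
  case AddrFoldTier::AllFast:
    return true; // everything is cheap (in-order cores)
  }
  return false;
}
```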

llvm/lib/Target/AArch64/AArch64.td
386

Can we add Addr to the name of this feature, to make clear that it is about address operands, not add+lsl? Should we also use Scale2 or Shift1?
From looking at the optimization guides and what we model in the scheduling model (https://github.com/llvm/llvm-project/blob/f6bdfb0b92690403ceef8c1d58adf7a3a733b543/llvm/lib/Target/AArch64/AArch64SchedNeoverseN1.td#L490), it looks like this should be slower for scale2 and scale16. scale4 and scale8 (and scale1, that one's easy) are fast.

774

Should shift-by-2 actually be slower on the A55, so that it needs FeatureShiftBy2Slow? I don't see that in the optimization guide.

786

From what I can see in the optimization guides, almost all Arm AArch64 CPUs (except for the Cortex-A510) say that the latency of "Load vector reg, register offset, scale, S/D-form" is a cycle lower than that of "Load vector reg, register offset, scale, H/Q-form".

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
1230

Should this and the one below be true too?

I've put up a patch to split LSLFast into two parts in https://reviews.llvm.org/D157982.