This is an archive of the discontinued LLVM Phabricator instance.

Differential D155470

[AArch64] LSLFast to fold onto base address by default
Needs Review · Public

Authored by harviniriawan on Jul 17 2023, 8:14 AM.

Details

Reviewers

dmgreen
chill
fhahn
Allen

Summary

Most CPUs have a dedicated adder and shifter to compute the base address of loads and stores, so they are always free to use.

Older CPUs incur one extra cycle when doing a load with a left shift by 2, so don't fold the LSL into the base address in these cases. Add a new feature for this.
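
As a rough illustration of the pattern the summary describes, the sketch below is a C++ analogue of the `gep3` test in aarch64-fold-lslfast.ll: one scaled index feeds both a load and a store, so the shift has more than one use. The function name `loadThenStore` is hypothetical, and the instructions in the comments are copied from that test's CHECK lines. What a given compiler actually emits depends on its version, target, and flags.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical example function (not part of the patch). The address
// p + (i << 3) is used by two memory operations, which is the multi-use
// case the patch wants to fold into the addressing mode.
inline std::int64_t loadThenStore(std::int64_t *p, std::int64_t i) {
  std::int64_t old = p[i]; // folded form in the test: ldr x0, [x0, x1, lsl #3]
  p[i] = i;                // folded form in the test: str x1, [x8, x1, lsl #3]
  return old;
}
```

Without the fold, the same test expects a separate `lsl x9, x1, #3` followed by register-offset accesses through `x9`.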

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	40 ms	x64 debian > LLVM.CodeGen/AArch64::arm64-addr-mode-folding.ll
	100 ms	x64 debian > LLVM.CodeGen/AArch64::extract-bits.ll

Event Timeline

harviniriawan created this revision. Jul 17 2023, 8:14 AM

Herald added a project: Restricted Project. Jul 17 2023, 8:14 AM

Herald added subscribers: hiraditya, kristof.beyls.

harviniriawan requested review of this revision. Jul 17 2023, 8:14 AM

Herald added a project: Restricted Project. Jul 17 2023, 8:14 AM

Herald added a subscriber: llvm-commits.

From looking at some of the optimisation manuals, I agree that it makes sense to do this more aggressively. (lsl-fast has taken an odd route: it was originally added to represent when shifts into addressing operands were quick, but it has now started to mean just that the add with shift is cheap.) Some of the older optimization manuals mention shifts of 2 being slower; is that something that we need to take into account? I'm also not sure about non-Arm cores. Presumably there was a reason why people originally believed that operands with shifts would be better as separate instructions.

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
660	Subtarget->hasLSLFast() || FoldToBaseAddr
llvm/test/CodeGen/AArch64/aarch64-fold-lslfast.ll
173–174	These CHECK prefixes can be removed now, if they are all expected to be the same.
llvm/test/CodeGen/AArch64/arm64-addr-mode-folding.ll
195	Make sure to remove the old CHECK lines. It is probably worth updating the check lines in a separate patch, so that just the differences can be shown here.
llvm/test/CodeGen/AArch64/arm64-fold-address.ll
62	This looks like it hasn't been generated properly. It might mean the update script doesn't like the triple.
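
The inline comment on AArch64ISelDAGToDAG.cpp line 660 suggests replacing the conditional expression in the patch with a logical OR. A minimal sketch of why the two forms are interchangeable follows. The free functions `allowTernary`, `allowOr`, and `formsAgree` are hypothetical stand-ins, with plain `bool` parameters in place of `Subtarget->hasLSLFast()` and `FoldToBaseAddr`.

```cpp
#include <cassert>

// Form used in the patch: HasLSLFast ? true : FoldToBaseAddr.
constexpr bool allowTernary(bool HasLSLFast, bool FoldToBaseAddr) {
  return HasLSLFast ? true : FoldToBaseAddr;
}

// Form suggested in review: HasLSLFast || FoldToBaseAddr.
constexpr bool allowOr(bool HasLSLFast, bool FoldToBaseAddr) {
  return HasLSLFast || FoldToBaseAddr;
}

// Compare the two forms on all four input combinations.
inline bool formsAgree() {
  for (int H = 0; H < 2; ++H)
    for (int F = 0; F < 2; ++F)
      if (allowTernary(H != 0, F != 0) != allowOr(H != 0, F != 0))
        return false;
  return true;
}
```

Since both forms yield the same value for every input, the shorter `||` form changes readability only, not behaviour.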

Herald added a subscriber: StephenFan. Jul 17 2023, 11:25 AM

Harbormaster completed remote builds in B245865: Diff 541046. Jul 17 2023, 12:19 PM

Agree with @dmgreen: folding a small shift into the addressing operands doesn't add instruction latency and reduces register usage. GCC already does this folding: https://gcc.godbolt.org/z/68zEq8x81

harviniriawan updated this revision to Diff 542553. Jul 20 2023, 9:00 AM

harviniriawan marked 4 inline comments as done.

harviniriawan edited the summary of this revision.

Harbormaster completed remote builds in B246938: Diff 542553. Jul 20 2023, 12:14 PM

harviniriawan updated this revision to Diff 542804. Jul 21 2023, 1:52 AM

Harbormaster completed remote builds in B247134: Diff 542804. Jul 21 2023, 3:55 AM

Sorry for the delay. I've been looking at something very related recently (folding extends into address operands), so I may have become overly opinionated.

I think that it makes sense to split LSLFast into an address part and an ALU part, but not as a part of this patch. From the optimization guides it then looks like we have 4 cases where it is or isn't better to fold into the addressing operands (with multiple uses):

None. The current default. If it has multiple uses don't fold it.
LSLFast. The current LSLFast which folds LSL (not extends), and is used for the original cases of LSLFast on Kryo and Falkor.
Ext23Fast. Shifts of 2 and 3 (Scales of 4 and 8) are cheap, the other two are not. All extends are cheap. Used for all Arm OoO cores.
AllFast. Everything is cheap: scales of 2 and 16 along with those above. Used for in-order cores, I think.

It looks like Ext23Fast should be the default for -mcpu=generic, as a conservative compromise between Ext23Fast and AllFast. (There is a chance that AllFast is better on all cores, but it looks like it takes the load/store pipe for an extra cycle, and the ones I tested had an extra cycle of latency.) This patch seems to essentially switch the default to AllFast, with a target feature that makes LSL1 slow again (but not LSL4)? I like changing the default, but I'm not sure we can change it to AllFast for all CPUs.
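
The four cases listed above can be sketched as a small decision table. This is only an illustration under stated assumptions: the identifiers `AddrFoldPolicy`, `isShiftFoldCheap`, and `isExtendFoldCheap` are hypothetical and do not exist in LLVM, and the LSLFast row assumes the `ShiftVal > 3` cutoff used by `isWorthFoldingSHL` in this diff.

```cpp
#include <cassert>

// Hypothetical names for the four cases in the comment above.
enum class AddrFoldPolicy { None, LSLFast, Ext23Fast, AllFast };

// Is folding a multi-use `lsl #Shift` into a load/store address operand
// assumed to be cheap under policy P? Only shifts 1..4 (scales 2, 4, 8, 16)
// are modelled here.
constexpr bool isShiftFoldCheap(AddrFoldPolicy P, unsigned Shift) {
  if (Shift < 1 || Shift > 4)
    return false; // outside the range discussed in the review
  switch (P) {
  case AddrFoldPolicy::None:
    return false; // multiple uses: don't fold
  case AddrFoldPolicy::LSLFast:
    return Shift <= 3; // assumption: same cutoff as isWorthFoldingSHL
  case AddrFoldPolicy::Ext23Fast:
    return Shift == 2 || Shift == 3; // scales of 4 and 8 only
  case AddrFoldPolicy::AllFast:
    return true; // scales of 2 and 16 as well
  }
  return false;
}

// Is folding a multi-use extend into the address operand assumed cheap?
// LSLFast folds LSL but not extends; Ext23Fast and AllFast treat all
// extends as cheap.
constexpr bool isExtendFoldCheap(AddrFoldPolicy P) {
  return P == AddrFoldPolicy::Ext23Fast || P == AddrFoldPolicy::AllFast;
}
```

Under this model, the behaviour questioned in the comment corresponds to treating shifts 2, 3, and 4 as cheap while making only shift 1 slow, which matches none of the four rows.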

llvm/lib/Target/AArch64/AArch64.td
386 ↗	(On Diff #542804)	Can we add Addr to the name of this feature, to explain that it is about address operands, not add+lsl's? Should we also use Scale2 or Shift1? From looking at the optimization guides and what we model in the scheduling model (https://github.com/llvm/llvm-project/blob/f6bdfb0b92690403ceef8c1d58adf7a3a733b543/llvm/lib/Target/AArch64/AArch64SchedNeoverseN1.td#L490), it looks like this should be slower for scale2 and scale16; scale4 and scale8 (and scale1, that one's easy) are fast.
774 ↗	(On Diff #542804)	Should FeatureShiftBy2Slow be slower on A55? I don't see that from the optimization guide.
786 ↗	(On Diff #542804)	From what I can see from the optimization guides, almost all AArch64 Arm cpus (except for the Cortex-A510) say that the latency of `Load vector reg, register offset, scale, S/D-form` is a cycle lower than `Load vector reg, register offset, scale, H/Q-form`.
llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
1221	Should this and the one below be true too?

I've put up a patch to split LSLFast into two parts in https://reviews.llvm.org/D157982.

dmgreen mentioned this in D152828: [MachineSink][AArch64] Sink instruction copies when they can replace copy into hard register or folded into addressing mode. Aug 16 2023, 2:27 AM

Revision Contents

	Path	Size
	llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp	35 lines
	llvm/test/CodeGen/AArch64/aarch64-fold-lslfast.ll	155 lines
	llvm/test/CodeGen/AArch64/arm64-addr-mode-folding.ll	99 lines
	llvm/test/CodeGen/AArch64/arm64-fold-address.ll	12 lines
	llvm/test/CodeGen/AArch64/arm64-fold-lsl.ll	194 lines

Diff 541046

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp

[446 lines not shown]	private:
bool SelectAddrModeUnscaled(SDValue N, unsigned Size, SDValue &Base,		bool SelectAddrModeUnscaled(SDValue N, unsigned Size, SDValue &Base,
SDValue &OffImm);		SDValue &OffImm);
bool SelectAddrModeWRO(SDValue N, unsigned Size, SDValue &Base,		bool SelectAddrModeWRO(SDValue N, unsigned Size, SDValue &Base,
SDValue &Offset, SDValue &SignExtend,		SDValue &Offset, SDValue &SignExtend,
SDValue &DoShift);		SDValue &DoShift);
bool SelectAddrModeXRO(SDValue N, unsigned Size, SDValue &Base,		bool SelectAddrModeXRO(SDValue N, unsigned Size, SDValue &Base,
SDValue &Offset, SDValue &SignExtend,		SDValue &Offset, SDValue &SignExtend,
SDValue &DoShift);		SDValue &DoShift);
bool isWorthFolding(SDValue V) const;		bool isWorthFolding(SDValue V, bool FoldToBaseAddr) const;
bool SelectExtendedSHL(SDValue N, unsigned Size, bool WantExtend,		bool SelectExtendedSHL(SDValue N, unsigned Size, bool WantExtend,
SDValue &Offset, SDValue &SignExtend);		SDValue &Offset, SDValue &SignExtend);

template<unsigned RegWidth>		template<unsigned RegWidth>
bool SelectCVTFixedPosOperand(SDValue N, SDValue &FixedPos) {		bool SelectCVTFixedPosOperand(SDValue N, SDValue &FixedPos) {
return SelectCVTFixedPosOperand(N, FixedPos, RegWidth);		return SelectCVTFixedPosOperand(N, FixedPos, RegWidth);
}		}

[162 lines not shown]	case ISD::SRA:
return AArch64_AM::ASR;		return AArch64_AM::ASR;
case ISD::ROTR:		case ISD::ROTR:
return AArch64_AM::ROR;		return AArch64_AM::ROR;
}		}
}		}

/// Determine whether it is worth it to fold SHL into the addressing		/// Determine whether it is worth it to fold SHL into the addressing
/// mode.		/// mode.
static bool isWorthFoldingSHL(SDValue V) {		static bool isWorthFoldingSHL(SDValue V, bool FoldToBaseAddr = false) {
assert(V.getOpcode() == ISD::SHL && "invalid opcode");		assert(V.getOpcode() == ISD::SHL && "invalid opcode");
// It is worth folding logical shift of up to three places.		// It is worth folding logical shift of up to three places.
auto *CSD = dyn_cast<ConstantSDNode>(V.getOperand(1));		auto *CSD = dyn_cast<ConstantSDNode>(V.getOperand(1));
if (!CSD)		if (!CSD)
return false;		return false;
unsigned ShiftVal = CSD->getZExtValue();		unsigned ShiftVal = CSD->getZExtValue();
if (ShiftVal > 3)		if (ShiftVal > 3)
return false;		return false;

// Check if this particular node is reused in any non-memory related		// Check if this particular node is reused in any non-memory related
// operation. If yes, do not try to fold this node into the address		// operation. If yes, do not try to fold this node into the address
// computation, since the computation will be kept.		// computation, since the computation will be kept.
		if (!FoldToBaseAddr) {
const SDNode *Node = V.getNode();		const SDNode *Node = V.getNode();
for (SDNode *UI : Node->uses())		for (SDNode *UI : Node->uses())
if (!isa<MemSDNode>(*UI))		if (!isa<MemSDNode>(*UI))
for (SDNode *UII : UI->uses())		for (SDNode *UII : UI->uses())
if (!isa<MemSDNode>(*UII))		if (!isa<MemSDNode>(*UII))
return false;		return false;
		}
return true;		return true;
}		}

/// Determine whether it is worth to fold V into an extended register.		/// Determine whether it is worth to fold V into an extended register.
bool AArch64DAGToDAGISel::isWorthFolding(SDValue V) const {		bool AArch64DAGToDAGISel::isWorthFolding(SDValue V, bool FoldToBaseAddr = false) const {
		bool AllowLSLFast = Subtarget->hasLSLFast() ? true : FoldToBaseAddr;
		dmgreen (inline comment, marked done): Subtarget->hasLSLFast() || FoldToBaseAddr
// Trivial if we are optimizing for code size or if there is only		// Trivial if we are optimizing for code size or if there is only
// one use of the value.		// one use of the value.
if (CurDAG->shouldOptForSize() || V.hasOneUse())		if (CurDAG->shouldOptForSize() || V.hasOneUse())
return true;		return true;
// If a subtarget has a fastpath LSL we can fold a logical shift into		// If a subtarget has a fastpath LSL we can fold a logical shift into
// the addressing mode and save a cycle.		// the addressing mode and save a cycle.
if (Subtarget->hasLSLFast() && V.getOpcode() == ISD::SHL &&		if (AllowLSLFast && V.getOpcode() == ISD::SHL &&
isWorthFoldingSHL(V))		isWorthFoldingSHL(V, FoldToBaseAddr))
return true;		return true;
if (Subtarget->hasLSLFast() && V.getOpcode() == ISD::ADD) {		if (AllowLSLFast && V.getOpcode() == ISD::ADD) {
const SDValue LHS = V.getOperand(0);		const SDValue LHS = V.getOperand(0);
const SDValue RHS = V.getOperand(1);		const SDValue RHS = V.getOperand(1);
if (LHS.getOpcode() == ISD::SHL && isWorthFoldingSHL(LHS))		if (LHS.getOpcode() == ISD::SHL && isWorthFoldingSHL(LHS, FoldToBaseAddr))
return true;		return true;
if (RHS.getOpcode() == ISD::SHL && isWorthFoldingSHL(RHS))		if (RHS.getOpcode() == ISD::SHL && isWorthFoldingSHL(RHS, FoldToBaseAddr))
return true;		return true;
}		}

// It hurts otherwise, since the value will be reused.		// It hurts otherwise, since the value will be reused.
return false;		return false;
}		}

/// and (shl/srl/sra, x, c), mask --> shl (srl/sra, x, c1), c2		/// and (shl/srl/sra, x, c), mask --> shl (srl/sra, x, c1), c2
[499 lines not shown]	bool AArch64DAGToDAGISel::SelectAddrModeWRO(SDValue N, unsigned Size,
// computation, since the computation will be kept.		// computation, since the computation will be kept.
const SDNode *Node = N.getNode();		const SDNode *Node = N.getNode();
for (SDNode *UI : Node->uses()) {		for (SDNode *UI : Node->uses()) {
if (!isa<MemSDNode>(*UI))		if (!isa<MemSDNode>(*UI))
return false;		return false;
}		}

// Remember if it is worth folding N when it produces extended register.		// Remember if it is worth folding N when it produces extended register.
bool IsExtendedRegisterWorthFolding = isWorthFolding(N);		bool IsExtendedRegisterWorthFolding = isWorthFolding(N, /* FoldtoBaseAddr */true);

// Try to match a shifted extend on the RHS.		// Try to match a shifted extend on the RHS.
if (IsExtendedRegisterWorthFolding && RHS.getOpcode() == ISD::SHL &&		if (IsExtendedRegisterWorthFolding && RHS.getOpcode() == ISD::SHL &&
SelectExtendedSHL(RHS, Size, true, Offset, SignExtend)) {		SelectExtendedSHL(RHS, Size, true, Offset, SignExtend)) {
Base = LHS;		Base = LHS;
DoShift = CurDAG->getTargetConstant(true, dl, MVT::i32);		DoShift = CurDAG->getTargetConstant(true, dl, MVT::i32);
return true;		return true;
}		}
[13 lines not shown]	bool AArch64DAGToDAGISel::SelectAddrModeWRO(SDValue N, unsigned Size,
// Try to match an unshifted extend on the LHS.		// Try to match an unshifted extend on the LHS.
if (IsExtendedRegisterWorthFolding &&		if (IsExtendedRegisterWorthFolding &&
(Ext = getExtendTypeForNode(LHS, true)) !=		(Ext = getExtendTypeForNode(LHS, true)) !=
AArch64_AM::InvalidShiftExtend) {		AArch64_AM::InvalidShiftExtend) {
Base = RHS;		Base = RHS;
Offset = narrowIfNeeded(CurDAG, LHS.getOperand(0));		Offset = narrowIfNeeded(CurDAG, LHS.getOperand(0));
SignExtend = CurDAG->getTargetConstant(Ext == AArch64_AM::SXTW, dl,		SignExtend = CurDAG->getTargetConstant(Ext == AArch64_AM::SXTW, dl,
MVT::i32);		MVT::i32);
if (isWorthFolding(LHS))		if (isWorthFolding(LHS))
		dmgreen (inline comment, not done): Should this and the one below be true too?
return true;		return true;
}		}

// Try to match an unshifted extend on the RHS.		// Try to match an unshifted extend on the RHS.
if (IsExtendedRegisterWorthFolding &&		if (IsExtendedRegisterWorthFolding &&
(Ext = getExtendTypeForNode(RHS, true)) !=		(Ext = getExtendTypeForNode(RHS, true)) !=
AArch64_AM::InvalidShiftExtend) {		AArch64_AM::InvalidShiftExtend) {
Base = LHS;		Base = LHS;
[66 lines not shown]	if (isa<ConstantSDNode>(RHS)) {
SDNode *MOVI =		SDNode *MOVI =
CurDAG->getMachineNode(AArch64::MOVi64imm, DL, MVT::i64, Ops);		CurDAG->getMachineNode(AArch64::MOVi64imm, DL, MVT::i64, Ops);
SDValue MOVIV = SDValue(MOVI, 0);		SDValue MOVIV = SDValue(MOVI, 0);
// This ADD of two X register will be selected into [Reg+Reg] mode.		// This ADD of two X register will be selected into [Reg+Reg] mode.
N = CurDAG->getNode(ISD::ADD, DL, MVT::i64, LHS, MOVIV);		N = CurDAG->getNode(ISD::ADD, DL, MVT::i64, LHS, MOVIV);
}		}

// Remember if it is worth folding N when it produces extended register.		// Remember if it is worth folding N when it produces extended register.
bool IsExtendedRegisterWorthFolding = isWorthFolding(N);		bool IsExtendedRegisterWorthFolding = isWorthFolding(N, /*FoldToBaseAddr*/ true);

// Try to match a shifted extend on the RHS.		// Try to match a shifted extend on the RHS.
if (IsExtendedRegisterWorthFolding && RHS.getOpcode() == ISD::SHL &&		if (IsExtendedRegisterWorthFolding && RHS.getOpcode() == ISD::SHL &&
SelectExtendedSHL(RHS, Size, false, Offset, SignExtend)) {		SelectExtendedSHL(RHS, Size, false, Offset, SignExtend)) {
Base = LHS;		Base = LHS;
DoShift = CurDAG->getTargetConstant(true, DL, MVT::i32);		DoShift = CurDAG->getTargetConstant(true, DL, MVT::i32);
return true;		return true;
}		}
[5,389 lines not shown]

llvm/test/CodeGen/AArch64/aarch64-fold-lslfast.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc < %s -mtriple=aarch64-linux-gnu | FileCheck %s --check-prefixes=CHECK,CHECK0		; RUN: llc < %s -mtriple=aarch64-linux-gnu | FileCheck %s --check-prefixes=CHECK,CHECK0
; RUN: llc < %s -mtriple=aarch64-linux-gnu -mattr=+lsl-fast | FileCheck %s --check-prefixes=CHECK,CHECK3		; RUN: llc < %s -mtriple=aarch64-linux-gnu -mattr=+lsl-fast | FileCheck %s --check-prefixes=CHECK,CHECK3

%struct.a = type [256 x i16]		%struct.a = type [256 x i16]
%struct.b = type [256 x i32]		%struct.b = type [256 x i32]
%struct.c = type [256 x i64]		%struct.c = type [256 x i64]

declare void @foo()		declare void @foo()
define i16 @halfword(ptr %ctx, i32 %xor72) nounwind {		define i16 @halfword(ptr %ctx, i32 %xor72) nounwind {
; CHECK0-LABEL: halfword:		; CHECK-LABEL: halfword:
; CHECK0: // %bb.0:		; CHECK: // %bb.0:
; CHECK0-NEXT: stp x30, x21, [sp, #-32]! // 16-byte Folded Spill		; CHECK-NEXT: stp x30, x21, [sp, #-32]! // 16-byte Folded Spill
; CHECK0-NEXT: // kill: def $w1 killed $w1 def $x1		; CHECK-NEXT: // kill: def $w1 killed $w1 def $x1
; CHECK0-NEXT: ubfx x8, x1, #9, #8		; CHECK-NEXT: ubfx x21, x1, #9, #8
; CHECK0-NEXT: stp x20, x19, [sp, #16] // 16-byte Folded Spill		; CHECK-NEXT: stp x20, x19, [sp, #16] // 16-byte Folded Spill
; CHECK0-NEXT: lsl x21, x8, #1		; CHECK-NEXT: mov x19, x0
; CHECK0-NEXT: mov x19, x0		; CHECK-NEXT: ldrh w20, [x0, x21, lsl #1]
; CHECK0-NEXT: ldrh w20, [x0, x21]		; CHECK-NEXT: bl foo
; CHECK0-NEXT: bl foo		; CHECK-NEXT: mov w0, w20
; CHECK0-NEXT: mov w0, w20		; CHECK-NEXT: strh w20, [x19, x21, lsl #1]
; CHECK0-NEXT: strh w20, [x19, x21]		; CHECK-NEXT: ldp x20, x19, [sp, #16] // 16-byte Folded Reload
; CHECK0-NEXT: ldp x20, x19, [sp, #16] // 16-byte Folded Reload		; CHECK-NEXT: ldp x30, x21, [sp], #32 // 16-byte Folded Reload
; CHECK0-NEXT: ldp x30, x21, [sp], #32 // 16-byte Folded Reload		; CHECK-NEXT: ret
; CHECK0-NEXT: ret
;
; CHECK3-LABEL: halfword:
; CHECK3: // %bb.0:
; CHECK3-NEXT: stp x30, x21, [sp, #-32]! // 16-byte Folded Spill
; CHECK3-NEXT: // kill: def $w1 killed $w1 def $x1
; CHECK3-NEXT: ubfx x21, x1, #9, #8
; CHECK3-NEXT: stp x20, x19, [sp, #16] // 16-byte Folded Spill
; CHECK3-NEXT: mov x19, x0
; CHECK3-NEXT: ldrh w20, [x0, x21, lsl #1]
; CHECK3-NEXT: bl foo
; CHECK3-NEXT: mov w0, w20
; CHECK3-NEXT: strh w20, [x19, x21, lsl #1]
; CHECK3-NEXT: ldp x20, x19, [sp, #16] // 16-byte Folded Reload
; CHECK3-NEXT: ldp x30, x21, [sp], #32 // 16-byte Folded Reload
; CHECK3-NEXT: ret
%shr81 = lshr i32 %xor72, 9		%shr81 = lshr i32 %xor72, 9
%conv82 = zext i32 %shr81 to i64		%conv82 = zext i32 %shr81 to i64
%idxprom83 = and i64 %conv82, 255		%idxprom83 = and i64 %conv82, 255
%arrayidx86 = getelementptr inbounds %struct.a, ptr %ctx, i64 0, i64 %idxprom83		%arrayidx86 = getelementptr inbounds %struct.a, ptr %ctx, i64 0, i64 %idxprom83
%result = load i16, ptr %arrayidx86, align 2		%result = load i16, ptr %arrayidx86, align 2
call void @foo()		call void @foo()
store i16 %result, ptr %arrayidx86, align 2		store i16 %result, ptr %arrayidx86, align 2
ret i16 %result		ret i16 %result
}		}

define i32 @word(ptr %ctx, i32 %xor72) nounwind {		define i32 @word(ptr %ctx, i32 %xor72) nounwind {
; CHECK0-LABEL: word:		; CHECK-LABEL: word:
; CHECK0: // %bb.0:		; CHECK: // %bb.0:
; CHECK0-NEXT: stp x30, x21, [sp, #-32]! // 16-byte Folded Spill		; CHECK-NEXT: stp x30, x21, [sp, #-32]! // 16-byte Folded Spill
; CHECK0-NEXT: // kill: def $w1 killed $w1 def $x1		; CHECK-NEXT: // kill: def $w1 killed $w1 def $x1
; CHECK0-NEXT: ubfx x8, x1, #9, #8		; CHECK-NEXT: ubfx x21, x1, #9, #8
; CHECK0-NEXT: stp x20, x19, [sp, #16] // 16-byte Folded Spill		; CHECK-NEXT: stp x20, x19, [sp, #16] // 16-byte Folded Spill
; CHECK0-NEXT: lsl x21, x8, #2		; CHECK-NEXT: mov x19, x0
; CHECK0-NEXT: mov x19, x0		; CHECK-NEXT: ldr w20, [x0, x21, lsl #2]
; CHECK0-NEXT: ldr w20, [x0, x21]		; CHECK-NEXT: bl foo
; CHECK0-NEXT: bl foo		; CHECK-NEXT: mov w0, w20
; CHECK0-NEXT: mov w0, w20		; CHECK-NEXT: str w20, [x19, x21, lsl #2]
; CHECK0-NEXT: str w20, [x19, x21]		; CHECK-NEXT: ldp x20, x19, [sp, #16] // 16-byte Folded Reload
; CHECK0-NEXT: ldp x20, x19, [sp, #16] // 16-byte Folded Reload		; CHECK-NEXT: ldp x30, x21, [sp], #32 // 16-byte Folded Reload
; CHECK0-NEXT: ldp x30, x21, [sp], #32 // 16-byte Folded Reload		; CHECK-NEXT: ret
; CHECK0-NEXT: ret
;
; CHECK3-LABEL: word:
; CHECK3: // %bb.0:
; CHECK3-NEXT: stp x30, x21, [sp, #-32]! // 16-byte Folded Spill
; CHECK3-NEXT: // kill: def $w1 killed $w1 def $x1
; CHECK3-NEXT: ubfx x21, x1, #9, #8
; CHECK3-NEXT: stp x20, x19, [sp, #16] // 16-byte Folded Spill
; CHECK3-NEXT: mov x19, x0
; CHECK3-NEXT: ldr w20, [x0, x21, lsl #2]
; CHECK3-NEXT: bl foo
; CHECK3-NEXT: mov w0, w20
; CHECK3-NEXT: str w20, [x19, x21, lsl #2]
; CHECK3-NEXT: ldp x20, x19, [sp, #16] // 16-byte Folded Reload
; CHECK3-NEXT: ldp x30, x21, [sp], #32 // 16-byte Folded Reload
; CHECK3-NEXT: ret
%shr81 = lshr i32 %xor72, 9		%shr81 = lshr i32 %xor72, 9
%conv82 = zext i32 %shr81 to i64		%conv82 = zext i32 %shr81 to i64
%idxprom83 = and i64 %conv82, 255		%idxprom83 = and i64 %conv82, 255
%arrayidx86 = getelementptr inbounds %struct.b, ptr %ctx, i64 0, i64 %idxprom83		%arrayidx86 = getelementptr inbounds %struct.b, ptr %ctx, i64 0, i64 %idxprom83
%result = load i32, ptr %arrayidx86, align 4		%result = load i32, ptr %arrayidx86, align 4
call void @foo()		call void @foo()
store i32 %result, ptr %arrayidx86, align 4		store i32 %result, ptr %arrayidx86, align 4
ret i32 %result		ret i32 %result
}		}

define i64 @doubleword(ptr %ctx, i32 %xor72) nounwind {		define i64 @doubleword(ptr %ctx, i32 %xor72) nounwind {
; CHECK0-LABEL: doubleword:		; CHECK-LABEL: doubleword:
; CHECK0: // %bb.0:		; CHECK: // %bb.0:
; CHECK0-NEXT: stp x30, x21, [sp, #-32]! // 16-byte Folded Spill		; CHECK-NEXT: stp x30, x21, [sp, #-32]! // 16-byte Folded Spill
; CHECK0-NEXT: // kill: def $w1 killed $w1 def $x1		; CHECK-NEXT: // kill: def $w1 killed $w1 def $x1
; CHECK0-NEXT: ubfx x8, x1, #9, #8		; CHECK-NEXT: ubfx x21, x1, #9, #8
; CHECK0-NEXT: stp x20, x19, [sp, #16] // 16-byte Folded Spill		; CHECK-NEXT: stp x20, x19, [sp, #16] // 16-byte Folded Spill
; CHECK0-NEXT: lsl x21, x8, #3		; CHECK-NEXT: mov x19, x0
; CHECK0-NEXT: mov x19, x0		; CHECK-NEXT: ldr x20, [x0, x21, lsl #3]
; CHECK0-NEXT: ldr x20, [x0, x21]		; CHECK-NEXT: bl foo
; CHECK0-NEXT: bl foo		; CHECK-NEXT: mov x0, x20
; CHECK0-NEXT: mov x0, x20		; CHECK-NEXT: str x20, [x19, x21, lsl #3]
; CHECK0-NEXT: str x20, [x19, x21]		; CHECK-NEXT: ldp x20, x19, [sp, #16] // 16-byte Folded Reload
; CHECK0-NEXT: ldp x20, x19, [sp, #16] // 16-byte Folded Reload		; CHECK-NEXT: ldp x30, x21, [sp], #32 // 16-byte Folded Reload
; CHECK0-NEXT: ldp x30, x21, [sp], #32 // 16-byte Folded Reload		; CHECK-NEXT: ret
; CHECK0-NEXT: ret
;
; CHECK3-LABEL: doubleword:
; CHECK3: // %bb.0:
; CHECK3-NEXT: stp x30, x21, [sp, #-32]! // 16-byte Folded Spill
; CHECK3-NEXT: // kill: def $w1 killed $w1 def $x1
; CHECK3-NEXT: ubfx x21, x1, #9, #8
; CHECK3-NEXT: stp x20, x19, [sp, #16] // 16-byte Folded Spill
; CHECK3-NEXT: mov x19, x0
; CHECK3-NEXT: ldr x20, [x0, x21, lsl #3]
; CHECK3-NEXT: bl foo
; CHECK3-NEXT: mov x0, x20
; CHECK3-NEXT: str x20, [x19, x21, lsl #3]
; CHECK3-NEXT: ldp x20, x19, [sp, #16] // 16-byte Folded Reload
; CHECK3-NEXT: ldp x30, x21, [sp], #32 // 16-byte Folded Reload
; CHECK3-NEXT: ret
%shr81 = lshr i32 %xor72, 9		%shr81 = lshr i32 %xor72, 9
%conv82 = zext i32 %shr81 to i64		%conv82 = zext i32 %shr81 to i64
%idxprom83 = and i64 %conv82, 255		%idxprom83 = and i64 %conv82, 255
%arrayidx86 = getelementptr inbounds %struct.c, ptr %ctx, i64 0, i64 %idxprom83		%arrayidx86 = getelementptr inbounds %struct.c, ptr %ctx, i64 0, i64 %idxprom83
%result = load i64, ptr %arrayidx86, align 8		%result = load i64, ptr %arrayidx86, align 8
call void @foo()		call void @foo()
store i64 %result, ptr %arrayidx86, align 8		store i64 %result, ptr %arrayidx86, align 8
ret i64 %result		ret i64 %result
[27 lines not shown]	falsebb:
br i1 %cmp2, label %exitbb, label %endbb		br i1 %cmp2, label %exitbb, label %endbb
exitbb:		exitbb:
ret i64 %mul1		ret i64 %mul1
endbb:		endbb:
ret i64 %mul2		ret i64 %mul2
}		}

define i64 @gep3(ptr %p, i64 %b) {		define i64 @gep3(ptr %p, i64 %b) {
; CHECK0-LABEL: gep3:		; CHECK-LABEL: gep3:
; CHECK0: // %bb.0:		; CHECK: // %bb.0:
; CHECK0-NEXT: lsl x9, x1, #3		; CHECK-NEXT: mov x8, x0
; CHECK0-NEXT: mov x8, x0		; CHECK-NEXT: ldr x0, [x0, x1, lsl #3]
; CHECK0-NEXT: ldr x0, [x0, x9]		; CHECK-NEXT: str x1, [x8, x1, lsl #3]
; CHECK0-NEXT: str x1, [x8, x9]		; CHECK-NEXT: ret
; CHECK0-NEXT: ret
;
; CHECK3-LABEL: gep3:
; CHECK3: // %bb.0:
; CHECK3-NEXT: mov x8, x0
; CHECK3-NEXT: ldr x0, [x0, x1, lsl #3]
; CHECK3-NEXT: str x1, [x8, x1, lsl #3]
; CHECK3-NEXT: ret
%g = getelementptr inbounds i64, ptr %p, i64 %b		%g = getelementptr inbounds i64, ptr %p, i64 %b
%l = load i64, ptr %g		%l = load i64, ptr %g
store i64 %b, ptr %g		store i64 %b, ptr %g
ret i64 %l		ret i64 %l
}		}

define i128 @gep4(ptr %p, i128 %a, i64 %b) {		define i128 @gep4(ptr %p, i128 %a, i64 %b) {
; CHECK-LABEL: gep4:		; CHECK-LABEL: gep4:
[32 lines not shown]
; CHECK-NEXT: eor x0, x9, x8		; CHECK-NEXT: eor x0, x9, x8
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%x = shl i64 %a, 4		%x = shl i64 %a, 4
%y = add i64 %b, %x		%y = add i64 %b, %x
%z = sub i64 %b, %x		%z = sub i64 %b, %x
%r = xor i64 %y, %z		%r = xor i64 %y, %z
ret i64 %r		ret i64 %r
}		}
		;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
		; CHECK0: {{.*}}
		; CHECK3: {{.*}}
		dmgreen (inline comment, marked done): These CHECK prefixes can be removed now, if they are all expected to be the same.

llvm/test/CodeGen/AArch64/arm64-addr-mode-folding.ll

		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 2
; RUN: llc -O3 -mtriple arm64-apple-ios3 -aarch64-enable-gep-opt=false %s -o - | FileCheck %s		; RUN: llc -O3 -mtriple arm64-apple-ios3 -aarch64-enable-gep-opt=false %s -o - | FileCheck %s
; <rdar://problem/13621857>		; <rdar://problem/13621857>

@block = common global ptr null, align 8		@block = common global ptr null, align 8

define i32 @fct(i32 %i1, i32 %i2) {		define i32 @fct(i32 %i1, i32 %i2) {
; CHECK: @fct		; CHECK-LABEL: fct:
		; CHECK: ; %bb.0: ; %entry
		; CHECK-NEXT: Lloh0:
		; CHECK-NEXT: adrp x10, _block@GOTPAGE
		; CHECK-NEXT: ; kill: def $w1 killed $w1 def $x1
		; CHECK-NEXT: ; kill: def $w0 killed $w0 def $x0
		; CHECK-NEXT: sxtw x8, w0
		; CHECK-NEXT: sxtw x9, w1
		; CHECK-NEXT: Lloh1:
		; CHECK-NEXT: ldr x10, [x10, _block@GOTPAGEOFF]
		; CHECK-NEXT: Lloh2:
		; CHECK-NEXT: ldr x10, [x10]
		; CHECK-NEXT: ldrb w11, [x10, x8]
		; CHECK-NEXT: ldrb w12, [x10, x9]
		; CHECK-NEXT: cmp w11, w12
		; CHECK-NEXT: b.ne LBB0_3
		; CHECK-NEXT: ; %bb.1: ; %if.end
		; CHECK-NEXT: add x8, x8, x10
		; CHECK-NEXT: add x9, x9, x10
		; CHECK-NEXT: ldrb w10, [x8, #1]
		; CHECK-NEXT: ldrb w11, [x9, #1]
		; CHECK-NEXT: cmp w10, w11
		; CHECK-NEXT: b.ne LBB0_3
		; CHECK-NEXT: ; %bb.2: ; %if.end23
		; CHECK-NEXT: ldrb w8, [x8, #2]
		; CHECK-NEXT: ldrb w9, [x9, #2]
		; CHECK-NEXT: cmp w8, w9
		; CHECK-NEXT: mov w8, #1 ; =0x1
		; CHECK-NEXT: cset w9, hi
		; CHECK-NEXT: csel w0, w8, w9, eq
		; CHECK-NEXT: ret
		; CHECK-NEXT: LBB0_3: ; %if.then18
		; CHECK-NEXT: cset w0, hi
		; CHECK-NEXT: ret
		; CHECK-NEXT: .loh AdrpLdrGotLdr Lloh0, Lloh1, Lloh2
; Sign extension is used more than once, thus it should not be folded.		; Sign extension is used more than once, thus it should not be folded.
; CodeGenPrepare is not sharing sext across uses, thus this is folded because		; CodeGenPrepare is not sharing sext across uses, thus this is folded because
; of that.		; of that.
; _CHECK-NOT: , sxtw]		; _CHECK-NOT: , sxtw]
entry:		entry:
%idxprom = sext i32 %i1 to i64		%idxprom = sext i32 %i1 to i64
%0 = load ptr, ptr @block, align 8		%0 = load ptr, ptr @block, align 8
%arrayidx = getelementptr inbounds i8, ptr %0, i64 %idxprom		%arrayidx = getelementptr inbounds i8, ptr %0, i64 %idxprom
[44 lines not shown]	if.then34: ; preds = %if.end23
br label %return		br label %return

return: ; preds = %if.end23, %if.then34, %if.then18, %if.then		return: ; preds = %if.end23, %if.then34, %if.then18, %if.then
%retval.0 = phi i32 [ %conv8, %if.then ], [ %conv22, %if.then18 ], [ %conv38, %if.then34 ], [ 1, %if.end23 ]		%retval.0 = phi i32 [ %conv8, %if.then ], [ %conv22, %if.then18 ], [ %conv38, %if.then34 ], [ 1, %if.end23 ]
ret i32 %retval.0		ret i32 %retval.0
}		}

define i32 @fct1(i32 %i1, i32 %i2) optsize {		define i32 @fct1(i32 %i1, i32 %i2) optsize {
; CHECK: @fct1		; CHECK-LABEL: fct1:
		; CHECK: ; %bb.0: ; %entry
		; CHECK-NEXT: Lloh3:
		; CHECK-NEXT: adrp x8, _block@GOTPAGE
		; CHECK-NEXT: ; kill: def $w1 killed $w1 def $x1
		; CHECK-NEXT: ; kill: def $w0 killed $w0 def $x0
		; CHECK-NEXT: Lloh4:
		; CHECK-NEXT: ldr x8, [x8, _block@GOTPAGEOFF]
		; CHECK-NEXT: Lloh5:
		; CHECK-NEXT: ldr x8, [x8]
		; CHECK-NEXT: ldrb w9, [x8, w0, sxtw]
		; CHECK-NEXT: ldrb w10, [x8, w1, sxtw]
		; CHECK-NEXT: cmp w9, w10
		; CHECK-NEXT: b.ne LBB1_3
		; CHECK-NEXT: ; %bb.1: ; %if.end
		; CHECK-NEXT: sxtw x9, w0
		; CHECK-NEXT: sxtw x10, w1
		; CHECK-NEXT: add x9, x9, x8
		; CHECK-NEXT: add x8, x10, x8
		; CHECK-NEXT: ldrb w10, [x9, #1]
		; CHECK-NEXT: ldrb w11, [x8, #1]
		; CHECK-NEXT: cmp w10, w11
		; CHECK-NEXT: b.ne LBB1_3
		; CHECK-NEXT: ; %bb.2: ; %if.end23
		; CHECK-NEXT: ldrb w9, [x9, #2]
		; CHECK-NEXT: ldrb w8, [x8, #2]
		; CHECK-NEXT: cmp w9, w8
		; CHECK-NEXT: mov w8, #1 ; =0x1
		; CHECK-NEXT: cset w9, hi
		; CHECK-NEXT: csel w0, w8, w9, eq
		; CHECK-NEXT: ret
		; CHECK-NEXT: LBB1_3: ; %if.then
		; CHECK-NEXT: cset w0, hi
		; CHECK-NEXT: ret
		; CHECK-NEXT: .loh AdrpLdrGotLdr Lloh3, Lloh4, Lloh5
; Addressing are folded when optimizing for code size.		; Addressing are folded when optimizing for code size.
; CHECK: , sxtw]
; CHECK: , sxtw]
entry:		entry:
%idxprom = sext i32 %i1 to i64		%idxprom = sext i32 %i1 to i64
%0 = load ptr, ptr @block, align 8		%0 = load ptr, ptr @block, align 8
%arrayidx = getelementptr inbounds i8, ptr %0, i64 %idxprom		%arrayidx = getelementptr inbounds i8, ptr %0, i64 %idxprom
%1 = load i8, ptr %arrayidx, align 1		%1 = load i8, ptr %arrayidx, align 1
%idxprom1 = sext i32 %i2 to i64		%idxprom1 = sext i32 %i2 to i64
%arrayidx2 = getelementptr inbounds i8, ptr %0, i64 %idxprom1		%arrayidx2 = getelementptr inbounds i8, ptr %0, i64 %idxprom1
%2 = load i8, ptr %arrayidx2, align 1		%2 = load i8, ptr %arrayidx2, align 1
[40 lines not shown]	if.then34: ; preds = %if.end23
br label %return		br label %return

return: ; preds = %if.end23, %if.then34, %if.then18, %if.then		return: ; preds = %if.end23, %if.then34, %if.then18, %if.then
%retval.0 = phi i32 [ %conv8, %if.then ], [ %conv22, %if.then18 ], [ %conv38, %if.then34 ], [ 1, %if.end23 ]		%retval.0 = phi i32 [ %conv8, %if.then ], [ %conv22, %if.then18 ], [ %conv38, %if.then34 ], [ 1, %if.end23 ]
ret i32 %retval.0		ret i32 %retval.0
}		}

; CHECK: @test		; CHECK: @test
; CHECK-NOT: , uxtw #2]		; CHECK-NOT: , uxtw #2]
		dmgreen (inline comment, marked done): Make sure to remove the old CHECK lines. It is probably worth updating the check lines in a separate patch, so that just the differences can be shown here.
define i32 @test(ptr %array, i8 zeroext %c, i32 %arg) {		define i32 @test(ptr %array, i8 zeroext %c, i32 %arg) {
		; CHECK-LABEL: test:
		; CHECK: ; %bb.0: ; %entry
		; CHECK-NEXT: cmn w1, w2
		; CHECK-NEXT: b.ne LBB2_2
		; CHECK-NEXT: ; %bb.1:
		; CHECK-NEXT: mov w0, wzr
		; CHECK-NEXT: ret
		; CHECK-NEXT: LBB2_2: ; %if.then
		; CHECK-NEXT: ldr w8, [x0, w1, uxtw #2]
		; CHECK-NEXT: ldr w9, [x0, w1, uxtw #2]
		; CHECK-NEXT: add w0, w9, w8
		; CHECK-NEXT: ret
entry:		entry:
%conv = zext i8 %c to i32		%conv = zext i8 %c to i32
%add = sub i32 0, %arg		%add = sub i32 0, %arg
%tobool = icmp eq i32 %conv, %add		%tobool = icmp eq i32 %conv, %add
br i1 %tobool, label %if.end, label %if.then		br i1 %tobool, label %if.end, label %if.then

if.then: ; preds = %entry		if.then: ; preds = %entry
%idxprom = zext i8 %c to i64		%idxprom = zext i8 %c to i64
%arrayidx = getelementptr inbounds i32, ptr %array, i64 %idxprom		%arrayidx = getelementptr inbounds i32, ptr %array, i64 %idxprom
%0 = load volatile i32, ptr %arrayidx, align 4		%0 = load volatile i32, ptr %arrayidx, align 4
%1 = load volatile i32, ptr %arrayidx, align 4		%1 = load volatile i32, ptr %arrayidx, align 4
%add3 = add nsw i32 %1, %0		%add3 = add nsw i32 %1, %0
br label %if.end		br label %if.end

if.end: ; preds = %entry, %if.then		if.end: ; preds = %entry, %if.then
%res.0 = phi i32 [ %add3, %if.then ], [ 0, %entry ]		%res.0 = phi i32 [ %add3, %if.then ], [ 0, %entry ]
ret i32 %res.0		ret i32 %res.0
}		}


; CHECK: @test2		; CHECK: @test2
; CHECK: , uxtw #2]		; CHECK: , uxtw #2]
; CHECK: , uxtw #2]		; CHECK: , uxtw #2]
define i32 @test2(ptr %array, i8 zeroext %c, i32 %arg) optsize {		define i32 @test2(ptr %array, i8 zeroext %c, i32 %arg) optsize {
		; CHECK-LABEL: test2:
		; CHECK: ; %bb.0: ; %entry
		; CHECK-NEXT: cmn w1, w2
		; CHECK-NEXT: b.ne LBB3_2
		; CHECK-NEXT: ; %bb.1:
		; CHECK-NEXT: mov w0, wzr
		; CHECK-NEXT: ret
		; CHECK-NEXT: LBB3_2: ; %if.then
		; CHECK-NEXT: ldr w8, [x0, w1, uxtw #2]
		; CHECK-NEXT: ldr w9, [x0, w1, uxtw #2]
		; CHECK-NEXT: add w0, w9, w8
		; CHECK-NEXT: ret
entry:		entry:
%conv = zext i8 %c to i32		%conv = zext i8 %c to i32
%add = sub i32 0, %arg		%add = sub i32 0, %arg
%tobool = icmp eq i32 %conv, %add		%tobool = icmp eq i32 %conv, %add
br i1 %tobool, label %if.end, label %if.then		br i1 %tobool, label %if.end, label %if.then

if.then: ; preds = %entry		if.then: ; preds = %entry
%idxprom = zext i8 %c to i64		%idxprom = zext i8 %c to i64
[10 lines of context not shown]
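
As a reading aid for the `[x0, w1, uxtw #2]` operands in the CHECK lines of this file: the register-offset addressing mode zero-extends the 32-bit index, shifts it left by the scale, and adds the result to the base. The Python sketch below is only an illustrative model. The function names are hypothetical and are not LLVM or Arm APIs. It shows that the folded operand and a separate shift followed by a plain register-offset access reach the same address, so whether to fold is a cost question, as the patch summary describes.

```python
MASK64 = (1 << 64) - 1  # addresses wrap modulo 2**64


def uxtw_scaled_address(base: int, index_w: int, shift: int) -> int:
    """Model of [Xn, Wm, uxtw #shift]: base + (zext32(index) << shift)."""
    index = index_w & 0xFFFFFFFF  # only the low 32 bits of the index are used
    return (base + (index << shift)) & MASK64


def lsl_scaled_address(base: int, index_x: int, shift: int) -> int:
    """Model of the folded form [Xn, Xm, lsl #shift]."""
    return (base + ((index_x & MASK64) << shift)) & MASK64


def unfolded_address(base: int, index_x: int, shift: int) -> int:
    """Model of the unfolded pair: lsl x8, xm, #shift ; ldr ..., [xn, x8]."""
    tmp = (index_x << shift) & MASK64  # separate shift instruction
    return (base + tmp) & MASK64       # plain register-offset access
```

For instance, `uxtw_scaled_address(0x1000, 3, 2)` and `lsl_scaled_address(0x1000, 5, 3)` model a 4-byte and an 8-byte element access respectively; the second always agrees with `unfolded_address` for the same arguments.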

llvm/test/CodeGen/AArch64/arm64-fold-address.ll

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 2
	; RUN: llc < %s -O2 -mtriple=arm64-apple-darwin | FileCheck %s			; RUN: llc < %s -O2 -mtriple=arm64-apple-darwin | FileCheck %s

	%0 = type opaque			%0 = type opaque
	%struct.CGRect = type { %struct.CGPoint, %struct.CGSize }			%struct.CGRect = type { %struct.CGPoint, %struct.CGSize }
	%struct.CGPoint = type { double, double }			%struct.CGPoint = type { double, double }
	%struct.CGSize = type { double, double }			%struct.CGSize = type { double, double }

	@"OBJC_IVAR_$_UIScreen._bounds" = external hidden global i64, section "__DATA, __objc_ivar", align 8			@"OBJC_IVAR_$_UIScreen._bounds" = external hidden global i64, section "__DATA, __objc_ivar", align 8

	define hidden %struct.CGRect @nofold(ptr nocapture %self, ptr nocapture %_cmd) nounwind readonly optsize ssp {			define hidden %struct.CGRect @nofold(ptr nocapture %self, ptr nocapture %_cmd) nounwind readonly optsize ssp {
	entry:			entry:
	; CHECK-LABEL: nofold:
	; CHECK: add x[[REG:[0-9]+]], x0, x{{[0-9]+}}
	; CHECK: ldp d0, d1, [x[[REG]]]
	; CHECK: ldp d2, d3, [x[[REG]], #16]
	; CHECK: ret
	%ivar = load i64, ptr @"OBJC_IVAR_$_UIScreen._bounds", align 8, !invariant.load !4			%ivar = load i64, ptr @"OBJC_IVAR_$_UIScreen._bounds", align 8, !invariant.load !4
	%add.ptr = getelementptr inbounds i8, ptr %self, i64 %ivar			%add.ptr = getelementptr inbounds i8, ptr %self, i64 %ivar
	%tmp11 = load double, ptr %add.ptr, align 8			%tmp11 = load double, ptr %add.ptr, align 8
	%add.ptr.sum = add i64 %ivar, 8			%add.ptr.sum = add i64 %ivar, 8
	%add.ptr10.1 = getelementptr inbounds i8, ptr %self, i64 %add.ptr.sum			%add.ptr10.1 = getelementptr inbounds i8, ptr %self, i64 %add.ptr.sum
	%tmp12 = load double, ptr %add.ptr10.1, align 8			%tmp12 = load double, ptr %add.ptr10.1, align 8
	%add.ptr.sum17 = add i64 %ivar, 16			%add.ptr.sum17 = add i64 %ivar, 16
	%add.ptr4.1 = getelementptr inbounds i8, ptr %self, i64 %add.ptr.sum17			%add.ptr4.1 = getelementptr inbounds i8, ptr %self, i64 %add.ptr.sum17
	%tmp = load double, ptr %add.ptr4.1, align 8			%tmp = load double, ptr %add.ptr4.1, align 8
	%add.ptr4.1.sum = add i64 %ivar, 24			%add.ptr4.1.sum = add i64 %ivar, 24
	%add.ptr4.1.1 = getelementptr inbounds i8, ptr %self, i64 %add.ptr4.1.sum			%add.ptr4.1.1 = getelementptr inbounds i8, ptr %self, i64 %add.ptr4.1.sum
	%tmp5 = load double, ptr %add.ptr4.1.1, align 8			%tmp5 = load double, ptr %add.ptr4.1.1, align 8
	%insert14 = insertvalue %struct.CGPoint undef, double %tmp11, 0			%insert14 = insertvalue %struct.CGPoint undef, double %tmp11, 0
	%insert16 = insertvalue %struct.CGPoint %insert14, double %tmp12, 1			%insert16 = insertvalue %struct.CGPoint %insert14, double %tmp12, 1
	%insert = insertvalue %struct.CGRect undef, %struct.CGPoint %insert16, 0			%insert = insertvalue %struct.CGRect undef, %struct.CGPoint %insert16, 0
	%insert7 = insertvalue %struct.CGSize undef, double %tmp, 0			%insert7 = insertvalue %struct.CGSize undef, double %tmp, 0
	%insert9 = insertvalue %struct.CGSize %insert7, double %tmp5, 1			%insert9 = insertvalue %struct.CGSize %insert7, double %tmp5, 1
	%insert3 = insertvalue %struct.CGRect %insert, %struct.CGSize %insert9, 1			%insert3 = insertvalue %struct.CGRect %insert, %struct.CGSize %insert9, 1
	ret %struct.CGRect %insert3			ret %struct.CGRect %insert3
	}			}

	define hidden %struct.CGRect @fold(ptr nocapture %self, ptr nocapture %_cmd) nounwind readonly optsize ssp {			define hidden %struct.CGRect @fold(ptr nocapture %self, ptr nocapture %_cmd) nounwind readonly optsize ssp {
	entry:			entry:
	; CHECK-LABEL: fold:
	; CHECK: ldr d0, [x0, x{{[0-9]+}}]
	; CHECK-NOT: add x0, x0, x1
	; CHECK: ret
	%ivar = load i64, ptr @"OBJC_IVAR_$_UIScreen._bounds", align 8, !invariant.load !4			%ivar = load i64, ptr @"OBJC_IVAR_$_UIScreen._bounds", align 8, !invariant.load !4
	%add.ptr = getelementptr inbounds i8, ptr %self, i64 %ivar			%add.ptr = getelementptr inbounds i8, ptr %self, i64 %ivar
	%tmp11 = load double, ptr %add.ptr, align 8			%tmp11 = load double, ptr %add.ptr, align 8
	%add.ptr10.1 = getelementptr inbounds i8, ptr %self, i64 %ivar			%add.ptr10.1 = getelementptr inbounds i8, ptr %self, i64 %ivar
	%tmp12 = load double, ptr %add.ptr10.1, align 8			%tmp12 = load double, ptr %add.ptr10.1, align 8
	%add.ptr4.1 = getelementptr inbounds i8, ptr %self, i64 %ivar			%add.ptr4.1 = getelementptr inbounds i8, ptr %self, i64 %ivar
	%tmp = load double, ptr %add.ptr4.1, align 8			%tmp = load double, ptr %add.ptr4.1, align 8
	%add.ptr4.1.1 = getelementptr inbounds i8, ptr %self, i64 %ivar			%add.ptr4.1.1 = getelementptr inbounds i8, ptr %self, i64 %ivar
	[10 lines of context not shown]

	!llvm.module.flags = !{!0, !1, !2, !3}			!llvm.module.flags = !{!0, !1, !2, !3}

	!0 = !{i32 1, !"Objective-C Version", i32 2}			!0 = !{i32 1, !"Objective-C Version", i32 2}
	!1 = !{i32 1, !"Objective-C Image Info Version", i32 0}			!1 = !{i32 1, !"Objective-C Image Info Version", i32 0}
	!2 = !{i32 1, !"Objective-C Image Info Section", !"__DATA, __objc_imageinfo, regular, no_dead_strip"}			!2 = !{i32 1, !"Objective-C Image Info Section", !"__DATA, __objc_imageinfo, regular, no_dead_strip"}
	!3 = !{i32 4, !"Objective-C Garbage Collection", i32 0}			!3 = !{i32 4, !"Objective-C Garbage Collection", i32 0}
	!4 = !{}			!4 = !{}
				;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
				dmgreen: This looks like it hasn't been generated properly. It might mean the update script doesn't like the triple.
				; CHECK: {{.*}}

llvm/test/CodeGen/AArch64/arm64-fold-lsl.ll

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 2
	; RUN: llc < %s -mtriple=arm64-eabi -aarch64-neon-syntax=apple | FileCheck %s			; RUN: llc < %s -mtriple=arm64-eabi -aarch64-neon-syntax=apple | FileCheck %s
	;			;
	; <rdar://problem/14486451>			; <rdar://problem/14486451>

	%struct.a = type [256 x i16]			%struct.a = type [256 x i16]
	%struct.b = type [256 x i32]			%struct.b = type [256 x i32]
	%struct.c = type [256 x i64]			%struct.c = type [256 x i64]

	define i16 @load_halfword(ptr %ctx, i32 %xor72) nounwind {			define i16 @load_halfword(ptr %ctx, i32 %xor72) nounwind {
	; CHECK-LABEL: load_halfword:			; CHECK-LABEL: load_halfword:
	; CHECK: ubfx [[REG:x[0-9]+]], x1, #9, #8			; CHECK: // %bb.0:
	; CHECK: ldrh w0, [x0, [[REG]], lsl #1]			; CHECK-NEXT: // kill: def $w1 killed $w1 def $x1
				; CHECK-NEXT: ubfx x8, x1, #9, #8
				; CHECK-NEXT: ldrh w0, [x0, x8, lsl #1]
				; CHECK-NEXT: ret
	%shr81 = lshr i32 %xor72, 9			%shr81 = lshr i32 %xor72, 9
	%conv82 = zext i32 %shr81 to i64			%conv82 = zext i32 %shr81 to i64
	%idxprom83 = and i64 %conv82, 255			%idxprom83 = and i64 %conv82, 255
	%arrayidx86 = getelementptr inbounds %struct.a, ptr %ctx, i64 0, i64 %idxprom83			%arrayidx86 = getelementptr inbounds %struct.a, ptr %ctx, i64 0, i64 %idxprom83
	%result = load i16, ptr %arrayidx86, align 2			%result = load i16, ptr %arrayidx86, align 2
	ret i16 %result			ret i16 %result
	}			}

	define i32 @load_word(ptr %ctx, i32 %xor72) nounwind {			define i32 @load_word(ptr %ctx, i32 %xor72) nounwind {
	; CHECK-LABEL: load_word:			; CHECK-LABEL: load_word:
	; CHECK: ubfx [[REG:x[0-9]+]], x1, #9, #8			; CHECK: // %bb.0:
	; CHECK: ldr w0, [x0, [[REG]], lsl #2]			; CHECK-NEXT: // kill: def $w1 killed $w1 def $x1
				; CHECK-NEXT: ubfx x8, x1, #9, #8
				; CHECK-NEXT: ldr w0, [x0, x8, lsl #2]
				; CHECK-NEXT: ret
	%shr81 = lshr i32 %xor72, 9			%shr81 = lshr i32 %xor72, 9
	%conv82 = zext i32 %shr81 to i64			%conv82 = zext i32 %shr81 to i64
	%idxprom83 = and i64 %conv82, 255			%idxprom83 = and i64 %conv82, 255
	%arrayidx86 = getelementptr inbounds %struct.b, ptr %ctx, i64 0, i64 %idxprom83			%arrayidx86 = getelementptr inbounds %struct.b, ptr %ctx, i64 0, i64 %idxprom83
	%result = load i32, ptr %arrayidx86, align 4			%result = load i32, ptr %arrayidx86, align 4
	ret i32 %result			ret i32 %result
	}			}

	define i64 @load_doubleword(ptr %ctx, i32 %xor72) nounwind {			define i64 @load_doubleword(ptr %ctx, i32 %xor72) nounwind {
	; CHECK-LABEL: load_doubleword:			; CHECK-LABEL: load_doubleword:
	; CHECK: ubfx [[REG:x[0-9]+]], x1, #9, #8			; CHECK: // %bb.0:
	; CHECK: ldr x0, [x0, [[REG]], lsl #3]			; CHECK-NEXT: // kill: def $w1 killed $w1 def $x1
				; CHECK-NEXT: ubfx x8, x1, #9, #8
				; CHECK-NEXT: ldr x0, [x0, x8, lsl #3]
				; CHECK-NEXT: ret
	%shr81 = lshr i32 %xor72, 9			%shr81 = lshr i32 %xor72, 9
	%conv82 = zext i32 %shr81 to i64			%conv82 = zext i32 %shr81 to i64
	%idxprom83 = and i64 %conv82, 255			%idxprom83 = and i64 %conv82, 255
	%arrayidx86 = getelementptr inbounds %struct.c, ptr %ctx, i64 0, i64 %idxprom83			%arrayidx86 = getelementptr inbounds %struct.c, ptr %ctx, i64 0, i64 %idxprom83
	%result = load i64, ptr %arrayidx86, align 8			%result = load i64, ptr %arrayidx86, align 8
	ret i64 %result			ret i64 %result
	}			}

	define void @store_halfword(ptr %ctx, i32 %xor72, i16 %val) nounwind {			define void @store_halfword(ptr %ctx, i32 %xor72, i16 %val) nounwind {
	; CHECK-LABEL: store_halfword:			; CHECK-LABEL: store_halfword:
	; CHECK: ubfx [[REG:x[0-9]+]], x1, #9, #8			; CHECK: // %bb.0:
	; CHECK: strh w2, [x0, [[REG]], lsl #1]			; CHECK-NEXT: // kill: def $w1 killed $w1 def $x1
				; CHECK-NEXT: ubfx x8, x1, #9, #8
				; CHECK-NEXT: strh w2, [x0, x8, lsl #1]
				; CHECK-NEXT: ret
	%shr81 = lshr i32 %xor72, 9			%shr81 = lshr i32 %xor72, 9
	%conv82 = zext i32 %shr81 to i64			%conv82 = zext i32 %shr81 to i64
	%idxprom83 = and i64 %conv82, 255			%idxprom83 = and i64 %conv82, 255
	%arrayidx86 = getelementptr inbounds %struct.a, ptr %ctx, i64 0, i64 %idxprom83			%arrayidx86 = getelementptr inbounds %struct.a, ptr %ctx, i64 0, i64 %idxprom83
	store i16 %val, ptr %arrayidx86, align 8			store i16 %val, ptr %arrayidx86, align 8
	ret void			ret void
	}			}

	define void @store_word(ptr %ctx, i32 %xor72, i32 %val) nounwind {			define void @store_word(ptr %ctx, i32 %xor72, i32 %val) nounwind {
	; CHECK-LABEL: store_word:			; CHECK-LABEL: store_word:
	; CHECK: ubfx [[REG:x[0-9]+]], x1, #9, #8			; CHECK: // %bb.0:
	; CHECK: str w2, [x0, [[REG]], lsl #2]			; CHECK-NEXT: // kill: def $w1 killed $w1 def $x1
				; CHECK-NEXT: ubfx x8, x1, #9, #8
				; CHECK-NEXT: str w2, [x0, x8, lsl #2]
				; CHECK-NEXT: ret
	%shr81 = lshr i32 %xor72, 9			%shr81 = lshr i32 %xor72, 9
	%conv82 = zext i32 %shr81 to i64			%conv82 = zext i32 %shr81 to i64
	%idxprom83 = and i64 %conv82, 255			%idxprom83 = and i64 %conv82, 255
	%arrayidx86 = getelementptr inbounds %struct.b, ptr %ctx, i64 0, i64 %idxprom83			%arrayidx86 = getelementptr inbounds %struct.b, ptr %ctx, i64 0, i64 %idxprom83
	store i32 %val, ptr %arrayidx86, align 8			store i32 %val, ptr %arrayidx86, align 8
	ret void			ret void
	}			}

	define void @store_doubleword(ptr %ctx, i32 %xor72, i64 %val) nounwind {			define void @store_doubleword(ptr %ctx, i32 %xor72, i64 %val) nounwind {
	; CHECK-LABEL: store_doubleword:			; CHECK-LABEL: store_doubleword:
	; CHECK: ubfx [[REG:x[0-9]+]], x1, #9, #8			; CHECK: // %bb.0:
	; CHECK: str x2, [x0, [[REG]], lsl #3]			; CHECK-NEXT: // kill: def $w1 killed $w1 def $x1
				; CHECK-NEXT: ubfx x8, x1, #9, #8
				; CHECK-NEXT: str x2, [x0, x8, lsl #3]
				; CHECK-NEXT: ret
	%shr81 = lshr i32 %xor72, 9			%shr81 = lshr i32 %xor72, 9
	%conv82 = zext i32 %shr81 to i64			%conv82 = zext i32 %shr81 to i64
	%idxprom83 = and i64 %conv82, 255			%idxprom83 = and i64 %conv82, 255
	%arrayidx86 = getelementptr inbounds %struct.c, ptr %ctx, i64 0, i64 %idxprom83			%arrayidx86 = getelementptr inbounds %struct.c, ptr %ctx, i64 0, i64 %idxprom83
	store i64 %val, ptr %arrayidx86, align 8			store i64 %val, ptr %arrayidx86, align 8
	ret void			ret void
	}			}

	; Check that we combine a shift into the offset instead of using a narrower load			; Check that we combine a shift into the offset instead of using a narrower load
	; when we have a load followed by a trunc			; when we have a load followed by a trunc

	define i32 @load_doubleword_trunc_word(ptr %ptr, i64 %off) {			define i32 @load_doubleword_trunc_word(ptr %ptr, i64 %off) {
	; CHECK-LABEL: load_doubleword_trunc_word:			; CHECK-LABEL: load_doubleword_trunc_word:
	; CHECK: ldr x0, [x0, x1, lsl #3]			; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldr x0, [x0, x1, lsl #3]
				; CHECK-NEXT: // kill: def $w0 killed $w0 killed $x0
				; CHECK-NEXT: ret
	entry:			entry:
	%idx = getelementptr inbounds i64, ptr %ptr, i64 %off			%idx = getelementptr inbounds i64, ptr %ptr, i64 %off
	%x = load i64, ptr %idx, align 8			%x = load i64, ptr %idx, align 8
	%trunc = trunc i64 %x to i32			%trunc = trunc i64 %x to i32
	ret i32 %trunc			ret i32 %trunc
	}			}

	define i16 @load_doubleword_trunc_halfword(ptr %ptr, i64 %off) {			define i16 @load_doubleword_trunc_halfword(ptr %ptr, i64 %off) {
	; CHECK-LABEL: load_doubleword_trunc_halfword:			; CHECK-LABEL: load_doubleword_trunc_halfword:
	; CHECK: ldr x0, [x0, x1, lsl #3]			; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldr x0, [x0, x1, lsl #3]
				; CHECK-NEXT: // kill: def $w0 killed $w0 killed $x0
				; CHECK-NEXT: ret
	entry:			entry:
	%idx = getelementptr inbounds i64, ptr %ptr, i64 %off			%idx = getelementptr inbounds i64, ptr %ptr, i64 %off
	%x = load i64, ptr %idx, align 8			%x = load i64, ptr %idx, align 8
	%trunc = trunc i64 %x to i16			%trunc = trunc i64 %x to i16
	ret i16 %trunc			ret i16 %trunc
	}			}

	define i8 @load_doubleword_trunc_byte(ptr %ptr, i64 %off) {			define i8 @load_doubleword_trunc_byte(ptr %ptr, i64 %off) {
	; CHECK-LABEL: load_doubleword_trunc_byte:			; CHECK-LABEL: load_doubleword_trunc_byte:
	; CHECK: ldr x0, [x0, x1, lsl #3]			; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldr x0, [x0, x1, lsl #3]
				; CHECK-NEXT: // kill: def $w0 killed $w0 killed $x0
				; CHECK-NEXT: ret
	entry:			entry:
	%idx = getelementptr inbounds i64, ptr %ptr, i64 %off			%idx = getelementptr inbounds i64, ptr %ptr, i64 %off
	%x = load i64, ptr %idx, align 8			%x = load i64, ptr %idx, align 8
	%trunc = trunc i64 %x to i8			%trunc = trunc i64 %x to i8
	ret i8 %trunc			ret i8 %trunc
	}			}

	define i16 @load_word_trunc_halfword(ptr %ptr, i64 %off) {			define i16 @load_word_trunc_halfword(ptr %ptr, i64 %off) {
	entry:
	; CHECK-LABEL: load_word_trunc_halfword:			; CHECK-LABEL: load_word_trunc_halfword:
	; CHECK: ldr w0, [x0, x1, lsl #2]			; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldr w0, [x0, x1, lsl #2]
				; CHECK-NEXT: ret
				entry:
	%idx = getelementptr inbounds i32, ptr %ptr, i64 %off			%idx = getelementptr inbounds i32, ptr %ptr, i64 %off
	%x = load i32, ptr %idx, align 8			%x = load i32, ptr %idx, align 8
	%trunc = trunc i32 %x to i16			%trunc = trunc i32 %x to i16
	ret i16 %trunc			ret i16 %trunc
	}			}

	define i8 @load_word_trunc_byte(ptr %ptr, i64 %off) {			define i8 @load_word_trunc_byte(ptr %ptr, i64 %off) {
	; CHECK-LABEL: load_word_trunc_byte:			; CHECK-LABEL: load_word_trunc_byte:
	; CHECK: ldr w0, [x0, x1, lsl #2]			; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldr w0, [x0, x1, lsl #2]
				; CHECK-NEXT: ret
	entry:			entry:
	%idx = getelementptr inbounds i32, ptr %ptr, i64 %off			%idx = getelementptr inbounds i32, ptr %ptr, i64 %off
	%x = load i32, ptr %idx, align 8			%x = load i32, ptr %idx, align 8
	%trunc = trunc i32 %x to i8			%trunc = trunc i32 %x to i8
	ret i8 %trunc			ret i8 %trunc
	}			}

	define i8 @load_halfword_trunc_byte(ptr %ptr, i64 %off) {			define i8 @load_halfword_trunc_byte(ptr %ptr, i64 %off) {
	; CHECK-LABEL: load_halfword_trunc_byte:			; CHECK-LABEL: load_halfword_trunc_byte:
	; CHECK: ldrh w0, [x0, x1, lsl #1]			; CHECK: // %bb.0: // %entry
				; CHECK-NEXT: ldrh w0, [x0, x1, lsl #1]
				; CHECK-NEXT: ret
	entry:			entry:
	%idx = getelementptr inbounds i16, ptr %ptr, i64 %off			%idx = getelementptr inbounds i16, ptr %ptr, i64 %off
	%x = load i16, ptr %idx, align 8			%x = load i16, ptr %idx, align 8
	%trunc = trunc i16 %x to i8			%trunc = trunc i16 %x to i8
	ret i8 %trunc			ret i8 %trunc
	}			}

	; Check that we do use a narrower load, and so don't combine the shift, when			; Check that we do use a narrower load, and so don't combine the shift, when
	; the loaded value is zero-extended.			; the loaded value is zero-extended.

	define i64 @load_doubleword_trunc_word_zext(ptr %ptr, i64 %off) {			define i64 @load_doubleword_trunc_word_zext(ptr %ptr, i64 %off) {
	; CHECK-LABEL: load_doubleword_trunc_word_zext:			; CHECK-LABEL: load_doubleword_trunc_word_zext:
	; CHECK: lsl [[REG:x[0-9]+]], x1, #3			; CHECK: // %bb.0: // %entry
	; CHECK: ldr w0, [x0, [[REG]]]			; CHECK-NEXT: lsl x8, x1, #3
				; CHECK-NEXT: ldr w0, [x0, x8]
				; CHECK-NEXT: ret
	entry:			entry:
	%idx = getelementptr inbounds i64, ptr %ptr, i64 %off			%idx = getelementptr inbounds i64, ptr %ptr, i64 %off
	%x = load i64, ptr %idx, align 8			%x = load i64, ptr %idx, align 8
	%trunc = trunc i64 %x to i32			%trunc = trunc i64 %x to i32
	%ext = zext i32 %trunc to i64			%ext = zext i32 %trunc to i64
	ret i64 %ext			ret i64 %ext
	}			}

	define i64 @load_doubleword_trunc_halfword_zext(ptr %ptr, i64 %off) {			define i64 @load_doubleword_trunc_halfword_zext(ptr %ptr, i64 %off) {
	; CHECK-LABEL: load_doubleword_trunc_halfword_zext:			; CHECK-LABEL: load_doubleword_trunc_halfword_zext:
	; CHECK: lsl [[REG:x[0-9]+]], x1, #3			; CHECK: // %bb.0: // %entry
	; CHECK: ldrh w0, [x0, [[REG]]]			; CHECK-NEXT: lsl x8, x1, #3
				; CHECK-NEXT: ldrh w0, [x0, x8]
				; CHECK-NEXT: ret
	entry:			entry:
	%idx = getelementptr inbounds i64, ptr %ptr, i64 %off			%idx = getelementptr inbounds i64, ptr %ptr, i64 %off
	%x = load i64, ptr %idx, align 8			%x = load i64, ptr %idx, align 8
	%trunc = trunc i64 %x to i16			%trunc = trunc i64 %x to i16
	%ext = zext i16 %trunc to i64			%ext = zext i16 %trunc to i64
	ret i64 %ext			ret i64 %ext
	}			}

	define i64 @load_doubleword_trunc_byte_zext(ptr %ptr, i64 %off) {			define i64 @load_doubleword_trunc_byte_zext(ptr %ptr, i64 %off) {
	; CHECK-LABEL: load_doubleword_trunc_byte_zext:			; CHECK-LABEL: load_doubleword_trunc_byte_zext:
	; CHECK: lsl [[REG:x[0-9]+]], x1, #3			; CHECK: // %bb.0: // %entry
	; CHECK: ldrb w0, [x0, [[REG]]]			; CHECK-NEXT: lsl x8, x1, #3
				; CHECK-NEXT: ldrb w0, [x0, x8]
				; CHECK-NEXT: ret
	entry:			entry:
	%idx = getelementptr inbounds i64, ptr %ptr, i64 %off			%idx = getelementptr inbounds i64, ptr %ptr, i64 %off
	%x = load i64, ptr %idx, align 8			%x = load i64, ptr %idx, align 8
	%trunc = trunc i64 %x to i8			%trunc = trunc i64 %x to i8
	%ext = zext i8 %trunc to i64			%ext = zext i8 %trunc to i64
	ret i64 %ext			ret i64 %ext
	}			}

	define i64 @load_word_trunc_halfword_zext(ptr %ptr, i64 %off) {			define i64 @load_word_trunc_halfword_zext(ptr %ptr, i64 %off) {
	; CHECK-LABEL: load_word_trunc_halfword_zext:			; CHECK-LABEL: load_word_trunc_halfword_zext:
	; CHECK: lsl [[REG:x[0-9]+]], x1, #2			; CHECK: // %bb.0: // %entry
	; CHECK: ldrh w0, [x0, [[REG]]]			; CHECK-NEXT: lsl x8, x1, #2
				; CHECK-NEXT: ldrh w0, [x0, x8]
				; CHECK-NEXT: ret
	entry:			entry:
	%idx = getelementptr inbounds i32, ptr %ptr, i64 %off			%idx = getelementptr inbounds i32, ptr %ptr, i64 %off
	%x = load i32, ptr %idx, align 8			%x = load i32, ptr %idx, align 8
	%trunc = trunc i32 %x to i16			%trunc = trunc i32 %x to i16
	%ext = zext i16 %trunc to i64			%ext = zext i16 %trunc to i64
	ret i64 %ext			ret i64 %ext
	}			}

	define i64 @load_word_trunc_byte_zext(ptr %ptr, i64 %off) {			define i64 @load_word_trunc_byte_zext(ptr %ptr, i64 %off) {
	; CHECK-LABEL: load_word_trunc_byte_zext:			; CHECK-LABEL: load_word_trunc_byte_zext:
	; CHECK: lsl [[REG:x[0-9]+]], x1, #2			; CHECK: // %bb.0: // %entry
	; CHECK: ldrb w0, [x0, [[REG]]]			; CHECK-NEXT: lsl x8, x1, #2
				; CHECK-NEXT: ldrb w0, [x0, x8]
				; CHECK-NEXT: ret
	entry:			entry:
	%idx = getelementptr inbounds i32, ptr %ptr, i64 %off			%idx = getelementptr inbounds i32, ptr %ptr, i64 %off
	%x = load i32, ptr %idx, align 8			%x = load i32, ptr %idx, align 8
	%trunc = trunc i32 %x to i8			%trunc = trunc i32 %x to i8
	%ext = zext i8 %trunc to i64			%ext = zext i8 %trunc to i64
	ret i64 %ext			ret i64 %ext
	}			}

	define i64 @load_halfword_trunc_byte_zext(ptr %ptr, i64 %off) {			define i64 @load_halfword_trunc_byte_zext(ptr %ptr, i64 %off) {
	; CHECK-LABEL: load_halfword_trunc_byte_zext:			; CHECK-LABEL: load_halfword_trunc_byte_zext:
	; CHECK: lsl [[REG:x[0-9]+]], x1, #1			; CHECK: // %bb.0: // %entry
	; CHECK: ldrb w0, [x0, [[REG]]]			; CHECK-NEXT: lsl x8, x1, #1
				; CHECK-NEXT: ldrb w0, [x0, x8]
				; CHECK-NEXT: ret
	entry:			entry:
	%idx = getelementptr inbounds i16, ptr %ptr, i64 %off			%idx = getelementptr inbounds i16, ptr %ptr, i64 %off
	%x = load i16, ptr %idx, align 8			%x = load i16, ptr %idx, align 8
	%trunc = trunc i16 %x to i8			%trunc = trunc i16 %x to i8
	%ext = zext i8 %trunc to i64			%ext = zext i8 %trunc to i64
	ret i64 %ext			ret i64 %ext
	}			}

	; Check that we do use a narrower load, and so don't combine the shift, when			; Check that we do use a narrower load, and so don't combine the shift, when
	; the loaded value is sign-extended.			; the loaded value is sign-extended.

	define i64 @load_doubleword_trunc_word_sext(ptr %ptr, i64 %off) {			define i64 @load_doubleword_trunc_word_sext(ptr %ptr, i64 %off) {
	; CHECK-LABEL: load_doubleword_trunc_word_sext:			; CHECK-LABEL: load_doubleword_trunc_word_sext:
	; CHECK: lsl [[REG:x[0-9]+]], x1, #3			; CHECK: // %bb.0: // %entry
	; CHECK: ldrsw x0, [x0, [[REG]]]			; CHECK-NEXT: lsl x8, x1, #3
				; CHECK-NEXT: ldrsw x0, [x0, x8]
				; CHECK-NEXT: ret
	entry:			entry:
	%idx = getelementptr inbounds i64, ptr %ptr, i64 %off			%idx = getelementptr inbounds i64, ptr %ptr, i64 %off
	%x = load i64, ptr %idx, align 8			%x = load i64, ptr %idx, align 8
	%trunc = trunc i64 %x to i32			%trunc = trunc i64 %x to i32
	%ext = sext i32 %trunc to i64			%ext = sext i32 %trunc to i64
	ret i64 %ext			ret i64 %ext
	}			}

	define i64 @load_doubleword_trunc_halfword_sext(ptr %ptr, i64 %off) {			define i64 @load_doubleword_trunc_halfword_sext(ptr %ptr, i64 %off) {
	; CHECK-LABEL: load_doubleword_trunc_halfword_sext:			; CHECK-LABEL: load_doubleword_trunc_halfword_sext:
	; CHECK: lsl [[REG:x[0-9]+]], x1, #3			; CHECK: // %bb.0: // %entry
	; CHECK: ldrsh x0, [x0, [[REG]]]			; CHECK-NEXT: lsl x8, x1, #3
				; CHECK-NEXT: ldrsh x0, [x0, x8]
				; CHECK-NEXT: ret
	entry:			entry:
	%idx = getelementptr inbounds i64, ptr %ptr, i64 %off			%idx = getelementptr inbounds i64, ptr %ptr, i64 %off
	%x = load i64, ptr %idx, align 8			%x = load i64, ptr %idx, align 8
	%trunc = trunc i64 %x to i16			%trunc = trunc i64 %x to i16
	%ext = sext i16 %trunc to i64			%ext = sext i16 %trunc to i64
	ret i64 %ext			ret i64 %ext
	}			}

	define i64 @load_doubleword_trunc_byte_sext(ptr %ptr, i64 %off) {			define i64 @load_doubleword_trunc_byte_sext(ptr %ptr, i64 %off) {
	; CHECK-LABEL: load_doubleword_trunc_byte_sext:			; CHECK-LABEL: load_doubleword_trunc_byte_sext:
	; CHECK: lsl [[REG:x[0-9]+]], x1, #3			; CHECK: // %bb.0: // %entry
	; CHECK: ldrsb x0, [x0, [[REG]]]			; CHECK-NEXT: lsl x8, x1, #3
				; CHECK-NEXT: ldrsb x0, [x0, x8]
				; CHECK-NEXT: ret
	entry:			entry:
	%idx = getelementptr inbounds i64, ptr %ptr, i64 %off			%idx = getelementptr inbounds i64, ptr %ptr, i64 %off
	%x = load i64, ptr %idx, align 8			%x = load i64, ptr %idx, align 8
	%trunc = trunc i64 %x to i8			%trunc = trunc i64 %x to i8
	%ext = sext i8 %trunc to i64			%ext = sext i8 %trunc to i64
	ret i64 %ext			ret i64 %ext
	}			}

	define i64 @load_word_trunc_halfword_sext(ptr %ptr, i64 %off) {			define i64 @load_word_trunc_halfword_sext(ptr %ptr, i64 %off) {
	; CHECK-LABEL: load_word_trunc_halfword_sext:			; CHECK-LABEL: load_word_trunc_halfword_sext:
	; CHECK: lsl [[REG:x[0-9]+]], x1, #2			; CHECK: // %bb.0: // %entry
	; CHECK: ldrsh x0, [x0, [[REG]]]			; CHECK-NEXT: lsl x8, x1, #2
				; CHECK-NEXT: ldrsh x0, [x0, x8]
				; CHECK-NEXT: ret
	entry:			entry:
	%idx = getelementptr inbounds i32, ptr %ptr, i64 %off			%idx = getelementptr inbounds i32, ptr %ptr, i64 %off
	%x = load i32, ptr %idx, align 8			%x = load i32, ptr %idx, align 8
	%trunc = trunc i32 %x to i16			%trunc = trunc i32 %x to i16
	%ext = sext i16 %trunc to i64			%ext = sext i16 %trunc to i64
	ret i64 %ext			ret i64 %ext
	}			}

	define i64 @load_word_trunc_byte_sext(ptr %ptr, i64 %off) {			define i64 @load_word_trunc_byte_sext(ptr %ptr, i64 %off) {
	; CHECK-LABEL: load_word_trunc_byte_sext:			; CHECK-LABEL: load_word_trunc_byte_sext:
	; CHECK: lsl [[REG:x[0-9]+]], x1, #2			; CHECK: // %bb.0: // %entry
	; CHECK: ldrsb x0, [x0, [[REG]]]			; CHECK-NEXT: lsl x8, x1, #2
				; CHECK-NEXT: ldrsb x0, [x0, x8]
				; CHECK-NEXT: ret
	entry:			entry:
	%idx = getelementptr inbounds i32, ptr %ptr, i64 %off			%idx = getelementptr inbounds i32, ptr %ptr, i64 %off
	%x = load i32, ptr %idx, align 8			%x = load i32, ptr %idx, align 8
	%trunc = trunc i32 %x to i8			%trunc = trunc i32 %x to i8
	%ext = sext i8 %trunc to i64			%ext = sext i8 %trunc to i64
	ret i64 %ext			ret i64 %ext
	}			}

	define i64 @load_halfword_trunc_byte_sext(ptr %ptr, i64 %off) {			define i64 @load_halfword_trunc_byte_sext(ptr %ptr, i64 %off) {
	; CHECK-LABEL: load_halfword_trunc_byte_sext:			; CHECK-LABEL: load_halfword_trunc_byte_sext:
	; CHECK: lsl [[REG:x[0-9]+]], x1, #1			; CHECK: // %bb.0: // %entry
	; CHECK: ldrsb x0, [x0, [[REG]]]			; CHECK-NEXT: lsl x8, x1, #1
				; CHECK-NEXT: ldrsb x0, [x0, x8]
				; CHECK-NEXT: ret
	entry:			entry:
	%idx = getelementptr inbounds i16, ptr %ptr, i64 %off			%idx = getelementptr inbounds i16, ptr %ptr, i64 %off
	%x = load i16, ptr %idx, align 8			%x = load i16, ptr %idx, align 8
	%trunc = trunc i16 %x to i8			%trunc = trunc i16 %x to i8
	%ext = sext i8 %trunc to i64			%ext = sext i8 %trunc to i64
	ret i64 %ext			ret i64 %ext
	}			}

	; Check that we don't combine the shift, and so will use a narrower load, when			; Check that we don't combine the shift, and so will use a narrower load, when
	; the shift is used more than once.			; the shift is used more than once.

	define i32 @load_doubleword_trunc_word_reuse_shift(ptr %ptr, i64 %off) {			define i32 @load_doubleword_trunc_word_reuse_shift(ptr %ptr, i64 %off) {
	; CHECK-LABEL: load_doubleword_trunc_word_reuse_shift:			; CHECK-LABEL: load_doubleword_trunc_word_reuse_shift:
	; CHECK: lsl x[[REG1:[0-9]+]], x1, #3			; CHECK: // %bb.0: // %entry
	; CHECK: ldr w[[REG2:[0-9]+]], [x0, x[[REG1]]]			; CHECK-NEXT: lsl x8, x1, #3
	; CHECK: add w0, w[[REG2]], w[[REG1]]			; CHECK-NEXT: ldr w9, [x0, x8]
				; CHECK-NEXT: add w0, w9, w8
				; CHECK-NEXT: ret
	entry:			entry:
	%idx = getelementptr inbounds i64, ptr %ptr, i64 %off			%idx = getelementptr inbounds i64, ptr %ptr, i64 %off
	%x = load i64, ptr %idx, align 8			%x = load i64, ptr %idx, align 8
	%trunc = trunc i64 %x to i32			%trunc = trunc i64 %x to i32
	%lsl = shl i64 %off, 3			%lsl = shl i64 %off, 3
	%lsl.trunc = trunc i64 %lsl to i32			%lsl.trunc = trunc i64 %lsl to i32
	%add = add i32 %trunc, %lsl.trunc			%add = add i32 %trunc, %lsl.trunc
	ret i32 %add			ret i32 %add
	}			}

	define i16 @load_doubleword_trunc_halfword_reuse_shift(ptr %ptr, i64 %off) {			define i16 @load_doubleword_trunc_halfword_reuse_shift(ptr %ptr, i64 %off) {
	; CHECK-LABEL: load_doubleword_trunc_halfword_reuse_shift:			; CHECK-LABEL: load_doubleword_trunc_halfword_reuse_shift:
	; CHECK: lsl x[[REG1:[0-9]+]], x1, #3			; CHECK: // %bb.0: // %entry
	; CHECK: ldrh w[[REG2:[0-9]+]], [x0, x[[REG1]]]			; CHECK-NEXT: lsl x8, x1, #3
	; CHECK: add w0, w[[REG2]], w[[REG1]]			; CHECK-NEXT: ldrh w9, [x0, x8]
				; CHECK-NEXT: add w0, w9, w8
				; CHECK-NEXT: ret
	entry:			entry:
	%idx = getelementptr inbounds i64, ptr %ptr, i64 %off			%idx = getelementptr inbounds i64, ptr %ptr, i64 %off
	%x = load i64, ptr %idx, align 8			%x = load i64, ptr %idx, align 8
	%trunc = trunc i64 %x to i16			%trunc = trunc i64 %x to i16
	%lsl = shl i64 %off, 3			%lsl = shl i64 %off, 3
	%lsl.trunc = trunc i64 %lsl to i16			%lsl.trunc = trunc i64 %lsl to i16
	%add = add i16 %trunc, %lsl.trunc			%add = add i16 %trunc, %lsl.trunc
	ret i16 %add			ret i16 %add
	}			}

	define i8 @load_doubleword_trunc_byte_reuse_shift(ptr %ptr, i64 %off) {			define i8 @load_doubleword_trunc_byte_reuse_shift(ptr %ptr, i64 %off) {
	; CHECK-LABEL: load_doubleword_trunc_byte_reuse_shift:			; CHECK-LABEL: load_doubleword_trunc_byte_reuse_shift:
	; CHECK: lsl x[[REG1:[0-9]+]], x1, #3			; CHECK: // %bb.0: // %entry
	; CHECK: ldrb w[[REG2:[0-9]+]], [x0, x[[REG1]]]			; CHECK-NEXT: lsl x8, x1, #3
	; CHECK: add w0, w[[REG2]], w[[REG1]]			; CHECK-NEXT: ldrb w9, [x0, x8]
				; CHECK-NEXT: add w0, w9, w8
				; CHECK-NEXT: ret
	entry:			entry:
	%idx = getelementptr inbounds i64, ptr %ptr, i64 %off			%idx = getelementptr inbounds i64, ptr %ptr, i64 %off
	%x = load i64, ptr %idx, align 8			%x = load i64, ptr %idx, align 8
	%trunc = trunc i64 %x to i8			%trunc = trunc i64 %x to i8
	%lsl = shl i64 %off, 3			%lsl = shl i64 %off, 3
	%lsl.trunc = trunc i64 %lsl to i8			%lsl.trunc = trunc i64 %lsl to i8
	%add = add i8 %trunc, %lsl.trunc			%add = add i8 %trunc, %lsl.trunc
	ret i8 %add			ret i8 %add
	}			}

	define i16 @load_word_trunc_halfword_reuse_shift(ptr %ptr, i64 %off) {			define i16 @load_word_trunc_halfword_reuse_shift(ptr %ptr, i64 %off) {
	entry:
	; CHECK-LABEL: load_word_trunc_halfword_reuse_shift:			; CHECK-LABEL: load_word_trunc_halfword_reuse_shift:
	; CHECK: lsl x[[REG1:[0-9]+]], x1, #2			; CHECK: // %bb.0: // %entry
	; CHECK: ldrh w[[REG2:[0-9]+]], [x0, x[[REG1]]]			; CHECK-NEXT: lsl x8, x1, #2
	; CHECK: add w0, w[[REG2]], w[[REG1]]			; CHECK-NEXT: ldrh w9, [x0, x8]
				; CHECK-NEXT: add w0, w9, w8
				; CHECK-NEXT: ret
				entry:
	%idx = getelementptr inbounds i32, ptr %ptr, i64 %off			%idx = getelementptr inbounds i32, ptr %ptr, i64 %off
	%x = load i32, ptr %idx, align 8			%x = load i32, ptr %idx, align 8
	%trunc = trunc i32 %x to i16			%trunc = trunc i32 %x to i16
	%lsl = shl i64 %off, 2			%lsl = shl i64 %off, 2
	%lsl.trunc = trunc i64 %lsl to i16			%lsl.trunc = trunc i64 %lsl to i16
	%add = add i16 %trunc, %lsl.trunc			%add = add i16 %trunc, %lsl.trunc
	ret i16 %add			ret i16 %add
	}			}

	define i8 @load_word_trunc_byte_reuse_shift(ptr %ptr, i64 %off) {			define i8 @load_word_trunc_byte_reuse_shift(ptr %ptr, i64 %off) {
	; CHECK-LABEL: load_word_trunc_byte_reuse_shift:			; CHECK-LABEL: load_word_trunc_byte_reuse_shift:
	; CHECK: lsl x[[REG1:[0-9]+]], x1, #2			; CHECK: // %bb.0: // %entry
	; CHECK: ldrb w[[REG2:[0-9]+]], [x0, x[[REG1]]]			; CHECK-NEXT: lsl x8, x1, #2
	; CHECK: add w0, w[[REG2]], w[[REG1]]			; CHECK-NEXT: ldrb w9, [x0, x8]
				; CHECK-NEXT: add w0, w9, w8
				; CHECK-NEXT: ret
	entry:			entry:
	%idx = getelementptr inbounds i32, ptr %ptr, i64 %off			%idx = getelementptr inbounds i32, ptr %ptr, i64 %off
	%x = load i32, ptr %idx, align 8			%x = load i32, ptr %idx, align 8
	%trunc = trunc i32 %x to i8			%trunc = trunc i32 %x to i8
	%lsl = shl i64 %off, 2			%lsl = shl i64 %off, 2
	%lsl.trunc = trunc i64 %lsl to i8			%lsl.trunc = trunc i64 %lsl to i8
	%add = add i8 %trunc, %lsl.trunc			%add = add i8 %trunc, %lsl.trunc
	ret i8 %add			ret i8 %add
	}			}

	define i8 @load_halfword_trunc_byte_reuse_shift(ptr %ptr, i64 %off) {			define i8 @load_halfword_trunc_byte_reuse_shift(ptr %ptr, i64 %off) {
	; CHECK-LABEL: load_halfword_trunc_byte_reuse_shift:			; CHECK-LABEL: load_halfword_trunc_byte_reuse_shift:
	; CHECK: lsl x[[REG1:[0-9]+]], x1, #1			; CHECK: // %bb.0: // %entry
	; CHECK: ldrb w[[REG2:[0-9]+]], [x0, x[[REG1]]]			; CHECK-NEXT: lsl x8, x1, #1
	; CHECK: add w0, w[[REG2]], w[[REG1]]			; CHECK-NEXT: ldrb w9, [x0, x8]
				; CHECK-NEXT: add w0, w9, w8
				; CHECK-NEXT: ret
	entry:			entry:
	%idx = getelementptr inbounds i16, ptr %ptr, i64 %off			%idx = getelementptr inbounds i16, ptr %ptr, i64 %off
	%x = load i16, ptr %idx, align 8			%x = load i16, ptr %idx, align 8
	%trunc = trunc i16 %x to i8			%trunc = trunc i16 %x to i8
	%lsl = shl i64 %off, 1			%lsl = shl i64 %off, 1
	%lsl.trunc = trunc i64 %lsl to i8			%lsl.trunc = trunc i64 %lsl to i8
	%add = add i8 %trunc, %lsl.trunc			%add = add i8 %trunc, %lsl.trunc
	ret i8 %add			ret i8 %add
	}			}