This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Only use writeback in the load/store optimizer when needed
Abandoned, Public

Authored by john.brawn on Dec 5 2017, 3:44 AM.

Details

Reviewers

evandro
junbuml
fhahn
MatzeB
mcrosier

Summary

Currently the load/store optimizer always uses writeback when merging an add with a load or store, even if it isn't necessary, which prevents optimization in cases where a load destination is the same register as the base. This patch makes it only use writeback when necessary, allowing us to optimise such cases.
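
As a rough illustration of the decision the summary describes, here is a minimal self-contained sketch. It is a toy model, not LLVM code: `Load`, `MergeKind` and `chooseMerge` are hypothetical names, registers are plain integers, and `BaseKilled` stands in for the kill flag that the patch reads with `isKill()`. It only models the pre-indexed case (an `add` that precedes the load).

```cpp
// Toy model of "add Base, Base, #Imm ; ldr Dest, [Base]" merging.
// All names here are hypothetical and exist only for this example.
//
//   add x8, x8, #16 ; ldr x0, [x8]   (x8 used later)  -> ldr x0, [x8, #16]!
//   add x8, x8, #16 ; ldr x0, [x8]   (x8 killed)      -> ldr x0, [x8, #16]
//   add x0, x0, #16 ; ldr x0, [x0]   (x0 killed)      -> ldr x0, [x0, #16]
//
// The last case cannot use writeback, because a writeback load whose
// destination is also its base register is not a valid encoding to rely on.
enum class MergeKind { None, PreIndexWriteback, PlainOffset };

struct Load {
  int DestReg;     // register written by the load
  int BaseReg;     // register holding the address
  bool BaseKilled; // true if this load is the last use of the base value
};

MergeKind chooseMerge(const Load &L) {
  // Writeback is only needed when the updated base is used afterwards.
  bool UseWriteback = !L.BaseKilled;
  if (UseWriteback) {
    // With writeback, the destination must not overlap the base.
    if (L.DestReg == L.BaseReg)
      return MergeKind::None;
    return MergeKind::PreIndexWriteback;
  }
  // No later use of the base: fold the add into a plain immediate offset.
  return MergeKind::PlainOffset;
}
```

Under these assumptions, `chooseMerge({8, 8, true})` yields `MergeKind::PlainOffset`, which is the case the patch aims to enable; always requiring writeback would reject it.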

Diff Detail

Repository: rL LLVM

Event Timeline

john.brawn created this revision. Dec 5 2017, 3:44 AM

Herald added subscribers: kristof.beyls, javed.absar, rengolin, aemerson. Dec 5 2017, 3:44 AM

junbuml added inline comments. Dec 5 2017, 7:14 AM

lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
1346	I'm not sure if it's safe enough to use the kill marker here? Is this information still valid for us to rely on?

junbuml added a reviewer: MatzeB. Dec 5 2017, 7:15 AM

john.brawn added inline comments. Dec 5 2017, 8:07 AM

lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
1346	The load/store optimizer in the ARM backend relies on it, so I would assume so. After some searching around I found TargetRegisterInfo::trackLivenessAfterRegAlloc which the AArch64 backend does return true for, and I couldn't find anything else that we would need to do to make sure it's valid.

gberry added a subscriber: gberry. Dec 5 2017, 10:59 AM

gberry added inline comments.

lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp
1346	The kill markers are slowly being phased out. They should always be correct, but they are conservative (e.g. a lot of passes just delete them in regions of code they transform since that is the easiest way to keep them correct) and will become more so over time.

Yes, please don't add new code that relies on kill flags: you will see them used less and less, so your code will become less and less effective.

Post-RA, you should handle all your liveness queries with LiveRegUnits (or LivePhysRegs). They allow you to start from the end of a basic block, where we have live-out information (it's actually the live-in information of the successors), and simulate liveness while walking backwards to the point you are interested in. You have to be a bit careful about the performance implications: ideally you change your high-level algorithms to work backwards through basic blocks as well, so that you simulate only once for all the transformations you do in a basic block rather than simulating anew for every single transformation.
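
The backward walk described above can be sketched in a few lines. This is a toy model under stated assumptions, not the LLVM API: `Inst` and `liveAfterEach` are hypothetical names, registers are plain integers, and details that the real LiveRegUnits and LivePhysRegs classes deal with (register units, sub-registers, register masks) are ignored. It makes one pass from the block's live-outs and records what is live after each instruction, so every later query in that block is a lookup.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-in for a machine instruction: the registers it
// writes (Defs) and the registers it reads (Uses).
struct Inst {
  std::vector<int> Defs;
  std::vector<int> Uses;
};

// For each instruction, compute the set of registers live immediately
// after it, using a single backward walk that starts from LiveOut.
std::vector<std::set<int>> liveAfterEach(const std::vector<Inst> &Block,
                                         const std::set<int> &LiveOut) {
  std::vector<std::set<int>> LiveAfter(Block.size());
  std::set<int> Live = LiveOut;
  for (std::size_t Idx = Block.size(); Idx-- > 0;) {
    LiveAfter[Idx] = Live;
    // Stepping backwards over an instruction: registers it defines are
    // not live before it, then registers it reads become live before it.
    for (int Reg : Block[Idx].Defs)
      Live.erase(Reg);
    for (int Reg : Block[Idx].Uses)
      Live.insert(Reg);
  }
  return LiveAfter;
}
```

In this model, a load at index `N` would need writeback only if its base register is in `LiveAfter[N]`, which replaces the kill-flag check without relying on passes having preserved the flags.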

mcrosier resigned from this revision. Mar 5 2018, 8:31 AM

I'm not sure what is gained by not performing this optimization, if I understood the gist of it from the new test case below. For even if the register is killed, the pointer adjustment is folded into the load or store and an instruction is eliminated. What other optimizations are expected to happen should this patch be applied?

In D40831#1027276, @evandro wrote:

I'm not sure what is gained by not performing this optimization, if I understood the gist of it from the new test case below. For even if the register is killed, the pointer adjustment is folded into the load or store and an instruction is eliminated. What other optimizations are expected to happen should this patch be applied?

I'm not quite sure what you're asking here? If you mean "what is gained by not using writeback in cases where we currently don't use writeback", then the answer is that we don't gain anything, but by not using writeback we can optimise cases that can't currently be optimised because the use of writeback would be invalid. Though the title and summary don't really convey that very well, perhaps "Allow more load/store optimization by not using writeback" or something would be better.

I'm not planning to work on this patch further at the moment (though I may get back to it some time in the future), as changing it to not use kill flags looks like it would take some work and I currently have other priorities.

Abandoning this old patch.

Herald added a project: Restricted Project. Jan 21 2020, 5:13 AM

Revision Contents

Path                                                Size
lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp    55 lines
test/CodeGen/AArch64/ldst-opt.ll                    44 lines
test/CodeGen/AArch64/ldst-opt.mir                   30 lines

Diff 125492

lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp

[1,336 lines not shown]
   if (++NextI == Update)
     ++NextI;

   int Value = Update->getOperand(2).getImm();
   assert(AArch64_AM::getShiftValue(Update->getOperand(3).getImm()) == 0 &&
          "Can't merge 1 << 12 offset into pre-/post-indexed load / store");
   if (Update->getOpcode() == AArch64::SUBXri)
     Value = -Value;

-  unsigned NewOpc = IsPreIdx ? getPreIndexedOpcode(I->getOpcode())
-                             : getPostIndexedOpcode(I->getOpcode());
-  MachineInstrBuilder MIB;
-  if (!isPairedLdSt(*I)) {
-    // Non-paired instruction.
-    MIB = BuildMI(*I->getParent(), I, I->getDebugLoc(), TII->get(NewOpc))
-              .add(getLdStRegOp(*Update))
-              .add(getLdStRegOp(*I))
-              .add(getLdStBaseOp(*I))
-              .addImm(Value)
-              .setMemRefs(I->memoperands_begin(), I->memoperands_end());
-  } else {
-    // Paired instruction.
-    int Scale = getMemScale(*I);
-    MIB = BuildMI(*I->getParent(), I, I->getDebugLoc(), TII->get(NewOpc))
-              .add(getLdStRegOp(*Update))
-              .add(getLdStRegOp(*I, 0))
-              .add(getLdStRegOp(*I, 1))
-              .add(getLdStBaseOp(*I))
-              .addImm(Value / Scale)
-              .setMemRefs(I->memoperands_begin(), I->memoperands_end());
-  }
+  // We need to use writeback only when the base register is used afterwards.
+  bool UseWriteback = !getLdStBaseOp(*I).isKill();
+  unsigned NewOpc;
+  if (UseWriteback) {
+    NewOpc = IsPreIdx ? getPreIndexedOpcode(I->getOpcode())
+                      : getPostIndexedOpcode(I->getOpcode());
+  } else {
+    assert(IsPreIdx);
+    NewOpc = I->getOpcode();
+  }
+  int Scale = (!UseWriteback || isPairedLdSt(*I)) ? getMemScale(*I) : 1;
+  MachineInstrBuilder MIB = BuildMI(*I->getParent(), I, I->getDebugLoc(),
+                                    TII->get(NewOpc));
+  if (UseWriteback)
+    MIB.add(getLdStRegOp(*Update));
+  if (isPairedLdSt(*I))
+    MIB.add(getLdStRegOp(*I, 0)).add(getLdStRegOp(*I, 1));
+  else
+    MIB.add(getLdStRegOp(*I));
+  MIB.add(getLdStBaseOp(*I))
+      .addImm(Value / Scale)
+      .setMemRefs(I->memoperands_begin(), I->memoperands_end());
   (void)MIB;

   if (IsPreIdx) {
     ++NumPreFolded;
     DEBUG(dbgs() << "Creating pre-indexed load/store.");
   } else {
     ++NumPostFolded;
     DEBUG(dbgs() << "Creating post-indexed load/store.");
[123 lines not shown]
 MachineBasicBlock::iterator AArch64LoadStoreOpt::findMatchingUpdateInsnBackward(
     MachineBasicBlock::iterator I, unsigned Limit) {
   MachineBasicBlock::iterator B = I->getParent()->begin();
   MachineBasicBlock::iterator E = I->getParent()->end();
   MachineInstr &MemMI = *I;
   MachineBasicBlock::iterator MBBI = I;

   unsigned BaseReg = getLdStBaseOp(MemMI).getReg();
   int Offset = getLdStOffsetOp(MemMI).getImm();
+  bool UseWriteback = !getLdStBaseOp(MemMI).isKill();

   // If the load/store is the first instruction in the block, there's obviously
   // not any matching update. Ditto if the memory offset isn't zero.
   if (MBBI == B || Offset != 0)
     return E;
-  // If the base register overlaps a destination register, we can't
-  // merge the update.
+  // If the base register overlaps a destination register, and we need to use
+  // writeback, then we can't merge the update.
   bool IsPairedInsn = isPairedLdSt(MemMI);
-  for (unsigned i = 0, e = IsPairedInsn ? 2 : 1; i != e; ++i) {
-    unsigned DestReg = getLdStRegOp(MemMI, i).getReg();
-    if (DestReg == BaseReg || TRI->isSubRegister(BaseReg, DestReg))
-      return E;
-  }
+  if (UseWriteback) {
+    for (unsigned i = 0, e = IsPairedInsn ? 2 : 1; i != e; ++i) {
+      unsigned DestReg = getLdStRegOp(MemMI, i).getReg();
+      if (DestReg == BaseReg || TRI->isSubRegister(BaseReg, DestReg))
+        return E;
+    }
+  }

   // Track which registers have been modified and used between the first insn
   // (inclusive) and the second insn.
   ModifiedRegs.reset();
   UsedRegs.reset();
   unsigned Count = 0;
   do {
     --MBBI;
[272 lines not shown]

test/CodeGen/AArch64/ldst-opt.ll

	Show First 20 Lines • Show All 267 Lines • ▼ Show 20 Lines
	}			}

	; Check the following transform:			; Check the following transform:
	;			;
	; add x8, x8, #16			; add x8, x8, #16
	; ...			; ...
	; ldr X, [x8]			; ldr X, [x8]
	; ->			; ->
	; ldr X, [x8, #16]!			; ldr X, [x8, #16]
	;			;
	; with X being either w0, x0, s0, d0 or q0.			; with X being either w0, x0, s0, d0 or q0.

	%pre.struct.i32 = type { i32, i32, i32, i32, i32}			%pre.struct.i32 = type { i32, i32, i32, i32, i32}
	%pre.struct.i64 = type { i32, i64, i64, i64, i64}			%pre.struct.i64 = type { i32, i64, i64, i64, i64}
	%pre.struct.i128 = type { i32, <2 x i64>, <2 x i64>, <2 x i64>}			%pre.struct.i128 = type { i32, <2 x i64>, <2 x i64>, <2 x i64>}
	%pre.struct.float = type { i32, float, float, float}			%pre.struct.float = type { i32, float, float, float}
	%pre.struct.double = type { i32, double, double, double}			%pre.struct.double = type { i32, double, double, double}

	define i32 @load-pre-indexed-word2(%pre.struct.i32** %this, i1 %cond,			define i32 @load-pre-indexed-word2(%pre.struct.i32** %this, i1 %cond,
	%pre.struct.i32* %load2) nounwind {			%pre.struct.i32* %load2) nounwind {
	; CHECK-LABEL: load-pre-indexed-word2			; CHECK-LABEL: load-pre-indexed-word2
	; CHECK: ldr w{{[0-9]+}}, [x{{[0-9]+}}, #4]!			; CHECK: ldr w{{[0-9]+}}, [x{{[0-9]+}}, #4]
	br i1 %cond, label %if.then, label %if.end			br i1 %cond, label %if.then, label %if.end
	if.then:			if.then:
	%load1 = load %pre.struct.i32, %pre.struct.i32* %this			%load1 = load %pre.struct.i32, %pre.struct.i32* %this
	%gep1 = getelementptr inbounds %pre.struct.i32, %pre.struct.i32* %load1, i64 0, i32 1			%gep1 = getelementptr inbounds %pre.struct.i32, %pre.struct.i32* %load1, i64 0, i32 1
	br label %return			br label %return
	if.end:			if.end:
	%gep2 = getelementptr inbounds %pre.struct.i32, %pre.struct.i32* %load2, i64 0, i32 2			%gep2 = getelementptr inbounds %pre.struct.i32, %pre.struct.i32* %load2, i64 0, i32 2
	br label %return			br label %return
	return:			return:
	%retptr = phi i32* [ %gep1, %if.then ], [ %gep2, %if.end ]			%retptr = phi i32* [ %gep1, %if.then ], [ %gep2, %if.end ]
	%ret = load i32, i32* %retptr			%ret = load i32, i32* %retptr
	ret i32 %ret			ret i32 %ret
	}			}

	define i64 @load-pre-indexed-doubleword2(%pre.struct.i64** %this, i1 %cond,			define i64 @load-pre-indexed-doubleword2(%pre.struct.i64** %this, i1 %cond,
	%pre.struct.i64* %load2) nounwind {			%pre.struct.i64* %load2) nounwind {
	; CHECK-LABEL: load-pre-indexed-doubleword2			; CHECK-LABEL: load-pre-indexed-doubleword2
	; CHECK: ldr x{{[0-9]+}}, [x{{[0-9]+}}, #8]!			; CHECK: ldr x{{[0-9]+}}, [x{{[0-9]+}}, #8]
	br i1 %cond, label %if.then, label %if.end			br i1 %cond, label %if.then, label %if.end
	if.then:			if.then:
	%load1 = load %pre.struct.i64, %pre.struct.i64* %this			%load1 = load %pre.struct.i64, %pre.struct.i64* %this
	%gep1 = getelementptr inbounds %pre.struct.i64, %pre.struct.i64* %load1, i64 0, i32 1			%gep1 = getelementptr inbounds %pre.struct.i64, %pre.struct.i64* %load1, i64 0, i32 1
	br label %return			br label %return
	if.end:			if.end:
	%gep2 = getelementptr inbounds %pre.struct.i64, %pre.struct.i64* %load2, i64 0, i32 2			%gep2 = getelementptr inbounds %pre.struct.i64, %pre.struct.i64* %load2, i64 0, i32 2
	br label %return			br label %return
	return:			return:
	%retptr = phi i64* [ %gep1, %if.then ], [ %gep2, %if.end ]			%retptr = phi i64* [ %gep1, %if.then ], [ %gep2, %if.end ]
	%ret = load i64, i64* %retptr			%ret = load i64, i64* %retptr
	ret i64 %ret			ret i64 %ret
	}			}

	define <2 x i64> @load-pre-indexed-quadword2(%pre.struct.i128** %this, i1 %cond,			define <2 x i64> @load-pre-indexed-quadword2(%pre.struct.i128** %this, i1 %cond,
	%pre.struct.i128* %load2) nounwind {			%pre.struct.i128* %load2) nounwind {
	; CHECK-LABEL: load-pre-indexed-quadword2			; CHECK-LABEL: load-pre-indexed-quadword2
	; CHECK: ldr q{{[0-9]+}}, [x{{[0-9]+}}, #16]!			; CHECK: ldr q{{[0-9]+}}, [x{{[0-9]+}}, #16]
	br i1 %cond, label %if.then, label %if.end			br i1 %cond, label %if.then, label %if.end
	if.then:			if.then:
	%load1 = load %pre.struct.i128, %pre.struct.i128* %this			%load1 = load %pre.struct.i128, %pre.struct.i128* %this
	%gep1 = getelementptr inbounds %pre.struct.i128, %pre.struct.i128* %load1, i64 0, i32 1			%gep1 = getelementptr inbounds %pre.struct.i128, %pre.struct.i128* %load1, i64 0, i32 1
	br label %return			br label %return
	if.end:			if.end:
	%gep2 = getelementptr inbounds %pre.struct.i128, %pre.struct.i128* %load2, i64 0, i32 2			%gep2 = getelementptr inbounds %pre.struct.i128, %pre.struct.i128* %load2, i64 0, i32 2
	br label %return			br label %return
	return:			return:
	%retptr = phi <2 x i64>* [ %gep1, %if.then ], [ %gep2, %if.end ]			%retptr = phi <2 x i64>* [ %gep1, %if.then ], [ %gep2, %if.end ]
	%ret = load <2 x i64>, <2 x i64>* %retptr			%ret = load <2 x i64>, <2 x i64>* %retptr
	ret <2 x i64> %ret			ret <2 x i64> %ret
	}			}

	define float @load-pre-indexed-float2(%pre.struct.float** %this, i1 %cond,			define float @load-pre-indexed-float2(%pre.struct.float** %this, i1 %cond,
	%pre.struct.float* %load2) nounwind {			%pre.struct.float* %load2) nounwind {
	; CHECK-LABEL: load-pre-indexed-float2			; CHECK-LABEL: load-pre-indexed-float2
	; CHECK: ldr s{{[0-9]+}}, [x{{[0-9]+}}, #4]!			; CHECK: ldr s{{[0-9]+}}, [x{{[0-9]+}}, #4]
	br i1 %cond, label %if.then, label %if.end			br i1 %cond, label %if.then, label %if.end
	if.then:			if.then:
	%load1 = load %pre.struct.float, %pre.struct.float* %this			%load1 = load %pre.struct.float, %pre.struct.float* %this
	%gep1 = getelementptr inbounds %pre.struct.float, %pre.struct.float* %load1, i64 0, i32 1			%gep1 = getelementptr inbounds %pre.struct.float, %pre.struct.float* %load1, i64 0, i32 1
	br label %return			br label %return
	if.end:			if.end:
	%gep2 = getelementptr inbounds %pre.struct.float, %pre.struct.float* %load2, i64 0, i32 2			%gep2 = getelementptr inbounds %pre.struct.float, %pre.struct.float* %load2, i64 0, i32 2
	br label %return			br label %return
	return:			return:
	%retptr = phi float* [ %gep1, %if.then ], [ %gep2, %if.end ]			%retptr = phi float* [ %gep1, %if.then ], [ %gep2, %if.end ]
	%ret = load float, float* %retptr			%ret = load float, float* %retptr
	ret float %ret			ret float %ret
	}			}

	define double @load-pre-indexed-double2(%pre.struct.double** %this, i1 %cond,			define double @load-pre-indexed-double2(%pre.struct.double** %this, i1 %cond,
	%pre.struct.double* %load2) nounwind {			%pre.struct.double* %load2) nounwind {
	; CHECK-LABEL: load-pre-indexed-double2			; CHECK-LABEL: load-pre-indexed-double2
	; CHECK: ldr d{{[0-9]+}}, [x{{[0-9]+}}, #8]!			; CHECK: ldr d{{[0-9]+}}, [x{{[0-9]+}}, #8]
	br i1 %cond, label %if.then, label %if.end			br i1 %cond, label %if.then, label %if.end
	if.then:			if.then:
	%load1 = load %pre.struct.double, %pre.struct.double* %this			%load1 = load %pre.struct.double, %pre.struct.double* %this
	%gep1 = getelementptr inbounds %pre.struct.double, %pre.struct.double* %load1, i64 0, i32 1			%gep1 = getelementptr inbounds %pre.struct.double, %pre.struct.double* %load1, i64 0, i32 1
	br label %return			br label %return
	if.end:			if.end:
	%gep2 = getelementptr inbounds %pre.struct.double, %pre.struct.double* %load2, i64 0, i32 2			%gep2 = getelementptr inbounds %pre.struct.double, %pre.struct.double* %load2, i64 0, i32 2
	br label %return			br label %return
	return:			return:
	%retptr = phi double* [ %gep1, %if.then ], [ %gep2, %if.end ]			%retptr = phi double* [ %gep1, %if.then ], [ %gep2, %if.end ]
	%ret = load double, double* %retptr			%ret = load double, double* %retptr
	ret double %ret			ret double %ret
	}			}

	define i32 @load-pre-indexed-word3(%pre.struct.i32** %this, i1 %cond,			define i32 @load-pre-indexed-word3(%pre.struct.i32** %this, i1 %cond,
	%pre.struct.i32* %load2) nounwind {			%pre.struct.i32* %load2) nounwind {
	; CHECK-LABEL: load-pre-indexed-word3			; CHECK-LABEL: load-pre-indexed-word3
	; CHECK: ldr w{{[0-9]+}}, [x{{[0-9]+}}, #12]!			; CHECK: ldr w{{[0-9]+}}, [x{{[0-9]+}}, #12]
	br i1 %cond, label %if.then, label %if.end			br i1 %cond, label %if.then, label %if.end
	if.then:			if.then:
	%load1 = load %pre.struct.i32, %pre.struct.i32* %this			%load1 = load %pre.struct.i32, %pre.struct.i32* %this
	%gep1 = getelementptr inbounds %pre.struct.i32, %pre.struct.i32* %load1, i64 0, i32 3			%gep1 = getelementptr inbounds %pre.struct.i32, %pre.struct.i32* %load1, i64 0, i32 3
	br label %return			br label %return
	if.end:			if.end:
	%gep2 = getelementptr inbounds %pre.struct.i32, %pre.struct.i32* %load2, i64 0, i32 4			%gep2 = getelementptr inbounds %pre.struct.i32, %pre.struct.i32* %load2, i64 0, i32 4
	br label %return			br label %return
	return:			return:
	%retptr = phi i32* [ %gep1, %if.then ], [ %gep2, %if.end ]			%retptr = phi i32* [ %gep1, %if.then ], [ %gep2, %if.end ]
	%ret = load i32, i32* %retptr			%ret = load i32, i32* %retptr
	ret i32 %ret			ret i32 %ret
	}			}

	define i64 @load-pre-indexed-doubleword3(%pre.struct.i64** %this, i1 %cond,			define i64 @load-pre-indexed-doubleword3(%pre.struct.i64** %this, i1 %cond,
	%pre.struct.i64* %load2) nounwind {			%pre.struct.i64* %load2) nounwind {
	; CHECK-LABEL: load-pre-indexed-doubleword3			; CHECK-LABEL: load-pre-indexed-doubleword3
	; CHECK: ldr x{{[0-9]+}}, [x{{[0-9]+}}, #16]!			; CHECK: ldr x{{[0-9]+}}, [x{{[0-9]+}}, #16]
	br i1 %cond, label %if.then, label %if.end			br i1 %cond, label %if.then, label %if.end
	if.then:			if.then:
	%load1 = load %pre.struct.i64, %pre.struct.i64* %this			%load1 = load %pre.struct.i64, %pre.struct.i64* %this
	%gep1 = getelementptr inbounds %pre.struct.i64, %pre.struct.i64* %load1, i64 0, i32 2			%gep1 = getelementptr inbounds %pre.struct.i64, %pre.struct.i64* %load1, i64 0, i32 2
	br label %return			br label %return
	if.end:			if.end:
	%gep2 = getelementptr inbounds %pre.struct.i64, %pre.struct.i64* %load2, i64 0, i32 3			%gep2 = getelementptr inbounds %pre.struct.i64, %pre.struct.i64* %load2, i64 0, i32 3
	br label %return			br label %return
	return:			return:
	%retptr = phi i64* [ %gep1, %if.then ], [ %gep2, %if.end ]			%retptr = phi i64* [ %gep1, %if.then ], [ %gep2, %if.end ]
	%ret = load i64, i64* %retptr			%ret = load i64, i64* %retptr
	ret i64 %ret			ret i64 %ret
	}			}

	define <2 x i64> @load-pre-indexed-quadword3(%pre.struct.i128** %this, i1 %cond,			define <2 x i64> @load-pre-indexed-quadword3(%pre.struct.i128** %this, i1 %cond,
	%pre.struct.i128* %load2) nounwind {			%pre.struct.i128* %load2) nounwind {
	; CHECK-LABEL: load-pre-indexed-quadword3			; CHECK-LABEL: load-pre-indexed-quadword3
	; CHECK: ldr q{{[0-9]+}}, [x{{[0-9]+}}, #32]!			; CHECK: ldr q{{[0-9]+}}, [x{{[0-9]+}}, #32]
	br i1 %cond, label %if.then, label %if.end			br i1 %cond, label %if.then, label %if.end
	if.then:			if.then:
	%load1 = load %pre.struct.i128, %pre.struct.i128* %this			%load1 = load %pre.struct.i128, %pre.struct.i128* %this
	%gep1 = getelementptr inbounds %pre.struct.i128, %pre.struct.i128* %load1, i64 0, i32 2			%gep1 = getelementptr inbounds %pre.struct.i128, %pre.struct.i128* %load1, i64 0, i32 2
	br label %return			br label %return
	if.end:			if.end:
	%gep2 = getelementptr inbounds %pre.struct.i128, %pre.struct.i128* %load2, i64 0, i32 3			%gep2 = getelementptr inbounds %pre.struct.i128, %pre.struct.i128* %load2, i64 0, i32 3
	br label %return			br label %return
	return:			return:
	%retptr = phi <2 x i64>* [ %gep1, %if.then ], [ %gep2, %if.end ]			%retptr = phi <2 x i64>* [ %gep1, %if.then ], [ %gep2, %if.end ]
	%ret = load <2 x i64>, <2 x i64>* %retptr			%ret = load <2 x i64>, <2 x i64>* %retptr
	ret <2 x i64> %ret			ret <2 x i64> %ret
	}			}

	define float @load-pre-indexed-float3(%pre.struct.float** %this, i1 %cond,			define float @load-pre-indexed-float3(%pre.struct.float** %this, i1 %cond,
	%pre.struct.float* %load2) nounwind {			%pre.struct.float* %load2) nounwind {
	; CHECK-LABEL: load-pre-indexed-float3			; CHECK-LABEL: load-pre-indexed-float3
	; CHECK: ldr s{{[0-9]+}}, [x{{[0-9]+}}, #8]!			; CHECK: ldr s{{[0-9]+}}, [x{{[0-9]+}}, #8]
	br i1 %cond, label %if.then, label %if.end			br i1 %cond, label %if.then, label %if.end
	if.then:			if.then:
	%load1 = load %pre.struct.float, %pre.struct.float* %this			%load1 = load %pre.struct.float, %pre.struct.float* %this
	%gep1 = getelementptr inbounds %pre.struct.float, %pre.struct.float* %load1, i64 0, i32 2			%gep1 = getelementptr inbounds %pre.struct.float, %pre.struct.float* %load1, i64 0, i32 2
	br label %return			br label %return
	if.end:			if.end:
	%gep2 = getelementptr inbounds %pre.struct.float, %pre.struct.float* %load2, i64 0, i32 3			%gep2 = getelementptr inbounds %pre.struct.float, %pre.struct.float* %load2, i64 0, i32 3
	br label %return			br label %return
	return:			return:
	%retptr = phi float* [ %gep1, %if.then ], [ %gep2, %if.end ]			%retptr = phi float* [ %gep1, %if.then ], [ %gep2, %if.end ]
	%ret = load float, float* %retptr			%ret = load float, float* %retptr
	ret float %ret			ret float %ret
	}			}

	define double @load-pre-indexed-double3(%pre.struct.double** %this, i1 %cond,			define double @load-pre-indexed-double3(%pre.struct.double** %this, i1 %cond,
	%pre.struct.double* %load2) nounwind {			%pre.struct.double* %load2) nounwind {
	; CHECK-LABEL: load-pre-indexed-double3			; CHECK-LABEL: load-pre-indexed-double3
	; CHECK: ldr d{{[0-9]+}}, [x{{[0-9]+}}, #16]!			; CHECK: ldr d{{[0-9]+}}, [x{{[0-9]+}}, #16]
	br i1 %cond, label %if.then, label %if.end			br i1 %cond, label %if.then, label %if.end
	if.then:			if.then:
	%load1 = load %pre.struct.double, %pre.struct.double* %this			%load1 = load %pre.struct.double, %pre.struct.double* %this
	%gep1 = getelementptr inbounds %pre.struct.double, %pre.struct.double* %load1, i64 0, i32 2			%gep1 = getelementptr inbounds %pre.struct.double, %pre.struct.double* %load1, i64 0, i32 2
	br label %return			br label %return
	if.end:			if.end:
	%gep2 = getelementptr inbounds %pre.struct.double, %pre.struct.double* %load2, i64 0, i32 3			%gep2 = getelementptr inbounds %pre.struct.double, %pre.struct.double* %load2, i64 0, i32 3
	br label %return			br label %return
	return:			return:
	%retptr = phi double* [ %gep1, %if.then ], [ %gep2, %if.end ]			%retptr = phi double* [ %gep1, %if.then ], [ %gep2, %if.end ]
	%ret = load double, double* %retptr			%ret = load double, double* %retptr
	ret double %ret			ret double %ret
	}			}

	; Check the following transform:			; Check the following transform:
	;			;
	; add x8, x8, #16			; add x8, x8, #16
	; ...			; ...
	; str X, [x8]			; str X, [x8]
	; ->			; ->
	; str X, [x8, #16]!			; str X, [x8, #16]
	;			;
	; with X being either w0, x0, s0, d0 or q0.			; with X being either w0, x0, s0, d0 or q0.

	define void @store-pre-indexed-word2(%pre.struct.i32** %this, i1 %cond,			define void @store-pre-indexed-word2(%pre.struct.i32** %this, i1 %cond,
	%pre.struct.i32* %load2,			%pre.struct.i32* %load2,
	i32 %val) nounwind {			i32 %val) nounwind {
	; CHECK-LABEL: store-pre-indexed-word2			; CHECK-LABEL: store-pre-indexed-word2
	; CHECK: str w{{[0-9]+}}, [x{{[0-9]+}}, #4]!			; CHECK: str w{{[0-9]+}}, [x{{[0-9]+}}, #4]
	br i1 %cond, label %if.then, label %if.end			br i1 %cond, label %if.then, label %if.end
	if.then:			if.then:
	%load1 = load %pre.struct.i32, %pre.struct.i32* %this			%load1 = load %pre.struct.i32, %pre.struct.i32* %this
	%gep1 = getelementptr inbounds %pre.struct.i32, %pre.struct.i32* %load1, i64 0, i32 1			%gep1 = getelementptr inbounds %pre.struct.i32, %pre.struct.i32* %load1, i64 0, i32 1
	br label %return			br label %return
	if.end:			if.end:
	%gep2 = getelementptr inbounds %pre.struct.i32, %pre.struct.i32* %load2, i64 0, i32 2			%gep2 = getelementptr inbounds %pre.struct.i32, %pre.struct.i32* %load2, i64 0, i32 2
	br label %return			br label %return
	return:			return:
	%retptr = phi i32* [ %gep1, %if.then ], [ %gep2, %if.end ]			%retptr = phi i32* [ %gep1, %if.then ], [ %gep2, %if.end ]
	store i32 %val, i32* %retptr			store i32 %val, i32* %retptr
	ret void			ret void
	}			}

	define void @store-pre-indexed-doubleword2(%pre.struct.i64** %this, i1 %cond,			define void @store-pre-indexed-doubleword2(%pre.struct.i64** %this, i1 %cond,
	%pre.struct.i64* %load2,			%pre.struct.i64* %load2,
	i64 %val) nounwind {			i64 %val) nounwind {
	; CHECK-LABEL: store-pre-indexed-doubleword2			; CHECK-LABEL: store-pre-indexed-doubleword2
	; CHECK: str x{{[0-9]+}}, [x{{[0-9]+}}, #8]!			; CHECK: str x{{[0-9]+}}, [x{{[0-9]+}}, #8]
	br i1 %cond, label %if.then, label %if.end			br i1 %cond, label %if.then, label %if.end
	if.then:			if.then:
	%load1 = load %pre.struct.i64, %pre.struct.i64* %this			%load1 = load %pre.struct.i64, %pre.struct.i64* %this
	%gep1 = getelementptr inbounds %pre.struct.i64, %pre.struct.i64* %load1, i64 0, i32 1			%gep1 = getelementptr inbounds %pre.struct.i64, %pre.struct.i64* %load1, i64 0, i32 1
	br label %return			br label %return
	if.end:			if.end:
	%gep2 = getelementptr inbounds %pre.struct.i64, %pre.struct.i64* %load2, i64 0, i32 2			%gep2 = getelementptr inbounds %pre.struct.i64, %pre.struct.i64* %load2, i64 0, i32 2
	br label %return			br label %return
	return:			return:
	%retptr = phi i64* [ %gep1, %if.then ], [ %gep2, %if.end ]			%retptr = phi i64* [ %gep1, %if.then ], [ %gep2, %if.end ]
	store i64 %val, i64* %retptr			store i64 %val, i64* %retptr
	ret void			ret void
	}			}

	define void @store-pre-indexed-quadword2(%pre.struct.i128** %this, i1 %cond,			define void @store-pre-indexed-quadword2(%pre.struct.i128** %this, i1 %cond,
	%pre.struct.i128* %load2,			%pre.struct.i128* %load2,
	<2 x i64> %val) nounwind {			<2 x i64> %val) nounwind {
	; CHECK-LABEL: store-pre-indexed-quadword2			; CHECK-LABEL: store-pre-indexed-quadword2
	; CHECK: str q{{[0-9]+}}, [x{{[0-9]+}}, #16]!			; CHECK: str q{{[0-9]+}}, [x{{[0-9]+}}, #16]
	br i1 %cond, label %if.then, label %if.end			br i1 %cond, label %if.then, label %if.end
	if.then:			if.then:
	%load1 = load %pre.struct.i128, %pre.struct.i128* %this			%load1 = load %pre.struct.i128, %pre.struct.i128* %this
	%gep1 = getelementptr inbounds %pre.struct.i128, %pre.struct.i128* %load1, i64 0, i32 1			%gep1 = getelementptr inbounds %pre.struct.i128, %pre.struct.i128* %load1, i64 0, i32 1
	br label %return			br label %return
	if.end:			if.end:
	%gep2 = getelementptr inbounds %pre.struct.i128, %pre.struct.i128* %load2, i64 0, i32 2			%gep2 = getelementptr inbounds %pre.struct.i128, %pre.struct.i128* %load2, i64 0, i32 2
	br label %return			br label %return
	return:			return:
	%retptr = phi <2 x i64>* [ %gep1, %if.then ], [ %gep2, %if.end ]			%retptr = phi <2 x i64>* [ %gep1, %if.then ], [ %gep2, %if.end ]
	store <2 x i64> %val, <2 x i64>* %retptr			store <2 x i64> %val, <2 x i64>* %retptr
	ret void			ret void
	}			}

	define void @store-pre-indexed-float2(%pre.struct.float** %this, i1 %cond,			define void @store-pre-indexed-float2(%pre.struct.float** %this, i1 %cond,
	%pre.struct.float* %load2,			%pre.struct.float* %load2,
	float %val) nounwind {			float %val) nounwind {
	; CHECK-LABEL: store-pre-indexed-float2			; CHECK-LABEL: store-pre-indexed-float2
	; CHECK: str s{{[0-9]+}}, [x{{[0-9]+}}, #4]!			; CHECK: str s{{[0-9]+}}, [x{{[0-9]+}}, #4]
	br i1 %cond, label %if.then, label %if.end			br i1 %cond, label %if.then, label %if.end
	if.then:			if.then:
	%load1 = load %pre.struct.float, %pre.struct.float* %this			%load1 = load %pre.struct.float, %pre.struct.float* %this
	%gep1 = getelementptr inbounds %pre.struct.float, %pre.struct.float* %load1, i64 0, i32 1			%gep1 = getelementptr inbounds %pre.struct.float, %pre.struct.float* %load1, i64 0, i32 1
	br label %return			br label %return
	if.end:			if.end:
	%gep2 = getelementptr inbounds %pre.struct.float, %pre.struct.float* %load2, i64 0, i32 2			%gep2 = getelementptr inbounds %pre.struct.float, %pre.struct.float* %load2, i64 0, i32 2
	br label %return			br label %return
	return:			return:
	%retptr = phi float* [ %gep1, %if.then ], [ %gep2, %if.end ]			%retptr = phi float* [ %gep1, %if.then ], [ %gep2, %if.end ]
	store float %val, float* %retptr			store float %val, float* %retptr
	ret void			ret void
	}			}

	define void @store-pre-indexed-double2(%pre.struct.double** %this, i1 %cond,			define void @store-pre-indexed-double2(%pre.struct.double** %this, i1 %cond,
	%pre.struct.double* %load2,			%pre.struct.double* %load2,
	double %val) nounwind {			double %val) nounwind {
	; CHECK-LABEL: store-pre-indexed-double2			; CHECK-LABEL: store-pre-indexed-double2
	; CHECK: str d{{[0-9]+}}, [x{{[0-9]+}}, #8]!			; CHECK: str d{{[0-9]+}}, [x{{[0-9]+}}, #8]
	br i1 %cond, label %if.then, label %if.end			br i1 %cond, label %if.then, label %if.end
	if.then:			if.then:
	%load1 = load %pre.struct.double, %pre.struct.double* %this			%load1 = load %pre.struct.double, %pre.struct.double* %this
	%gep1 = getelementptr inbounds %pre.struct.double, %pre.struct.double* %load1, i64 0, i32 1			%gep1 = getelementptr inbounds %pre.struct.double, %pre.struct.double* %load1, i64 0, i32 1
	br label %return			br label %return
	if.end:			if.end:
	%gep2 = getelementptr inbounds %pre.struct.double, %pre.struct.double* %load2, i64 0, i32 2			%gep2 = getelementptr inbounds %pre.struct.double, %pre.struct.double* %load2, i64 0, i32 2
	br label %return			br label %return
	return:			return:
	%retptr = phi double* [ %gep1, %if.then ], [ %gep2, %if.end ]			%retptr = phi double* [ %gep1, %if.then ], [ %gep2, %if.end ]
	store double %val, double* %retptr			store double %val, double* %retptr
	ret void			ret void
	}			}

	define void @store-pre-indexed-word3(%pre.struct.i32** %this, i1 %cond,			define void @store-pre-indexed-word3(%pre.struct.i32** %this, i1 %cond,
	%pre.struct.i32* %load2,			%pre.struct.i32* %load2,
	i32 %val) nounwind {			i32 %val) nounwind {
	; CHECK-LABEL: store-pre-indexed-word3			; CHECK-LABEL: store-pre-indexed-word3
	; CHECK: str w{{[0-9]+}}, [x{{[0-9]+}}, #12]!			; CHECK: str w{{[0-9]+}}, [x{{[0-9]+}}, #12]
	br i1 %cond, label %if.then, label %if.end			br i1 %cond, label %if.then, label %if.end
	if.then:			if.then:
	%load1 = load %pre.struct.i32*, %pre.struct.i32** %this			%load1 = load %pre.struct.i32*, %pre.struct.i32** %this
	%gep1 = getelementptr inbounds %pre.struct.i32, %pre.struct.i32* %load1, i64 0, i32 3			%gep1 = getelementptr inbounds %pre.struct.i32, %pre.struct.i32* %load1, i64 0, i32 3
	br label %return			br label %return
	if.end:			if.end:
	%gep2 = getelementptr inbounds %pre.struct.i32, %pre.struct.i32* %load2, i64 0, i32 4			%gep2 = getelementptr inbounds %pre.struct.i32, %pre.struct.i32* %load2, i64 0, i32 4
	br label %return			br label %return
	return:			return:
	%retptr = phi i32* [ %gep1, %if.then ], [ %gep2, %if.end ]			%retptr = phi i32* [ %gep1, %if.then ], [ %gep2, %if.end ]
	store i32 %val, i32* %retptr			store i32 %val, i32* %retptr
	ret void			ret void
	}			}

	define void @store-pre-indexed-doubleword3(%pre.struct.i64** %this, i1 %cond,			define void @store-pre-indexed-doubleword3(%pre.struct.i64** %this, i1 %cond,
	%pre.struct.i64* %load2,			%pre.struct.i64* %load2,
	i64 %val) nounwind {			i64 %val) nounwind {
	; CHECK-LABEL: store-pre-indexed-doubleword3			; CHECK-LABEL: store-pre-indexed-doubleword3
	; CHECK: str x{{[0-9]+}}, [x{{[0-9]+}}, #24]!			; CHECK: str x{{[0-9]+}}, [x{{[0-9]+}}, #24]
	br i1 %cond, label %if.then, label %if.end			br i1 %cond, label %if.then, label %if.end
	if.then:			if.then:
	%load1 = load %pre.struct.i64*, %pre.struct.i64** %this			%load1 = load %pre.struct.i64*, %pre.struct.i64** %this
	%gep1 = getelementptr inbounds %pre.struct.i64, %pre.struct.i64* %load1, i64 0, i32 3			%gep1 = getelementptr inbounds %pre.struct.i64, %pre.struct.i64* %load1, i64 0, i32 3
	br label %return			br label %return
	if.end:			if.end:
	%gep2 = getelementptr inbounds %pre.struct.i64, %pre.struct.i64* %load2, i64 0, i32 4			%gep2 = getelementptr inbounds %pre.struct.i64, %pre.struct.i64* %load2, i64 0, i32 4
	br label %return			br label %return
	return:			return:
	%retptr = phi i64* [ %gep1, %if.then ], [ %gep2, %if.end ]			%retptr = phi i64* [ %gep1, %if.then ], [ %gep2, %if.end ]
	store i64 %val, i64* %retptr			store i64 %val, i64* %retptr
	ret void			ret void
	}			}

	define void @store-pre-indexed-quadword3(%pre.struct.i128** %this, i1 %cond,			define void @store-pre-indexed-quadword3(%pre.struct.i128** %this, i1 %cond,
	%pre.struct.i128* %load2,			%pre.struct.i128* %load2,
	<2 x i64> %val) nounwind {			<2 x i64> %val) nounwind {
	; CHECK-LABEL: store-pre-indexed-quadword3			; CHECK-LABEL: store-pre-indexed-quadword3
	; CHECK: str q{{[0-9]+}}, [x{{[0-9]+}}, #32]!			; CHECK: str q{{[0-9]+}}, [x{{[0-9]+}}, #32]
	br i1 %cond, label %if.then, label %if.end			br i1 %cond, label %if.then, label %if.end
	if.then:			if.then:
	%load1 = load %pre.struct.i128*, %pre.struct.i128** %this			%load1 = load %pre.struct.i128*, %pre.struct.i128** %this
	%gep1 = getelementptr inbounds %pre.struct.i128, %pre.struct.i128* %load1, i64 0, i32 2			%gep1 = getelementptr inbounds %pre.struct.i128, %pre.struct.i128* %load1, i64 0, i32 2
	br label %return			br label %return
	if.end:			if.end:
	%gep2 = getelementptr inbounds %pre.struct.i128, %pre.struct.i128* %load2, i64 0, i32 3			%gep2 = getelementptr inbounds %pre.struct.i128, %pre.struct.i128* %load2, i64 0, i32 3
	br label %return			br label %return
	return:			return:
	%retptr = phi <2 x i64>* [ %gep1, %if.then ], [ %gep2, %if.end ]			%retptr = phi <2 x i64>* [ %gep1, %if.then ], [ %gep2, %if.end ]
	store <2 x i64> %val, <2 x i64>* %retptr			store <2 x i64> %val, <2 x i64>* %retptr
	ret void			ret void
	}			}

	define void @store-pre-indexed-float3(%pre.struct.float** %this, i1 %cond,			define void @store-pre-indexed-float3(%pre.struct.float** %this, i1 %cond,
	%pre.struct.float* %load2,			%pre.struct.float* %load2,
	float %val) nounwind {			float %val) nounwind {
	; CHECK-LABEL: store-pre-indexed-float3			; CHECK-LABEL: store-pre-indexed-float3
	; CHECK: str s{{[0-9]+}}, [x{{[0-9]+}}, #8]!			; CHECK: str s{{[0-9]+}}, [x{{[0-9]+}}, #8]
	br i1 %cond, label %if.then, label %if.end			br i1 %cond, label %if.then, label %if.end
	if.then:			if.then:
	%load1 = load %pre.struct.float*, %pre.struct.float** %this			%load1 = load %pre.struct.float*, %pre.struct.float** %this
	%gep1 = getelementptr inbounds %pre.struct.float, %pre.struct.float* %load1, i64 0, i32 2			%gep1 = getelementptr inbounds %pre.struct.float, %pre.struct.float* %load1, i64 0, i32 2
	br label %return			br label %return
	if.end:			if.end:
	%gep2 = getelementptr inbounds %pre.struct.float, %pre.struct.float* %load2, i64 0, i32 3			%gep2 = getelementptr inbounds %pre.struct.float, %pre.struct.float* %load2, i64 0, i32 3
	br label %return			br label %return
	return:			return:
	%retptr = phi float* [ %gep1, %if.then ], [ %gep2, %if.end ]			%retptr = phi float* [ %gep1, %if.then ], [ %gep2, %if.end ]
	store float %val, float* %retptr			store float %val, float* %retptr
	ret void			ret void
	}			}

	define void @store-pre-indexed-double3(%pre.struct.double** %this, i1 %cond,			define void @store-pre-indexed-double3(%pre.struct.double** %this, i1 %cond,
	%pre.struct.double* %load2,			%pre.struct.double* %load2,
	double %val) nounwind {			double %val) nounwind {
	; CHECK-LABEL: store-pre-indexed-double3			; CHECK-LABEL: store-pre-indexed-double3
	; CHECK: str d{{[0-9]+}}, [x{{[0-9]+}}, #16]!			; CHECK: str d{{[0-9]+}}, [x{{[0-9]+}}, #16]
	br i1 %cond, label %if.then, label %if.end			br i1 %cond, label %if.then, label %if.end
	if.then:			if.then:
	%load1 = load %pre.struct.double*, %pre.struct.double** %this			%load1 = load %pre.struct.double*, %pre.struct.double** %this
	%gep1 = getelementptr inbounds %pre.struct.double, %pre.struct.double* %load1, i64 0, i32 2			%gep1 = getelementptr inbounds %pre.struct.double, %pre.struct.double* %load1, i64 0, i32 2
	br label %return			br label %return
	if.end:			if.end:
	%gep2 = getelementptr inbounds %pre.struct.double, %pre.struct.double* %load2, i64 0, i32 3			%gep2 = getelementptr inbounds %pre.struct.double, %pre.struct.double* %load2, i64 0, i32 3
	br label %return			br label %return

test/CodeGen/AArch64/ldst-opt.mir

bb.0:
RET %lr		RET %lr
...		...
# CHECK-LABEL: name: promote-load-from-store-trivial-kills		# CHECK-LABEL: name: promote-load-from-store-trivial-kills
# CHECK: STRXui %x0, %sp, 0		# CHECK: STRXui %x0, %sp, 0
# CHECK: STRXui %x0, %sp, 2		# CHECK: STRXui %x0, %sp, 2
# CHECK-NOT: LDRXui		# CHECK-NOT: LDRXui
# CHECK-NOT: ORR		# CHECK-NOT: ORR
# CHECK: BL $bar, csr_aarch64_aapcs, implicit-def %lr, implicit %sp, implicit %x0, implicit-def %sp		# CHECK: BL $bar, csr_aarch64_aapcs, implicit-def %lr, implicit %sp, implicit %x0, implicit-def %sp
		---
		name: pre-index-overlap-killed-base
		tracksRegLiveness: true
		body: |
		bb.0:
		liveins: %x0

		%x0 = ADDXri killed %x0, 4, 0
		%w0 = LDRWui killed %x0, 0
		...
		# When the base register overlaps the load register we can optimise by not
		# using writeback when the base is killed
		# CHECK-LABEL: name: pre-index-overlap-killed-base
		# CHECK: %w0 = LDRWui killed %x0, 1
		---
		name: pre-index-overlap-live-base
		tracksRegLiveness: true
		body: |
		bb.0:
		liveins: %x0

		%x0 = ADDXri killed %x0, 4, 0
		%w0 = LDRWui %x0, 0
		...
		# When the base register overlaps the load register we can't optimise if the
		# base register is live
		# CHECK-LABEL: name: pre-index-overlap-live-base
		# CHECK-NOT: %w0 = LDRWui killed %x0, 1
		# CHECK: %x0 = ADDXri killed %x0, 4, 0
		# CHECK: %w0 = LDRWui %x0, 0