This is an archive of the discontinued LLVM Phabricator instance.


Differential D64220

[PowerPC] Remove redundant load immediate instructions
Closed (Public)

Authored by Yi-Hong.Lyu on Jul 4 2019, 12:59 PM.


Details

Reviewers

hfinkel
jsji
echristo
nemanjai
stefanp

Commits

rG41a010a4ef16: [PowerPC] Remove redundant load immediate instructions
rL366840: [PowerPC] Remove redundant load immediate instructions

Summary

Currently, the PowerPC backend emits code like this:

r3 = li 0
std r3, 264(r1)
r3 = li 0
std r3, 272(r1)

This patch fixes that case, and other cases where a register already contains the value being loaded, so that we get:

r3 = li 0
std r3, 264(r1)
std r3, 272(r1)
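To make the transformation concrete, here is a minimal, self-contained C++ sketch of the two-level scan on a toy instruction list. The `Instr` struct, the lowercase opcode strings and the function name are hypothetical stand-ins, not LLVM's `MachineInstr` API, and the sketch deliberately ignores kill/dead flags and sub-register aliasing, both of which the real pass has to handle.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical toy instruction (not LLVM's MachineInstr): an opcode, at most
// one defined register (-1 if none), the registers it reads, and an optional
// immediate (the loaded value for "li", the offset for "std").
struct Instr {
  std::string Opc;
  int Def = -1;
  std::vector<int> Uses;
  bool HasImm = false;
  int64_t Imm = 0;
};

static bool isLoadImm(const Instr &I) { return I.Opc == "li" && I.HasImm; }

// Outer loop: pick each load immediate that is not itself marked for removal.
// Inner loop: walk forward, marking identical reloads of the same register as
// redundant, and stop at the first other write to that register.
std::vector<Instr> removeRedundantLIs(const std::vector<Instr> &Block) {
  std::vector<bool> Erase(Block.size(), false);
  for (std::size_t I = 0; I < Block.size(); ++I) {
    if (Erase[I] || !isLoadImm(Block[I]))
      continue;
    const int Reg = Block[I].Def;
    for (std::size_t J = I + 1; J < Block.size(); ++J) {
      const Instr &After = Block[J];
      if (After.Def != Reg)
        continue; // Reg still holds the immediate; keep scanning.
      if (After.Opc != Block[I].Opc || !After.HasImm ||
          After.Imm != Block[I].Imm)
        break; // Reg is overwritten with something else.
      Erase[J] = true; // Same immediate into the same register.
    }
  }
  std::vector<Instr> Out;
  for (std::size_t I = 0; I < Block.size(); ++I)
    if (!Erase[I])
      Out.push_back(Block[I]);
  return Out;
}
```

Fed the four instructions from the summary (`r3 = li 0`, `std r3, 264(r1)`, `r3 = li 0`, `std r3, 272(r1)`), the sketch keeps `li`, `std`, `std`, which is the output shown above.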

Diff Detail

Event Timeline

Yi-Hong.Lyu created this revision. Jul 4 2019, 12:59 PM

Herald added a project: Restricted Project. Jul 4 2019, 12:59 PM

Herald added subscribers: llvm-commits, MaskRay, kbarton, hiraditya.

steven.zhang added a subscriber: steven.zhang. Jul 4 2019, 7:46 PM

steven.zhang added inline comments.

llvm/test/CodeGen/PowerPC/remove-redundant-load-imm.mir
Line 6: Could you show me the LLVM IR that produces these two redundant LIs? That way we could figure out whether they can be avoided, instead of removing them in the peephole.

Herald added a subscriber: wuzish. Jul 4 2019, 7:46 PM

nemanjai added inline comments. Jul 5 2019, 11:17 AM

llvm/test/CodeGen/PowerPC/remove-redundant-load-imm.mir
Line 6: More or less any IR that has multiple PHI nodes with a zero coming in from the same block, where the register has to be spilled. Perhaps we should add a test case like that (with an inline asm call that clobbers registers, thereby requiring spills).

Is this really a PowerPC-specific problem? Do we just want to have a general post-RA cleanup that checks for isMoveImmediate() and does this cleanup?
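One way to picture the target-independent variant raised here: keep the scan generic and ask the target, through a query, which value an instruction materializes. The `MoveImmQuery` callback and the two sample queries below are hypothetical illustrations, not an existing LLVM hook (LLVM's `isMoveImmediate()` only answers yes/no), and the opcode spellings are simplified. The query deliberately returns the materialized value (so `lis 1` yields 65536) rather than the raw operand.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical toy instruction: opcode, defined register (-1 if none), and an
// optional immediate operand.
struct Instr {
  std::string Opc;
  int Def = -1;
  bool HasImm = false;
  int64_t Imm = 0;
};

// Hypothetical target query: the value I materializes into I.Def if I is a
// "move immediate", std::nullopt otherwise.
using MoveImmQuery = std::function<std::optional<int64_t>(const Instr &)>;

// PowerPC-flavoured query: li materializes Imm, lis materializes Imm << 16.
std::optional<int64_t> ppcMoveImm(const Instr &I) {
  if (!I.HasImm)
    return std::nullopt;
  if (I.Opc == "li")
    return I.Imm;
  if (I.Opc == "lis")
    return I.Imm * 65536;
  return std::nullopt;
}

// ARM-flavoured query: "mov rD, #imm".
std::optional<int64_t> armMoveImm(const Instr &I) {
  if (I.Opc == "mov" && I.HasImm)
    return I.Imm;
  return std::nullopt;
}

// Target-agnostic scan: returns the indices of move-immediates that reload a
// value the register is already known to hold.
std::vector<std::size_t> findRedundantMoveImms(const std::vector<Instr> &Block,
                                               const MoveImmQuery &Query) {
  std::vector<std::size_t> Redundant;
  std::vector<bool> Marked(Block.size(), false);
  for (std::size_t I = 0; I < Block.size(); ++I) {
    std::optional<int64_t> Value = Query(Block[I]);
    if (Marked[I] || !Value)
      continue;
    for (std::size_t J = I + 1; J < Block.size(); ++J) {
      if (Block[J].Def != Block[I].Def)
        continue; // Register untouched; keep scanning.
      std::optional<int64_t> Next = Query(Block[J]);
      if (!Next || *Next != *Value)
        break; // Register now holds something else.
      Marked[J] = true;
      Redundant.push_back(J);
    }
  }
  return Redundant;
}
```

Run on the armv8 output for test2.ll from this thread (movw/mov/movt/str/mov/bl), the ARM query flags only the second `mov r1, #0`. Note the design difference: comparing materialized values would treat `li 4, 0` followed by `lis 4, 0` as redundant, whereas the patch also requires identical opcodes and raw immediates.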

steven.zhang added inline comments. Jul 10 2019, 7:17 PM

llvm/test/CodeGen/PowerPC/remove-redundant-load-imm.mir
Line 6: Yeah, at least one case that shows the scenario producing this pattern.

Added LLVM IR testcases as requested.

Herald added a subscriber: shchenz. Jul 17 2019, 7:42 AM

Yi-Hong.Lyu marked an inline comment as done. Jul 17 2019, 7:58 AM

Yi-Hong.Lyu added a subscriber: t.p.northover.

Yi-Hong.Lyu added inline comments.

llvm/test/CodeGen/PowerPC/remove-redundant-load-imm.mir

There are two test cases in the added remove-redundant-load-imm.ll. The first one has the redundancy only on PowerPC; the second one has it on both PowerPC and armv8 (but not arm64):

$ cat test1.ll
target datalayout = "e-m:e-i64:64-n32:64"
target triple = "powerpc64le-unknown-linux-gnu"

define void @hoge(i1 %arg7) {
bb:
  br label %bb10

bb10:                                             ; preds = %bb
  call void @barney.88(i1 %arg7, i32* null)
  ret void
}

declare void @barney.88(i1, i32*)

$ $ORI_BIN/llc -O3 -filetype=asm test1.ll -o -
...
# %bb.0:                                # %bb
        mflr 0
        andi. 3, 3, 1
        std 0, 16(1)
        stdu 1, -32(1)
        .cfi_def_cfa_offset 32
        .cfi_offset lr, 16
        li 3, 1
        li 4, 0
        isel 3, 3, 4, 1
        li 4, 0    # redundant load immediate
        bl barney.88
        nop
        addi 1, 1, 32
        ld 0, 16(1)
        mtlr 0
        blr
        .long   0
        .quad   0
...

$ $OPT_BIN/llc -O3 -filetype=asm test1.ll -o -
...
# %bb.0:                                # %bb
        mflr 0
        andi. 3, 3, 1
        std 0, 16(1)
        stdu 1, -32(1)
        .cfi_def_cfa_offset 32
        .cfi_offset lr, 16
        li 3, 1
        li 4, 0
        isel 3, 3, 4, 1
        bl barney.88
        nop
        addi 1, 1, 32
        ld 0, 16(1)
        mtlr 0
        blr
        .long   0
        .quad   0
...

$ cat test2.ll
target datalayout = "e-m:e-i64:64-n32:64"
target triple = "powerpc64le-unknown-linux-gnu"

@global.6 = external global i32*

declare void @barney.94(i8*, i32)

define void @0() {
  store i32* null, i32** @global.6
  call void @barney.94(i8* undef, i32 0)
  unreachable
}

$ $ORI_BIN/llc -O3 -filetype=asm test2.ll -o -
...
# %bb.0:
        mflr 0
        std 0, 16(1)
        stdu 1, -32(1)
        .cfi_def_cfa_offset 32
        .cfi_offset lr, 16
        addis 3, 2, .LC0@toc@ha
        li 4, 0
        ld 3, .LC0@toc@l(3)
        std 4, 0(3)
        li 4, 0    # redundant load immediate
        bl barney.94
        nop
        .long   0
        .quad   0
...

$ $OPT_BIN/llc -O3 -filetype=asm test2.ll -o -
...
# %bb.0:
        mflr 0
        std 0, 16(1)
        stdu 1, -32(1)
        .cfi_def_cfa_offset 32
        .cfi_offset lr, 16
        addis 3, 2, .LC0@toc@ha
        li 4, 0
        ld 3, .LC0@toc@l(3)
        std 4, 0(3)
        bl barney.94
        nop
        .long   0
        .quad   0
...

$ $ORI_BIN/llc -O3 -mtriple=armv8-eabi -filetype=asm test2.ll -o -
...
@ %bb.0:
        .save   {r11, lr}
        push    {r11, lr}
        movw    r0, :lower16:global.6
        mov     r1, #0
        movt    r0, :upper16:global.6
        str     r1, [r0]
        mov     r1, #0    @ redundant load immediate
        bl      barney.94
...

Note.

$ORI_BIN/llc is llc without the patch and $OPT_BIN/llc is llc with the patch.
We are evaluating the feasibility of a platform-independent implementation, as Hal suggested. @t.p.northover, any input is welcome.

I only have a number of very minor comments. Overall I think this looks good.
I think you can fix the comments when you commit.

LGTM.

llvm/lib/Target/PowerPC/PPCPreEmitPeephole.cpp
Line 61: nit: use -> used
Line 67: nit: Wording. `DeadOrKillToUnset is a pointer to the previous operand has kill/dead flag.` I think this next one sounds a little better: `DeadOrKillToUnset is a pointer to the previous operand that had the kill/dead flag set.` Similarly: `It tracks from the def register of BBI, use registers of AfterBBIs and def registers of AfterBBIs.` Sounds better: `It keeps track of the def register of BBI, the use registers of AfterBBIs and the def registers of AfterBBIs.` The above are just suggestions. If you think another wording sounds better you can use that.
Line 76: I think it would be easier to read this loop if you just used `MBB.instrs()` and went through the range that way. The only place you actually need BBE is as the loop bounds for the inner loop and there you can use `MBB.instr_end()` directly.
Line 86: nit: `which operand is a relocation` to `where the operand is a relocation`
Line 106: So here you can probably just use `AfterBBI != MBB.instr_end()`. I know that technically saving the value once and using it in the loop is slightly more efficient in terms of performance but I tend to prefer code that is easier to read when the performance difference is really small like this. However, having said that, others might prefer the other way around so if others (you or other reviewers) prefer the other way around we can do that too.
Line 157: You can use `NumRemovedInPreEmit += InstrsToErase.size()` outside of the loop.
Line 159: nit: How about this? `return !InstrsToErase.empty()`
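The last two suggestions can be illustrated outside LLVM with standard containers. In this hypothetical model a block is a `std::list<int>` of instruction ids and `NumRemoved` stands in for the `NumRemovedInPreEmit` statistic. Bumping the counter once by `size()` is only equivalent to counting inside the loop because a set holds no duplicates and, as in the pass, every marked entry is assumed to be present in the block.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <unordered_set>

// Hypothetical stand-in for the NumRemovedInPreEmit statistic.
static std::size_t NumRemoved = 0;

// Erases every instruction id in ToErase from Block and reports whether
// anything changed. Assumes each id in ToErase occurs in Block exactly once.
bool eraseMarked(std::list<int> &Block, const std::unordered_set<int> &ToErase) {
  for (auto It = Block.begin(); It != Block.end();) {
    if (ToErase.count(*It) != 0)
      It = Block.erase(It); // Analogue of MI->eraseFromParent().
    else
      ++It;
  }
  NumRemoved += ToErase.size(); // One update instead of one per iteration.
  return !ToErase.empty();      // Instead of ToErase.size() > 0.
}
```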

This revision is now accepted and ready to land. Jul 19 2019, 2:40 PM

Address Stefan's review comments

Yi-Hong.Lyu marked 7 inline comments as done. Jul 23 2019, 10:31 AM

Closed by commit rL366840: [PowerPC] Remove redundant load immediate instructions (authored by Yi-Hong.Lyu). Jul 23 2019, 12:11 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                       Size
llvm/lib/Target/PowerPC/PPCPreEmitPeephole.cpp             103 lines
llvm/test/CodeGen/PowerPC/remove-redundant-load-imm.ll     55 lines
llvm/test/CodeGen/PowerPC/remove-redundant-load-imm.mir    348 lines

Diff 210325

llvm/lib/Target/PowerPC/PPCPreEmitPeephole.cpp

...
   void getAnalysisUsage(AnalysisUsage &AU) const override {
     MachineFunctionPass::getAnalysisUsage(AU);
   }

   MachineFunctionProperties getRequiredProperties() const override {
     return MachineFunctionProperties().set(
         MachineFunctionProperties::Property::NoVRegs);
   }

+  // This function removes any redundant load immediates. It has two level
+  // loops - The outer loop finds the load immediates BBI that could be use to
+  // replace following redundancy. The inner loop scans instructions that
+  // after BBI to find redundancy and update kill/dead flags accordingly. If
+  // AfterBBI is the same as BBI, it is redundant, otherwise any instructions
+  // that modify the def register of BBI would break the scanning.
+  // DeadOrKillToUnset is a pointer to the previous operand has kill/dead
+  // flag. It tracks from the def register of BBI, use registers of AfterBBIs
+  // and def registers of AfterBBIs.
+  bool removeRedundantLIs(MachineBasicBlock &MBB,
+                          const TargetRegisterInfo *TRI) {
+    LLVM_DEBUG(dbgs() << "Remove redundant load immediates from MBB:\n";
+               MBB.dump(); dbgs() << "\n");
+
+    DenseSet<MachineInstr *> InstrsToErase;
+    for (auto BBI = MBB.instr_begin(), BBE = MBB.instr_end(); BBI != BBE;
+         ++BBI) {
+      // Skip load immediate that is marked to be erased later because it
+      // cannot be used to replace any other instructions.
+      if (InstrsToErase.find(&*BBI) != InstrsToErase.end())
+        continue;
+      // Skip non-load immediate.
+      unsigned Opc = BBI->getOpcode();
+      if (Opc != PPC::LI && Opc != PPC::LI8 && Opc != PPC::LIS &&
+          Opc != PPC::LIS8)
+        continue;
+      // Skip load immediate, which operand is a relocation (e.g., $r3 = LI
+      // target-flags(ppc-lo) %const.0).
+      if (!BBI->getOperand(1).isImm())
+        continue;
+      assert(BBI->getOperand(0).isReg() &&
+             "Expected a register for the first operand");
+
+      LLVM_DEBUG(dbgs() << "Scanning after load immediate: "; BBI->dump(););
+
+      unsigned Reg = BBI->getOperand(0).getReg();
+      int64_t Imm = BBI->getOperand(1).getImm();
+      MachineOperand *DeadOrKillToUnset = nullptr;
+      if (BBI->getOperand(0).isDead()) {
+        DeadOrKillToUnset = &BBI->getOperand(0);
+        LLVM_DEBUG(dbgs() << " Kill flag of " << *DeadOrKillToUnset
+                          << " from load immediate " << *BBI
+                          << " is a unsetting candidate\n");
+      }
+      // This loop scans instructions after BBI to see if there is any
+      // redundant load immediate.
+      for (auto AfterBBI = std::next(BBI); AfterBBI != BBE; ++AfterBBI) {
+        // Track the operand that kill Reg. We would unset the kill flag of
+        // the operand if there is a following redundant load immediate.
+        int KillIdx = AfterBBI->findRegisterUseOperandIdx(Reg, true, TRI);
+        if (KillIdx != -1) {
+          assert(!DeadOrKillToUnset && "Shouldn't kill same register twice");
+          DeadOrKillToUnset = &AfterBBI->getOperand(KillIdx);
+          LLVM_DEBUG(dbgs()
+                     << " Kill flag of " << *DeadOrKillToUnset << " from "
+                     << *AfterBBI << " is a unsetting candidate\n");
+        }
+
+        if (!AfterBBI->modifiesRegister(Reg, TRI))
+          continue;
+        assert(DeadOrKillToUnset &&
+               "Shouldn't overwrite a register before it is killed");
+        // Finish scanning because Reg is overwritten by a non-load
+        // instruction.
+        if (AfterBBI->getOpcode() != Opc)
+          break;
+        assert(AfterBBI->getOperand(0).isReg() &&
+               "Expected a register for the first operand");
+        // Finish scanning because Reg is overwritten by a relocation or a
+        // different value.
+        if (!AfterBBI->getOperand(1).isImm() ||
+            AfterBBI->getOperand(1).getImm() != Imm)
+          break;
+
+        // It loads same immediate value to the same Reg, which is redundant.
+        // We would unset kill flag in previous Reg usage to extend live range
+        // of Reg first, then remove the redundancy.
+        LLVM_DEBUG(dbgs() << " Unset dead/kill flag of " << *DeadOrKillToUnset
+                          << " from " << *DeadOrKillToUnset->getParent());
+        if (DeadOrKillToUnset->isDef())
+          DeadOrKillToUnset->setIsDead(false);
+        else
+          DeadOrKillToUnset->setIsKill(false);
+        DeadOrKillToUnset =
+            AfterBBI->findRegisterDefOperand(Reg, true, true, TRI);
+        if (DeadOrKillToUnset)
+          LLVM_DEBUG(dbgs()
+                     << " Dead flag of " << *DeadOrKillToUnset << " from "
+                     << *AfterBBI << " is a unsetting candidate\n");
+        InstrsToErase.insert(&*AfterBBI);
+        LLVM_DEBUG(dbgs() << " Remove redundant load immediate: ";
+                   AfterBBI->dump());
+      }
+    }
+
+    for (MachineInstr *MI : InstrsToErase) {
+      MI->eraseFromParent();
+      NumRemovedInPreEmit++;
+    }
+    return InstrsToErase.size() > 0;
+  }
+
   bool runOnMachineFunction(MachineFunction &MF) override {
     if (skipFunction(MF.getFunction()) || !RunPreEmitPeephole)
       return false;
     bool Changed = false;
     const PPCInstrInfo *TII = MF.getSubtarget<PPCSubtarget>().getInstrInfo();
     const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
     SmallVector<MachineInstr *, 4> InstrsToErase;
     for (MachineBasicBlock &MBB : MF) {
+      Changed |= removeRedundantLIs(MBB, TRI);
       for (MachineInstr &MI : MBB) {
         unsigned Opc = MI.getOpcode();
         // Detect self copies - these can result from running AADB.
         if (PPCInstrInfo::isSameClassPhysRegCopy(Opc)) {
           const MCInstrDesc &MCID = TII->get(Opc);
           if (MCID.getNumOperands() == 3 &&
               MI.getOperand(0).getReg() == MI.getOperand(1).getReg() &&
               MI.getOperand(0).getReg() == MI.getOperand(2).getReg()) {
...
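The subtle part of this diff is the `DeadOrKillToUnset` bookkeeping: when a reload is erased, the earlier operand that ended the register's live range (a `killed` use or a `dead` def) must be cleared so the value stays live up to its next use. The sketch below models that logic on a toy instruction list. The `Instr`/`Use` structs and integer register numbering are hypothetical, a plain `bool *` stands in for `MachineOperand *`, and sub-register aliasing is ignored; the inputs in the usage mirror the `t1` and `dead_load_immediate_followed_by_a_redundancy` MIR tests.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical toy operands/instructions (not LLVM's API). Each instruction
// defines at most one register and carries dead/kill flags per operand.
struct Use {
  int Reg;
  bool Kill;
};

struct Instr {
  std::string Opc;
  int Def = -1;         // Defined register, or -1.
  bool DefDead = false; // "dead" flag on the def.
  std::vector<Use> Uses;
  bool HasImm = false;
  int64_t Imm = 0;
};

static bool isLoadImm(const Instr &I) {
  return I.HasImm && (I.Opc == "LI" || I.Opc == "LI8" || I.Opc == "LIS" ||
                      I.Opc == "LIS8");
}

std::vector<Instr> removeRedundantLIs(std::vector<Instr> Block) {
  std::vector<bool> Erase(Block.size(), false);
  for (std::size_t I = 0; I < Block.size(); ++I) {
    if (Erase[I] || !isLoadImm(Block[I]))
      continue;
    const int Reg = Block[I].Def;
    // Plays the role of DeadOrKillToUnset: the flag that currently ends Reg's
    // live range. Pointers stay valid because Block is not resized here.
    bool *DeadOrKill = Block[I].DefDead ? &Block[I].DefDead : nullptr;
    for (std::size_t J = I + 1; J < Block.size(); ++J) {
      Instr &After = Block[J];
      for (Use &U : After.Uses)
        if (U.Reg == Reg && U.Kill) {
          assert(!DeadOrKill && "Shouldn't kill same register twice");
          DeadOrKill = &U.Kill;
          break;
        }
      if (After.Def != Reg)
        continue;
      assert(DeadOrKill && "Shouldn't overwrite a register before it is killed");
      if (After.Opc != Block[I].Opc || !After.HasImm ||
          After.Imm != Block[I].Imm)
        break;
      *DeadOrKill = false; // Extend Reg's live range past the erased reload.
      DeadOrKill = After.DefDead ? &After.DefDead : nullptr;
      Erase[J] = true;
    }
  }
  std::vector<Instr> Out;
  for (std::size_t I = 0; I < Block.size(); ++I)
    if (!Erase[I])
      Out.push_back(Block[I]);
  return Out;
}
```

With `t1` as input, the second `LI8 0` is dropped and the first `STD` loses its `killed` flag while the last one keeps it, matching that test's CHECK lines.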

llvm/test/CodeGen/PowerPC/remove-redundant-load-imm.ll

This file was added.

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mcpu=pwr9 -O3 < %s | FileCheck %s -check-prefix=PPC64LE

target datalayout = "e-m:e-i64:64-n32:64"
target triple = "powerpc64le-unknown-linux-gnu"

@global.6 = external global i32*

declare void @barney.88(i1, i32*)
declare void @barney.94(i8*, i32)

define void @redundancy_on_ppc_only(i1 %arg7) {
; PPC64LE-LABEL: redundancy_on_ppc_only:
; PPC64LE: # %bb.0: # %bb
; PPC64LE-NEXT: mflr 0
; PPC64LE-NEXT: andi. 3, 3, 1
; PPC64LE-NEXT: std 0, 16(1)
; PPC64LE-NEXT: stdu 1, -32(1)
; PPC64LE-NEXT: .cfi_def_cfa_offset 32
; PPC64LE-NEXT: .cfi_offset lr, 16
; PPC64LE-NEXT: li 3, 1
; PPC64LE-NEXT: li 4, 0
; PPC64LE-NEXT: isel 3, 3, 4, 1
; PPC64LE-NEXT: bl barney.88
; PPC64LE-NEXT: nop
; PPC64LE-NEXT: addi 1, 1, 32
; PPC64LE-NEXT: ld 0, 16(1)
; PPC64LE-NEXT: mtlr 0
; PPC64LE-NEXT: blr
bb:
  br label %bb10

bb10: ; preds = %bb
  call void @barney.88(i1 %arg7, i32* null)
  ret void
}

define void @redundancy_on_ppc_and_other_targets() {
; PPC64LE-LABEL: redundancy_on_ppc_and_other_targets:
; PPC64LE: # %bb.0:
; PPC64LE-NEXT: mflr 0
; PPC64LE-NEXT: std 0, 16(1)
; PPC64LE-NEXT: stdu 1, -32(1)
; PPC64LE-NEXT: .cfi_def_cfa_offset 32
; PPC64LE-NEXT: .cfi_offset lr, 16
; PPC64LE-NEXT: addis 3, 2, .LC0@toc@ha
; PPC64LE-NEXT: ld 3, .LC0@toc@l(3)
; PPC64LE-NEXT: li 4, 0
; PPC64LE-NEXT: std 4, 0(3)
; PPC64LE-NEXT: bl barney.94
; PPC64LE-NEXT: nop
  store i32* null, i32** @global.6
  call void @barney.94(i8* undef, i32 0)
  unreachable
}
				}

llvm/test/CodeGen/PowerPC/remove-redundant-load-imm.mir

This file was added.

# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
# RUN: llc -mcpu=pwr9 -mtriple=powerpc64le-unknown-unknown -run-pass ppc-pre-emit-peephole %s -o - | FileCheck %s

---
name: t1
alignment: 4
tracksRegLiveness: true
machineFunctionInfo: {}
body: |
  bb.0.entry:
    liveins: $x1

    ; CHECK-LABEL: name: t1
    ; CHECK: liveins: $x1
    ; CHECK: renamable $x3 = LI8 0
    ; CHECK: STD renamable $x3, 16, $x1
    ; CHECK: STD killed renamable $x3, 8, $x1
    ; CHECK: BLR8 implicit $lr8, implicit $rm
    renamable $x3 = LI8 0
    STD killed renamable $x3, 16, $x1
    renamable $x3 = LI8 0
    STD killed renamable $x3, 8, $x1
    BLR8 implicit $lr8, implicit $rm

...
---
name: t2
alignment: 4
tracksRegLiveness: true
machineFunctionInfo: {}
body: |
  bb.0.entry:
    liveins: $x1

    ; CHECK-LABEL: name: t2
    ; CHECK: liveins: $x1
    ; CHECK: renamable $x3 = LI8 0
    ; CHECK: STD renamable $x3, 32, $x1
    ; CHECK: STD renamable $x3, 24, $x1
    ; CHECK: STD renamable $x3, 16, $x1
    ; CHECK: STD killed renamable $x3, 8, $x1
    ; CHECK: BLR8 implicit $lr8, implicit $rm
    renamable $x3 = LI8 0
    STD killed renamable $x3, 32, $x1
    renamable $x3 = LI8 0
    STD killed renamable $x3, 24, $x1
    renamable $x3 = LI8 0
    STD killed renamable $x3, 16, $x1
    renamable $x3 = LI8 0
    STD killed renamable $x3, 8, $x1
    BLR8 implicit $lr8, implicit $rm

...
---
name: t3
alignment: 4
tracksRegLiveness: true
machineFunctionInfo: {}
body: |
  bb.0.entry:
    liveins: $x1

    ; CHECK-LABEL: name: t3
    ; CHECK: liveins: $x1
    ; CHECK: renamable $x3 = LI8 0
    ; CHECK: STD renamable $x3, 32, $x1
    ; CHECK: STD renamable $x3, 24, $x1
    ; CHECK: BLR8 implicit $lr8, implicit $rm
    renamable $x3 = LI8 0
    STD killed renamable $x3, 32, $x1
    renamable $x3 = LI8 0
    STD renamable $x3, 24, $x1
    BLR8 implicit $lr8, implicit $rm

...
---
name: t4
alignment: 4
tracksRegLiveness: true
machineFunctionInfo: {}
body: |
  bb.0.entry:
    liveins: $x1

    ; CHECK-LABEL: name: t4
    ; CHECK: liveins: $x1
    ; CHECK: renamable $x3 = LI8 0
    ; CHECK: STD renamable $x3, 16, $x1
    ; CHECK: renamable $x4 = ADDI8 renamable $x3, 8
    ; CHECK: STD killed renamable $x3, 8, $x1
    ; CHECK: BLR8 implicit $lr8, implicit $rm
    renamable $x3 = LI8 0
    STD killed renamable $x3, 16, $x1
    renamable $x3 = LI8 0
    renamable $x4 = ADDI8 killed renamable $x3, 8
    renamable $x3 = LI8 0
    STD killed renamable $x3, 8, $x1
    BLR8 implicit $lr8, implicit $rm

...
---
name: t5
alignment: 4
tracksRegLiveness: true
machineFunctionInfo: {}
body: |
  bb.0.entry:
    liveins: $x1

    ; CHECK-LABEL: name: t5
    ; CHECK: liveins: $x1
    ; CHECK: renamable $r3 = LI 0
    ; CHECK: STW renamable $r3, 16, $x1
    ; CHECK: STW killed renamable $r3, 12, $x1
    ; CHECK: renamable $r3 = LI 1
    ; CHECK: BLR8 implicit $lr8, implicit $rm
    renamable $r3 = LI 0
    STW killed renamable $r3, 16, $x1
    renamable $r3 = LI 0
    STW killed renamable $r3, 12, $x1
    renamable $r3 = LI 1
    BLR8 implicit $lr8, implicit $rm

...
---
name: t6
alignment: 4
tracksRegLiveness: true
machineFunctionInfo: {}
body: |
  bb.0.entry:
    liveins: $x1

    ; CHECK-LABEL: name: t6
    ; CHECK: liveins: $x1
    ; CHECK: renamable $x3 = LI8 0
    ; CHECK: renamable $x4 = LI8 1
    ; CHECK: STD renamable $x3, 32, $x1
    ; CHECK: STD renamable $x4, 24, $x1
    ; CHECK: STD killed renamable $x3, 16, $x1
    ; CHECK: STD killed renamable $x4, 8, $x1
    ; CHECK: BLR8 implicit $lr8, implicit $rm
    renamable $x3 = LI8 0
    renamable $x4 = LI8 1
    STD killed renamable $x3, 32, $x1
    STD killed renamable $x4, 24, $x1
    renamable $x3 = LI8 0
    renamable $x4 = LI8 1
    STD killed renamable $x3, 16, $x1
    STD killed renamable $x4, 8, $x1
    BLR8 implicit $lr8, implicit $rm

...
---
name: t7
alignment: 4
tracksRegLiveness: true
machineFunctionInfo: {}
body: |
  bb.0.entry:
    liveins: $x1, $x4

    ; CHECK-LABEL: name: t7
    ; CHECK: liveins: $x1, $x4
    ; CHECK: renamable $x3 = LI8 0
    ; CHECK: STD killed renamable $x3, 32, $x1
    ; CHECK: renamable $x3 = ADDI8 $x4, 6
    ; CHECK: BLR8 implicit $lr8, implicit $rm
    renamable $x3 = LI8 0
    STD killed renamable $x3, 32, $x1
    renamable $x3 = ADDI8 $x4, 6
    BLR8 implicit $lr8, implicit $rm

...
---
name: t8
alignment: 4
tracksRegLiveness: true
machineFunctionInfo: {}
body: |
  bb.0.entry:
    liveins: $x1

    ; CHECK-LABEL: name: t8
    ; CHECK: liveins: $x1
    ; CHECK: renamable $x3 = LI8 0
    ; CHECK: STD renamable $x3, 32, $x1
    ; CHECK: BLR8 implicit $lr8, implicit $rm
    renamable $x3 = LI8 0
    STD killed renamable $x3, 32, $x1
    renamable $x3 = LI8 0
    BLR8 implicit $lr8, implicit $rm

...
---
name: t9
alignment: 4
tracksRegLiveness: true
machineFunctionInfo: {}
body: |
  ; CHECK-LABEL: name: t9
  ; CHECK: bb.0.entry:
  ; CHECK: successors: %bb.2(0x40000000), %bb.1(0x40000000)
  ; CHECK: liveins: $x3
  ; CHECK: renamable $r4 = LI 0, implicit-def $x4
  ; CHECK: renamable $x24 = RLDICL renamable $x4, 0, 32
  ; CHECK: renamable $cr0 = CMPLDI renamable $x3, 0
  ; CHECK: BCC 68, killed renamable $cr0, %bb.1
  ; CHECK: B %bb.2
  ; CHECK: bb.1:
  ; CHECK: liveins: $r4, $x1
  ; CHECK: STW killed renamable $r4, 16, $x1
  ; CHECK: BLR8 implicit $lr8, implicit $rm
  ; CHECK: bb.2:
  ; CHECK: liveins: $r4, $x1
  ; CHECK: STW killed renamable $r4, 32, $x1
  ; CHECK: BLR8 implicit $lr8, implicit $rm
  bb.0.entry:
    liveins: $x3
    successors: %bb.8, %bb.7

    renamable $r4 = LI 0, implicit-def $x4
    renamable $x24 = RLDICL killed renamable $x4, 0 , 32
    renamable $cr0 = CMPLDI renamable $x3, 0
    renamable $r4 = LI 0
    BCC 68, killed renamable $cr0, %bb.7
    B %bb.8

  bb.7:
    liveins: $r4, $x1
    STW killed renamable $r4, 16, $x1
    BLR8 implicit $lr8, implicit $rm

  bb.8:
    liveins: $r4, $x1
    STW killed renamable $r4, 32, $x1
    BLR8 implicit $lr8, implicit $rm

...
---
name: t10
alignment: 4
tracksRegLiveness: true
machineFunctionInfo: {}
body: |
  bb.0.entry:
    liveins: $x1

    ; CHECK-LABEL: name: t10
    ; CHECK: liveins: $x1
    ; CHECK: renamable $x3 = LI8 24
    ; CHECK: STD killed renamable $x3, 16, $x1
    ; CHECK: renamable $r3 = LI 0
    ; CHECK: STW killed renamable $r3, 26, $x1
    ; CHECK: BLR8 implicit $lr8, implicit $rm
    renamable $x3 = LI8 24
    STD killed renamable $x3, 16, $x1
    renamable $r3 = LI 0
    STW killed renamable $r3, 26, $x1
    BLR8 implicit $lr8, implicit $rm

...
---
name: LIS8
alignment: 4
tracksRegLiveness: true
machineFunctionInfo: {}
body: |
  bb.0.entry:
    liveins: $x1

    ; CHECK-LABEL: name: LIS8
    ; CHECK: liveins: $x1
    ; CHECK: renamable $x3 = LIS8 0
    ; CHECK: STD renamable $x3, 16, $x1
    ; CHECK: STD killed renamable $x3, 8, $x1
    ; CHECK: BLR8 implicit $lr8, implicit $rm
    renamable $x3 = LIS8 0
    STD killed renamable $x3, 16, $x1
    renamable $x3 = LIS8 0
    STD killed renamable $x3, 8, $x1
    BLR8 implicit $lr8, implicit $rm

...
---
name: LIS
alignment: 4
tracksRegLiveness: true
machineFunctionInfo: {}
body: |
  bb.0.entry:
    liveins: $x1

    ; CHECK-LABEL: name: LIS
    ; CHECK: liveins: $x1
    ; CHECK: renamable $r3 = LIS 0
    ; CHECK: STW renamable $r3, 16, $x1
    ; CHECK: STW killed renamable $r3, 12, $x1
    ; CHECK: BLR8 implicit $lr8, implicit $rm
    renamable $r3 = LIS 0
    STW killed renamable $r3, 16, $x1
    renamable $r3 = LIS 0
    STW killed renamable $r3, 12, $x1
    BLR8 implicit $lr8, implicit $rm

...
---
name: modify_and_kill_the_reg_in_the_same_inst
alignment: 4
tracksRegLiveness: true
machineFunctionInfo: {}
body: |
  bb.0.entry:

    ; CHECK-LABEL: name: modify_and_kill_the_reg_in_the_same_inst
    ; CHECK: renamable $x6 = LI8 1
    ; CHECK: renamable $x6 = RLDICR killed renamable $x6, 44, 19
    ; CHECK: BLR8 implicit $lr8, implicit $rm
    renamable $x6 = LI8 1
    renamable $x6 = RLDICR killed renamable $x6, 44, 19
    BLR8 implicit $lr8, implicit $rm

...
---
name: dead_load_immediate_followed_by_a_redundancy
alignment: 4
tracksRegLiveness: true
machineFunctionInfo: {}
body: |
  bb.0.entry:
    liveins: $x1

    ; CHECK-LABEL: name: dead_load_immediate_followed_by_a_redundancy
    ; CHECK: liveins: $x1
    ; CHECK: renamable $r3 = LI 128
    ; CHECK: renamable $x4 = ADDI8 $x1, -128
    ; CHECK: renamable $x5 = ADDI8 $x1, -128
    ; CHECK: STW killed renamable $r3, 16, $x4
    ; CHECK: BLR8 implicit $lr8, implicit $rm
    dead renamable $r3 = LI 128
    renamable $x4 = ADDI8 $x1, -128
    dead renamable $r3 = LI 128
    renamable $x5 = ADDI8 $x1, -128
    renamable $r3 = LI 128
    STW killed renamable $r3, 16, $x4
    BLR8 implicit $lr8, implicit $rm

...
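Tests such as `t10` depend on the scan treating a write to a sub-register as a write to `Reg`: LLVM models `$r3` as the 32-bit sub-register of `$x3`, so `modifiesRegister(Reg, TRI)` stops the scan at `renamable $r3 = LI 0`. The hypothetical sketch below shows why that matters. It assumes register names whose numeric suffix decides overlap (a stand-in for `TargetRegisterInfo`), does not model kill flags, and uses an `AliasAware` switch that exists only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical toy instruction: opcode, defined register name ("" if none)
// and an optional immediate.
struct Instr {
  std::string Opc;
  std::string Def;
  bool HasImm;
  int64_t Imm;
};

// Assumption for illustration: "rN" and "xN" overlap exactly when N matches.
// Both names must be non-empty.
static bool regsOverlap(const std::string &A, const std::string &B) {
  return A.substr(1) == B.substr(1);
}

static bool isLoadImm(const Instr &I) {
  return I.HasImm && (I.Opc == "LI" || I.Opc == "LI8");
}

// Returns the indices of load immediates the scan would erase. With
// AliasAware == false, only writes to the identically named register stop it.
std::vector<std::size_t> redundantLIs(const std::vector<Instr> &Block,
                                      bool AliasAware) {
  std::vector<std::size_t> Out;
  std::vector<bool> Marked(Block.size(), false);
  for (std::size_t I = 0; I < Block.size(); ++I) {
    if (Marked[I] || !isLoadImm(Block[I]) || Block[I].Def.empty())
      continue;
    const std::string &Reg = Block[I].Def;
    for (std::size_t J = I + 1; J < Block.size(); ++J) {
      const Instr &After = Block[J];
      const bool Clobbers =
          !After.Def.empty() &&
          (AliasAware ? regsOverlap(After.Def, Reg) : After.Def == Reg);
      if (!Clobbers)
        continue;
      if (After.Opc != Block[I].Opc || After.Def != Reg || !After.HasImm ||
          After.Imm != Block[I].Imm)
        break; // Reg (or part of it) now holds something else.
      Marked[J] = true;
      Out.push_back(J);
    }
  }
  return Out;
}
```

For the sequence `$x3 = LI8 0`, store, `$r3 = LI 5`, store, `$x3 = LI8 0`, the alias-aware scan removes nothing, while the alias-unaware one reports index 4 (the final `LI8 0`) as redundant even though `$x3` no longer holds zero.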
