This is an archive of the discontinued LLVM Phabricator instance.

Differential D91513

[DeadMachineInstrctionElim] Post order visit all blocks and Iteratively run DeadMachineInstructionElim pass until nothing dead
ClosedPublic

Authored by guopeilin on Nov 15 2020, 7:22 PM.

Details

Reviewers

sunfish
hliao
rampitec
wwei

Commits

rG1cd19fc5681b: [DeadMachineInstrctionElim] Post order visit all blocks and Iteratively run…

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

guopeilin created this revision. Nov 15 2020, 7:22 PM

Herald added a project: Restricted Project. Nov 15 2020, 7:22 PM

Herald added subscribers: llvm-commits, hiraditya.

guopeilin requested review of this revision. Nov 15 2020, 7:22 PM

Harbormaster completed remote builds in B78911: Diff 305404. Nov 15 2020, 7:55 PM

guopeilin added reviewers: sunfish, hliao, rampitec, wwei. Nov 15 2020, 8:24 PM

Could you elaborate on why we need to run this iteratively? Since the original pass runs bottom-up, it should supposedly find all dead instructions.

In D91513#2397313, @hliao wrote:

Could you elaborate on why we need to run this iteratively? Since the original pass runs bottom-up, it should supposedly find all dead instructions.

From iteratively-run-dead-mi-elim.mir we can see that bb.5 defines %6, and %6 is used in bb.2. When we traverse all basic blocks bottom-up, we meet bb.5 first. At that point %6 is not dead, because %3 in bb.2 uses it, so %6 survives. We then continue through the other blocks. When we reach bb.2, nothing uses %5, so we delete it, and likewise %4 and then %3. At this point %6 has actually become dead, because deleting %3 removed its only use.
However, because we traverse the blocks only once, %6 is never erased. If we instead visit all blocks repeatedly until nothing changes, we can ensure that every dead MI is erased.
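
The scenario described above can be reproduced with a small toy model. The sketch below is not the LLVM pass: `Instr`, `Block`, `sweep`, and `makeExample` are hypothetical names, only the four virtual registers mentioned in the comment are modeled, and %2 is treated as an always-available input. It assumes that "dead" simply means "no surviving instruction reads the def".

```cpp
#include <cassert>
#include <map>
#include <vector>

// Toy stand-in for a MachineInstr: one virtual-register def plus its operands.
struct Instr {
  int def;               // virtual register defined, e.g. 6 for %6
  std::vector<int> uses; // virtual registers read
  bool erased;           // set once the instruction has been deleted
};

using Block = std::vector<Instr>;

// One sweep: visit blocks in `order`, instructions bottom-up, and delete any
// instruction whose def has no remaining reader. Returns the number deleted.
int sweep(std::vector<Block> &blocks, const std::vector<int> &order) {
  std::map<int, int> useCount;
  for (const Block &b : blocks)
    for (const Instr &i : b)
      if (!i.erased)
        for (int r : i.uses)
          ++useCount[r];

  int deleted = 0;
  for (int bi : order) {
    assert(bi >= 0 && bi < static_cast<int>(blocks.size()));
    Block &b = blocks[bi];
    for (auto it = b.rbegin(); it != b.rend(); ++it) {
      if (it->erased || useCount[it->def] != 0)
        continue;
      it->erased = true;
      for (int r : it->uses)
        --useCount[r]; // operands lose one reader
      ++deleted;
    }
  }
  return deleted;
}

// Only the registers discussed in the thread: %6 in bb.5; %3, %4, %5 in bb.2.
std::vector<Block> makeExample() {
  std::vector<Block> blocks(6);
  blocks[2] = {Instr{3, {6, 2}, false}, Instr{4, {3}, false},
               Instr{5, {4}, false}};
  blocks[5] = {Instr{6, {}, false}};
  return blocks;
}
```

Calling `sweep` with the reverse layout order {5, 4, 3, 2, 1, 0} deletes %5, %4 and %3 on the first call but leaves %6, because %6 was inspected while %3 still read it. A second call with the same order removes %6.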

guopeilin retitled this revision from [DeadMachineInstrctionElim] Iteratively run DeadMachineInstrcutionElim pass untill nothing dead to [DeadMachineInstrctionElim] Iteratively run DeadMachineInstrcutionElim pass until nothing dead. Nov 16 2020, 5:26 PM

guopeilin retitled this revision from [DeadMachineInstrctionElim] Iteratively run DeadMachineInstrcutionElim pass until nothing dead to [DeadMachineInstrctionElim] Iteratively run DeadMachineInstructionElim pass until nothing dead.

In D91513#2398528, @guopeilin wrote:

From iteratively-run-dead-mi-elim.mir we can see that bb.5 defines %6, and %6 is used in bb.2. When we traverse all basic blocks bottom-up, we meet bb.5 first. At that point %6 is not dead, because %3 in bb.2 uses it, so %6 survives. We then continue through the other blocks. When we reach bb.2, nothing uses %5, so we delete it, and likewise %4 and then %3. At this point %6 has actually become dead, because deleting %3 removed its only use.
However, because we traverse the blocks only once, %6 is never erased. If we instead visit all blocks repeatedly until nothing changes, we can ensure that every dead MI is erased.

Ah, I see. Even though we try to traverse basic blocks bottom-up, that is only the block placement order, not block reachability. Could we replace that order with post-order, so that a use is always traversed before its definition?
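
To see what this suggestion changes for the test case, here is a minimal sketch of a depth-first post-order over the test file's CFG, transcribed from its `successors:` lines. `CFG`, `visit`, `postOrder`, and `testCFG` are hypothetical helper names, not LLVM APIs, and the exact sequence depends on the order in which successors are visited.

```cpp
#include <cassert>
#include <vector>

// Successor lists indexed by block number; block 0 is the entry block.
using CFG = std::vector<std::vector<int>>;

static void visit(const CFG &cfg, int bb, std::vector<bool> &seen,
                  std::vector<int> &out) {
  seen[bb] = true;
  for (int succ : cfg[bb])
    if (!seen[succ])
      visit(cfg, succ, seen, out);
  out.push_back(bb); // emitted only after all unvisited successors finished
}

// Depth-first post-order starting from the entry block.
std::vector<int> postOrder(const CFG &cfg) {
  assert(!cfg.empty());
  std::vector<bool> seen(cfg.size(), false);
  std::vector<int> out;
  visit(cfg, 0, seen, out);
  return out;
}

// CFG of the test case: bb.0 -> {bb.4, bb.5}, bb.1 -> {bb.3, bb.2}, and so on.
CFG testCFG() {
  return {
      {4, 5}, // bb.0
      {3, 2}, // bb.1
      {1},    // bb.2 (loops back to bb.1)
      {},     // bb.3 (returns)
      {5},    // bb.4
      {1},    // bb.5
  };
}
```

With this successor order, `postOrder(testCFG())` yields bb.3, bb.2, bb.1, bb.5, bb.4, bb.0, so bb.2 (which uses %6) is visited before bb.5 (which defines it), unlike the reverse layout order.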

In D91513#2398697, @hliao wrote:

Ah, I see. Even though we try to traverse basic blocks bottom-up, that is only the block placement order, not block reachability. Could we replace that order with post-order, so that a use is always traversed before its definition?

That probably will not help if we have a loop?

In D91513#2398717, @rampitec wrote:

That probably will not help if we have a loop?

It still works unless that value has a cyclic dependency through a phi node.

In D91513#2398726, @hliao wrote:

It still works unless that value has a cyclic dependency through a phi node.

That's exactly what I had in mind, a phi node as the only way to get a cyclic dependency in SSA.

I tend to say this LGTM, although I would like to see a test with a cyclic dependency.

In D91513#2398733, @rampitec wrote:

That's exactly what I had in mind, a phi node as the only way to get a cyclic dependency in SSA.

I tend to say this LGTM, although I would like to see a test with a cyclic dependency.

A value with a cyclic dependence cannot be removed unless the loop itself is deemed dead, which is beyond the scope of this DCE. Using post-order removes the need for iterative runs.
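
A toy model can illustrate why iteration alone does not help with a cyclic dependency. In this sketch, which assumes "dead" means "no surviving instruction reads the def" and uses hypothetical names (`Instr`, `eliminateToFixpoint`, `phiCycle`, `deadChain`), two values that are read only by each other are never deleted, however many sweeps run, while an ordinary dead chain is removed completely.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Toy stand-in for a MachineInstr: one virtual-register def plus its operands.
struct Instr {
  int def;
  std::vector<int> uses;
  bool erased;
};

// Repeats bottom-up sweeps until a sweep deletes nothing. An instruction is
// deleted when no surviving instruction reads its def. Returns total deleted.
int eliminateToFixpoint(std::vector<Instr> &code) {
  int total = 0;
  bool changed = true;
  while (changed) {
    changed = false;
    std::map<int, int> useCount;
    for (const Instr &i : code)
      if (!i.erased)
        for (int r : i.uses)
          ++useCount[r];
    for (auto it = code.rbegin(); it != code.rend(); ++it) {
      if (it->erased || useCount[it->def] != 0)
        continue;
      it->erased = true;
      for (int r : it->uses) {
        assert(useCount[r] > 0);
        --useCount[r];
      }
      changed = true;
      ++total;
    }
  }
  return total;
}

// %1 = PHI %0, %2 ; %2 = ADD %1. Each value is read only by the other.
std::vector<Instr> phiCycle() {
  return {Instr{1, {0, 2}, false}, Instr{2, {1}, false}};
}

// %1 = MOV %0 ; %2 = ADD %1. A plain chain in which nothing reads %2.
std::vector<Instr> deadChain() {
  return {Instr{1, {0}, false}, Instr{2, {1}, false}};
}
```

`eliminateToFixpoint` deletes both instructions of `deadChain()` but none of `phiCycle()`: removing the cycle would require recognizing that the whole loop-carried value is unused, which a use-count check cannot see.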

Using post-order is quite straightforward and involves only a few lines of change. Please see the attachment.

0001-DeadMachineInstructionElim-Use-post-order-for-basic-.patch (4 KB)

The test passes with this traversal order change.

In D91513#2400468, @hliao wrote:

Using post-order is quite straightforward and involves only a few lines of change. Please see the attachment.
0001-DeadMachineInstructionElim-Use-post-order-for-basic-.patch (4 KB)
The test passes with this traversal order change.

That's a great help. All my related cases pass with this patch. Thanks a lot.

In D91513#2401470, @guopeilin wrote:

That's a great help. All my related cases pass with this patch. Thanks a lot.

Now that we have decided to use post-order to visit all blocks of a function, I think we need to consider what happens if the CFG contains cycles.

From this picture, we can see that post-order is not clearly defined because cycles exist; one possible order is [m, g, d, e, c, b, t, x].
So m comes before g. If we define something in m and use it in g, then even though both the def and the use are useless, we visit m first and are still left with a dead definition after the post-order visit of all blocks.
So is it theoretically possible that some cases still cannot be fixed by a post-order visit, so that we may still need to run iteratively?

In D91513#2401473, @guopeilin wrote:

Now that we have decided to use post-order to visit all blocks of a function, I think we need to consider what happens if the CFG contains cycles.

From this picture, we can see that post-order is not clearly defined because cycles exist; one possible order is [m, g, d, e, c, b, t, x].
So m comes before g. If we define something in m and use it in g, then even though both the def and the use are useless, we visit m first and are still left with a dead definition after the post-order visit of all blocks.
So is it theoretically possible that some cases still cannot be fixed by a post-order visit, so that we may still need to run iteratively?

You are right, that's possible. That case should be rare, as it is a def on the back-edge with an acyclic dependency. Could you merge the post-order change with the iterative runs, so that in the regular case we run at most twice? Please keep an eye on compile time.
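
One reading of "a def in the back-edge with acyclic dep" is a PHI in a loop header whose back-edge operand is defined in the latch and does not itself depend on the PHI. The sketch below is a toy model of that situation, not the LLVM pass: `Instr`, `Block`, `sweep`, `runToFixpoint`, and `loopExample` are hypothetical names, and the post-order {2, 3, 1, 0} is written out by hand for an assumed four-block loop (bb.0 entry, bb.1 header, bb.2 latch, bb.3 exit).

```cpp
#include <cassert>
#include <map>
#include <vector>

struct Instr {
  int def;
  std::vector<int> uses;
  bool erased;
};
using Block = std::vector<Instr>;

// One sweep over the blocks in `order`, bottom-up within each block.
// Returns how many instructions were deleted.
int sweep(std::vector<Block> &blocks, const std::vector<int> &order) {
  std::map<int, int> useCount;
  for (const Block &b : blocks)
    for (const Instr &i : b)
      if (!i.erased)
        for (int r : i.uses)
          ++useCount[r];
  int deleted = 0;
  for (int bi : order) {
    assert(bi >= 0 && bi < static_cast<int>(blocks.size()));
    for (auto it = blocks[bi].rbegin(); it != blocks[bi].rend(); ++it) {
      if (it->erased || useCount[it->def] != 0)
        continue;
      it->erased = true;
      for (int r : it->uses)
        --useCount[r];
      ++deleted;
    }
  }
  return deleted;
}

// Sweeps until a sweep changes nothing, like the driver loop in the patch.
// Returns the number of sweeps, including the last one that found no work.
int runToFixpoint(std::vector<Block> &blocks, const std::vector<int> &order) {
  int sweeps = 0;
  int deleted = 0;
  do {
    deleted = sweep(blocks, order);
    ++sweeps;
  } while (deleted > 0);
  return sweeps;
}

// bb.0: %0 = MOV ; bb.1: %1 = PHI %0, %2 (unused) ; bb.2: %2 = MOV.
std::vector<Block> loopExample() {
  std::vector<Block> blocks(4);
  blocks[0] = {Instr{0, {}, false}};
  blocks[1] = {Instr{1, {0, 2}, false}};
  blocks[2] = {Instr{2, {}, false}};
  return blocks;
}
```

In this model the first sweep visits the latch before the header, keeps %2 because the PHI still reads it, and then deletes the PHI and %0. The second sweep deletes %2 and a third finds nothing, so `runToFixpoint` returns 3 for this input. When a single sweep already removes everything, it returns 2, which matches the "at most run twice" expectation for the regular case.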

This revision is now accepted and ready to land. Nov 17 2020, 9:57 PM

BTW, please add a test case with that def on the back-edge with an acyclic dependency.

In D91513#2401715, @hliao wrote:

You are right, that's possible. That case should be rare, as it is a def on the back-edge with an acyclic dependency. Could you merge the post-order change with the iterative runs, so that in the regular case we run at most twice? Please keep an eye on compile time.

The post-order visit and the iterative run are now merged.

In D91513#2402005, @guopeilin wrote:

The post-order visit and the iterative run are now merged.

Did you add a test case similar to the CFG you showed?

Do you have permission to commit?

In D91513#2407245, @hliao wrote:

Do you have permission to commit?

No, but I will apply for permission later. I am still trying to write, by hand, a test case that demonstrates a cycle in the CFG, although I am finding it difficult and time-consuming.

Closed by commit rG1cd19fc5681b: [DeadMachineInstrctionElim] Post order visit all blocks and Iteratively run… (authored by wwei). Nov 20 2020, 9:03 AM

This revision was automatically updated to reflect the committed changes.

wwei added a commit: rG1cd19fc5681b: [DeadMachineInstrctionElim] Post order visit all blocks and Iteratively run….

loladiro mentioned this in D97435: [Aarch64] Correct register class for pseudo instructions. Feb 24 2021, 8:06 PM

vtjnash mentioned this in rGe20f69f612dd: [Aarch64] Correct register class for pseudo instructions. Sep 9 2021, 11:33 AM

Revision Contents

Path                                               Size
llvm/lib/CodeGen/DeadMachineInstructionElim.cpp    21 lines
llvm/test/CodeGen/AArch64/elim-dead-mi.mir         61 lines

Diff 306709

llvm/lib/CodeGen/DeadMachineInstructionElim.cpp

 //===- DeadMachineInstructionElim.cpp - Remove dead machine instructions --===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //
 //===----------------------------------------------------------------------===//
 //
 // This is an extremely simple MachineInstr-level dead-code-elimination pass.
 //
 //===----------------------------------------------------------------------===//

+#include "llvm/ADT/PostOrderIterator.h"
 #include "llvm/ADT/Statistic.h"
 #include "llvm/CodeGen/MachineFunctionPass.h"
 #include "llvm/CodeGen/MachineRegisterInfo.h"
 #include "llvm/CodeGen/Passes.h"
 #include "llvm/CodeGen/TargetSubtargetInfo.h"
 #include "llvm/InitializePasses.h"
 #include "llvm/Pass.h"
 #include "llvm/Support/Debug.h"
 [22 unchanged lines not shown]
   public:

     void getAnalysisUsage(AnalysisUsage &AU) const override {
       AU.setPreservesCFG();
       MachineFunctionPass::getAnalysisUsage(AU);
     }

   private:
     bool isDead(const MachineInstr *MI) const;
+
+    bool eliminateDeadMI(MachineFunction &MF);
   };
 }
 char DeadMachineInstructionElim::ID = 0;
 char &llvm::DeadMachineInstructionElimID = DeadMachineInstructionElim::ID;

 INITIALIZE_PASS(DeadMachineInstructionElim, DEBUG_TYPE,
                 "Remove dead machine instructions", false, false)

 [43 unchanged lines not shown]
 #endif

   // If there are no defs with uses, the instruction is dead.
   return true;
 }

 bool DeadMachineInstructionElim::runOnMachineFunction(MachineFunction &MF) {
   if (skipFunction(MF.getFunction()))
     return false;
+  bool AnyChanges = eliminateDeadMI(MF);
+  while (AnyChanges && eliminateDeadMI(MF))
+    ;
+  return AnyChanges;
+}

+bool DeadMachineInstructionElim::eliminateDeadMI(MachineFunction &MF) {
   bool AnyChanges = false;
   MRI = &MF.getRegInfo();
   TRI = MF.getSubtarget().getRegisterInfo();
   TII = MF.getSubtarget().getInstrInfo();

   // Loop over all instructions in all blocks, from bottom to top, so that it's
   // more likely that chains of dependent but ultimately dead instructions will
   // be cleaned up.
-  for (MachineBasicBlock &MBB : make_range(MF.rbegin(), MF.rend())) {
+  for (MachineBasicBlock *MBB : post_order(&MF)) {
     // Start out assuming that reserved registers are live out of this block.
     LivePhysRegs = MRI->getReservedRegs();

     // Add live-ins from successors to LivePhysRegs. Normally, physregs are not
     // live across blocks, but some targets (x86) can have flags live out of a
     // block.
-    for (MachineBasicBlock::succ_iterator S = MBB.succ_begin(),
-           E = MBB.succ_end(); S != E; S++)
+    for (MachineBasicBlock::succ_iterator S = MBB->succ_begin(),
+                                          E = MBB->succ_end();
+         S != E; S++)
       for (const auto &LI : (*S)->liveins())
         LivePhysRegs.set(LI.PhysReg);

     // Now scan the instructions and delete dead ones, tracking physreg
     // liveness as we go.
-    for (MachineBasicBlock::reverse_iterator MII = MBB.rbegin(),
-         MIE = MBB.rend(); MII != MIE; ) {
+    for (MachineBasicBlock::reverse_iterator MII = MBB->rbegin(),
+                                             MIE = MBB->rend();
+         MII != MIE;) {
       MachineInstr *MI = &*MII++;

       // If the instruction is dead, delete it!
       if (isDead(MI)) {
         LLVM_DEBUG(dbgs() << "DeadMachineInstructionElim: DELETING: " << *MI);
         // It is possible that some DBG_VALUE instructions refer to this
         // instruction. They get marked as undef and will be deleted
         // in the live debug variable analysis.
 [42 unchanged lines not shown]

llvm/test/CodeGen/AArch64/elim-dead-mi.mir

This file was added.

# RUN: llc -mtriple=aarch64-arm-none-eabi -o - %s \
# RUN: -run-pass dead-mi-elimination | FileCheck %s
--- |
  @c = internal unnamed_addr global [3 x i8] zeroinitializer, align 4
  @d = common dso_local local_unnamed_addr global i32 0, align 4

  define dso_local i32 @main() local_unnamed_addr {
    %scevgep = getelementptr i8, i8* getelementptr inbounds ([3 x i8], [3 x i8]* @c, i64 0, i64 1), i64 0
    ret i32 0
  }
...
---
name: main
tracksRegLiveness: true
registers:
  - { id: 0, class: gpr64, preferred-register: '' }
  - { id: 1, class: gpr64common, preferred-register: '' }
  - { id: 2, class: gpr64, preferred-register: '' }
  - { id: 3, class: gpr64common, preferred-register: '' }
  - { id: 4, class: gpr32, preferred-register: '' }
  - { id: 5, class: gpr32all, preferred-register: '' }
  - { id: 6, class: gpr64, preferred-register: '' }
body: |
  bb.0:
    successors: %bb.4(0x30000000), %bb.5(0x50000000)

    %0:gpr64 = MOVaddr target-flags(aarch64-page) @c, target-flags(aarch64-pageoff, aarch64-nc) @c
    CBZX killed %0, %bb.4
    B %bb.5

  bb.1:
    successors: %bb.3(0x04000000), %bb.2(0x7c000000)

    %1:gpr64common = MOVaddr target-flags(aarch64-page) @c, target-flags(aarch64-pageoff, aarch64-nc) @c
    %2:gpr64 = SUBSXri %1, 2, 0, implicit-def $nzcv
    Bcc 0, %bb.3, implicit $nzcv
    B %bb.2

  bb.2:
    successors: %bb.1(0x80000000)
    %3:gpr64common = ADDXrr %6, %2
    %4:gpr32 = LDRBBui killed %3, 1 :: (load 1 from %ir.scevgep)
    %5:gpr32all = COPY %4
    B %bb.1

  bb.3:
    ADJCALLSTACKDOWN 0, 0, implicit-def dead $sp, implicit $sp
    $x0 = MOVaddr target-flags(aarch64-page) @c, target-flags(aarch64-pageoff, aarch64-nc) @c
    ADJCALLSTACKUP 0, 0, implicit-def dead $sp, implicit $sp
    RET_ReallyLR implicit $w0

  bb.4:
    successors: %bb.5(0x80000000)

  bb.5:
    successors: %bb.1(0x80000000)
    ; CHECK: bb.5
    ; CHECK-NOT: %6:gpr64 = MOVaddr target-flags(aarch64-page) @c, target-flags(aarch64-pageoff, aarch64-nc) @c
    %6:gpr64 = MOVaddr target-flags(aarch64-page) @c, target-flags(aarch64-pageoff, aarch64-nc) @c
    B %bb.1
...