Details

Reviewers

lkail
anton-afanasyev
dmgreen
lebedev.ri

Summary

For some scenarios like optimizing for size, we may want to do
aggressive MachineCSE on the whole function, so we add an option
--enable-aggressive-machine-cse and a target hook to enable this.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	80 ms	x64 debian > LLVM.Bindings/Go::go.test

Event Timeline

• pcwang-thead created this revision. Nov 22 2021, 4:15 AM

Herald added subscribers: luke957, frasercrmck, luismarques and 20 others. Nov 22 2021, 4:15 AM

• pcwang-thead requested review of this revision. Nov 22 2021, 4:15 AM

Herald added a project: Restricted Project. Nov 22 2021, 4:15 AM

Herald added subscribers: llvm-commits, MaskRay.

lebedev.ri added a subscriber: lebedev.ri. Nov 22 2021, 4:17 AM

lebedev.ri added inline comments.

llvm/lib/CodeGen/MachineCSE.cpp
914–916	You cannot modify global variables like that.

Amend commit message.

• pcwang-thead edited the summary of this revision. Nov 22 2021, 4:21 AM

Do not modify global variable EnableGlobalCSE.

• pcwang-thead added reviewers: lkail, anton-afanasyev. Nov 22 2021, 4:36 AM

• pcwang-thead added a reviewer: lebedev.ri.

Global is not a good name here. MachineCSE is working on the whole function, which is global in compiler's terminology, by walking through the DominatorTree.

llvm/lib/CodeGen/MachineCSE.cpp
463	Why apply this to only this heuristic? Since your intention is to reduce size, why not always consider CSE profitable if `hasOptSize`?

jrtc27 added inline comments. Nov 22 2021, 5:08 AM

llvm/lib/CodeGen/MachineCSE.cpp
55	Putting this behaviour behind a cl::opt is a great way to ensure it's never used...
llvm/test/CodeGen/RISCV/enable-global-cse.ll
1 ↗	(On Diff #388880)	This should be a MIR test that runs just MachineCSE

In D114361#3146082, @lkail wrote:

Global is not a good name here. MachineCSE is working on the whole function, which is global in compiler's terminology, by walking through the DominatorTree.

Can we name it greedy?

llvm/lib/CodeGen/MachineCSE.cpp
55	LOL. So is there a better way? Maybe we can add a target hook to enable this?
463	You are right, my thinking was too limited.
llvm/test/CodeGen/RISCV/enable-global-cse.ll
1 ↗	(On Diff #388880)	Thanks, I will update it later.

Harbormaster completed remote builds in B135400: Diff 388880. Nov 22 2021, 7:21 AM

Change global to aggressive.
Return true immediately if aggressive MachineCSE is enabled.
Change test case to MIR test.

• pcwang-thead edited the summary of this revision. Nov 23 2021, 12:16 AM

Added dmgreen, since embedded Arm should be sensitive to size optimizations.

Thanks. What targets have you tested with this? And what kind of codesize differences have you observed?

Harbormaster completed remote builds in B135559: Diff 389104. Nov 23 2021, 2:06 AM

In D114361#3148156, @dmgreen wrote:

Thanks. What targets have you tested with this? And what kind of codesize differences have you observed?

RISCV.
(And it seems that I need to modify other targets' tests)

The differences (within the scope of a function):

Some loads of immediates are redundant.
Some loads of global symbols are redundant.
etc.

These redundancies are in nonadjacent (non-local?) blocks, so they can't be eliminated under Heuristics #1 in MachineCSE::isProfitableToCSE.
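
To make the restriction concrete, here is a minimal, self-contained sketch of the check that Heuristics #1 performs. The `Block` type and the `passesHeuristic1` function are hypothetical stand-ins invented for illustration; they are not the LLVM classes, and the real function also applies further heuristics afterwards.

```cpp
#include <algorithm>
#include <vector>

// Toy stand-in for a machine basic block: only the successor list matters here.
struct Block {
  std::vector<const Block *> Succs;
  bool isSuccessor(const Block *B) const {
    return std::find(Succs.begin(), Succs.end(), B) != Succs.end();
  }
};

// Simplified model of Heuristics #1: a cheap instruction is only CSE'd when
// the existing definition lives in the same block or in an immediate
// predecessor. `Aggressive` models the opt-out proposed in this revision.
// Returning true means "Heuristics #1 does not object", not that CSE is
// profitable overall.
bool passesHeuristic1(bool IsCheap, const Block *CSBB, const Block *BB,
                      bool Aggressive) {
  if (IsCheap && !Aggressive) {
    if (CSBB != BB && !CSBB->isSuccessor(BB))
      return false;
  }
  return true;
}
```

With blocks A -> B -> C, a cheap definition in A can be reused in B but not in C unless the aggressive setting is on, which is the situation described for redundant immediate and global-symbol loads.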

shchenz added a subscriber: shchenz. Nov 23 2021, 5:14 AM

shchenz added inline comments.

llvm/lib/CodeGen/MachineCSE.cpp
441	If the register pressure is increased, doing more CSEs may introduce register spill/reload and thus it will generate worse code even for optimization for size?
465	Can we estimate the register pressure here to do a more aggressive CSE? If so, we should not limit this only for "optimization for size".

OK, that's a good start. I was expecting something along the lines of "I have tested RISCV on the llvm test suite or some other large codebase under Oz and it reduced the total codesize by 0.16%".

My experiments on ARM and AArch64 are not as great. This seems to increase codesize more than it reduces it, especially on ARM. The AArch64 numbers were dominated by one large increase, with some of the smaller cases being smaller. I would be interested in what the tests in-tree showed too.

You might want to check X86 as it's easy to run. If I were making a target-independent change like this, I would expect to test at least a couple of architecture combos (say, X86 with Arm and AArch64 for 32-bit and 64-bit variants), and potentially add target overrides where needed. In this case the default should maybe be kept as before, unless we have some evidence this is beneficial across most architectures.

craig.topper added a subscriber: craig.topper. Nov 25 2021, 12:53 PM

craig.topper added inline comments.

llvm/test/CodeGen/RISCV/enable-agressive-machine-cse.mir
1 ↗	(On Diff #389104)	Please use update_mir_test_checks.py

Address comments.
Only apply aggressive CSE to Heuristics 1.
Make enableAggressiveMachineCSE return false by default.
Remove RISCV MIR test.

lebedev.ri resigned from this revision. Nov 26 2021, 2:01 AM

Harbormaster completed remote builds in B136168: Diff 389948. Nov 26 2021, 2:53 AM

Thank you for your nice advice.

I have tested RISCV on SPECINT 2006 under Oz, here is the result:

Benchmark        Code size change
400.perlbench    +0.438%
401.bzip2        0%
403.gcc          -1.128%
429.mcf          0%
445.gobmk        -0.221%
456.hmmer        -1.682%
458.sjeng        0%
462.libquantum   0%
464.h264ref      -0.858%
471.omnetpp      -0.616%
473.astar        0%

Only perlbench increased in code size.

The result may not be convincing with outdated benchmarks, so I also tested it on the OpenCV codebase.

Most executables and libraries had no code size change, while some large files got smaller, for example:

opencv_perf_imgproc  -0.069%
opencv_perf_video    -0.288%
opencv_test_calib3d  -0.407%
opencv_test_core     -0.249%
opencv_test_dnn      -0.182%
opencv_test_imgproc  -0.246%
libopencv_imgproc.so -0.247%
……

Besides, third-party libraries used by OpenCV (like libquirc, libwebp, libjpeg-turbo, libtiff, etc.) got smaller code size.
Some small OpenCV examples grew by a few bytes as a result of increased register pressure.

I have made aggressive MachineCSE disabled by default; targets may override this if it's profitable.
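
The opt-in pattern described here can be sketched outside of LLVM as follows. `FunctionSketch`, `InstrInfoSketch`, `SizeTunedInstrInfo` and `useAggressiveCSE` are hypothetical names for this illustration; only the hook name `enableAggressiveMachineCSE`, its default of false, and the "flag or hook" combination come from the diff.

```cpp
// Hypothetical stand-ins; the real classes are llvm::MachineFunction and
// llvm::TargetInstrInfo, which carry far more state.
struct FunctionSketch {
  bool OptForSize = false;
};

struct InstrInfoSketch {
  virtual ~InstrInfoSketch() = default;
  // Default mirrors the final diff: aggressive MachineCSE is off.
  virtual bool enableAggressiveMachineCSE(const FunctionSketch &) const {
    return false;
  }
};

// A made-up target that opts in only for size-optimized functions.
struct SizeTunedInstrInfo : InstrInfoSketch {
  bool enableAggressiveMachineCSE(const FunctionSketch &F) const override {
    return F.OptForSize;
  }
};

// Either the command-line flag or the target hook turns the behaviour on.
bool useAggressiveCSE(bool FlagValue, const InstrInfoSketch &TII,
                      const FunctionSketch &F) {
  return FlagValue || TII.enableAggressiveMachineCSE(F);
}
```

Targets that do not override the hook keep the previous behaviour unless the hidden flag is passed explicitly.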

In fact, I think this workaround could be made more elegant with live interval analysis, as @shchenz said. At the very least, we should do CSE on extended basic blocks instead of only local or adjacent blocks.

llvm/lib/CodeGen/MachineCSE.cpp
441	Yes, you are right. `AggressiveMachineCSE` should be placed after `MayIncreasePressure`.
465	Absolutely! IMO, the key point is that we should do some live range analysis here?
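
A minimal sketch of the ordering agreed on above, assuming instructions can be modelled as integer ids. `mayIncreasePressure` and `isProfitableSketch` are hypothetical helpers, not LLVM functions, and the virtual-register checks and later heuristics of the real code are omitted.

```cpp
#include <set>

// Hypothetical stand-in for MachineInstr*: instructions are just integer ids.
using InstrId = int;

// Returns false only when every instruction that uses Reg also uses CSReg,
// i.e. replacing Reg with CSReg cannot extend CSReg's live range to new users.
bool mayIncreasePressure(const std::set<InstrId> &UsesOfCSReg,
                         const std::set<InstrId> &UsesOfReg) {
  for (InstrId I : UsesOfReg)
    if (!UsesOfCSReg.count(I))
      return true;
  return false;
}

// Ordering suggested in the review: the pressure check runs first, and the
// aggressive setting only relaxes the later cheap-instruction heuristic.
bool isProfitableSketch(const std::set<InstrId> &UsesOfCSReg,
                        const std::set<InstrId> &UsesOfReg,
                        bool CheapAndFarAway, bool Aggressive) {
  if (!mayIncreasePressure(UsesOfCSReg, UsesOfReg))
    return true;
  if (CheapAndFarAway && !Aggressive)
    return false;
  return true; // the remaining heuristics are omitted in this sketch
}
```

Note that this only detects the case where pressure cannot increase; it does not estimate how much pressure grows otherwise, which is the live range analysis the comments above ask for.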

ping

craig.topper added inline comments. Dec 15 2021, 9:35 PM

llvm/lib/CodeGen/MachineCSE.cpp
463	Can this info be cached from runOnMachineFunction? No need to make a virtual call for something that won't change per instruction.
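
The caching idea can be illustrated with a small stand-alone sketch. `HookSketch` and `PassSketch` are hypothetical types; the call counter exists only to show that the virtual hook is queried once per function rather than once per instruction.

```cpp
// Hypothetical hook interface; counts calls so the effect of caching is
// visible from the outside.
struct HookSketch {
  mutable int Calls = 0;
  virtual ~HookSketch() = default;
  virtual bool enableAggressive() const {
    ++Calls;
    return false;
  }
};

class PassSketch {
  bool Aggressive = false; // cached once per function

public:
  // Query the hook once up front, then consult the cached bool for each
  // instruction. Returns how many instructions saw the relaxed setting.
  int run(const HookSketch &Hook, int NumInstrs) {
    Aggressive = Hook.enableAggressive();
    int Relaxed = 0;
    for (int I = 0; I < NumInstrs; ++I)
      if (Aggressive)
        ++Relaxed;
    return Relaxed;
  }
};
```

This matches the shape of the final diff, where the result is stored in a member during runOnMachineFunction and read inside isProfitableToCSE.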

Rebase.
Address comment.

• pcwang-thead marked an inline comment as done. Dec 15 2021, 10:31 PM

Harbormaster completed remote builds in B139575: Diff 394746. Dec 15 2021, 11:31 PM

• pcwang-thead abandoned this revision. Jun 12 2023, 1:13 AM

Herald added a project: Restricted Project. Jun 12 2023, 1:13 AM

Herald added a subscriber: StephenFan.

Diff 394746

llvm/include/llvm/CodeGen/TargetInstrInfo.h

 public:
[...]
   /// Return true if the instruction is as cheap as a move instruction.
   ///
   /// Targets for different archs need to override this, and different
   /// micro-architectures can also be finely tuned inside.
   virtual bool isAsCheapAsAMove(const MachineInstr &MI) const {
     return MI.isAsCheapAsAMove();
   }
 
+  /// Return true if we want to do aggressive MachineCSE.
+  ///
+  /// Aggressive MachineCSE can be enabled when optimizing for size.
+  virtual bool enableAggressiveMachineCSE(const MachineFunction &MF) const {
+    return false;
+  }
+
   /// Return true if the instruction should be sunk by MachineSink.
   ///
   /// MachineSink determines on its own whether the instruction is safe to sink;
   /// this gives the target a hook to override the default behavior with regards
   /// to which instructions should be sunk.
   virtual bool shouldSink(const MachineInstr &MI) const { return true; }
 
   /// Re-issue the specified 'original' instruction at the
[...]

llvm/lib/CodeGen/MachineCSE.cpp

[...]
 #include <iterator>
 #include <utility>
 #include <vector>
 
 using namespace llvm;
 
 #define DEBUG_TYPE "machine-cse"
 
+static cl::opt<bool> EnableAggressiveMachineCSE(
+    "enable-aggressive-machine-cse", cl::Hidden, cl::init(false),
+    cl::desc("Enable aggressive machine CSE on the whole function."));
+
 STATISTIC(NumCoalesces, "Number of copies coalesced");
 STATISTIC(NumCSEs, "Number of common subexpression eliminated");
 STATISTIC(NumPREs, "Number of partial redundant expression"
                    " transformed to fully redundant");
 STATISTIC(NumPhysCSEs,
           "Number of physreg referencing common subexpr eliminated");
 STATISTIC(NumCrossBBCSEs,
           "Number of cross-MBB physreg referencing CS eliminated");
 STATISTIC(NumCommutes, "Number of copies coalesced after commuting");
 
 namespace {
 
   class MachineCSE : public MachineFunctionPass {
     const TargetInstrInfo *TII;
     const TargetRegisterInfo *TRI;
     AliasAnalysis *AA;
     MachineDominatorTree *DT;
     MachineRegisterInfo *MRI;
     MachineBlockFrequencyInfo *MBFI;
+    bool AggressiveMachineCSE;
 
   public:
     static char ID; // Pass identification
 
     MachineCSE() : MachineFunctionPass(ID) {
       initializeMachineCSEPass(*PassRegistry::getPassRegistry());
     }
 
[...]

 /// isProfitableToCSE - Return true if it's profitable to eliminate MI with a
 /// common expression that defines Reg. CSBB is basic block where CSReg is
 /// defined.
 bool MachineCSE::isProfitableToCSE(Register CSReg, Register Reg,
                                    MachineBasicBlock *CSBB, MachineInstr *MI) {
   // FIXME: Heuristics that works around the lack the live range splitting.
 
   // If CSReg is used at all uses of Reg, CSE should not increase register
   // pressure of CSReg.
   bool MayIncreasePressure = true;
   if (Register::isVirtualRegister(CSReg) && Register::isVirtualRegister(Reg)) {
     MayIncreasePressure = false;
     SmallPtrSet<MachineInstr*, 8> CSUses;
     for (MachineInstr &MI : MRI->use_nodbg_instructions(CSReg)) {
       CSUses.insert(&MI);
     }
     for (MachineInstr &MI : MRI->use_nodbg_instructions(Reg)) {
       if (!CSUses.count(&MI)) {
         MayIncreasePressure = true;
         break;
       }
     }
   }
   if (!MayIncreasePressure) return true;
 
   // Heuristics #1: Don't CSE "cheap" computation if the def is not local or in
   // an immediate predecessor. We don't want to increase register pressure and
   // end up causing other computation to be spilled.
-  if (TII->isAsCheapAsAMove(*MI)) {
+  if (TII->isAsCheapAsAMove(*MI) && !AggressiveMachineCSE) {
     MachineBasicBlock *BB = MI->getParent();
     if (CSBB != BB && !CSBB->isSuccessor(BB))
       return false;
   }
 
   // Heuristics #2: If the expression doesn't not use a vr and the only use
   // of the redundant computation are copies, do not cse.
   bool HasVRegUse = false;
   for (const MachineOperand &MO : MI->operands()) {
     if (MO.isReg() && MO.isUse() && Register::isVirtualRegister(MO.getReg())) {
       HasVRegUse = true;
[...]
 bool MachineCSE::isProfitableToHoistInto(MachineBasicBlock *CandidateBB,
[...]
   return MBFI->getBlockFreq(CandidateBB) <=
          MBFI->getBlockFreq(MBB) + MBFI->getBlockFreq(MBB1);
 }
 
 bool MachineCSE::runOnMachineFunction(MachineFunction &MF) {
   if (skipFunction(MF.getFunction()))
     return false;
 
   TII = MF.getSubtarget().getInstrInfo();
   TRI = MF.getSubtarget().getRegisterInfo();
   MRI = &MF.getRegInfo();
   AA = &getAnalysis<AAResultsWrapperPass>().getAAResults();
   DT = &getAnalysis<MachineDominatorTree>();
   MBFI = &getAnalysis<MachineBlockFrequencyInfo>();
   LookAheadLimit = TII->getMachineCSELookAheadLimit();
+  AggressiveMachineCSE =
+      EnableAggressiveMachineCSE || TII->enableAggressiveMachineCSE(MF);
 
   bool ChangedPRE, ChangedCSE;
   ChangedPRE = PerformSimplePRE(DT);
   ChangedCSE = PerformCSE(DT->getRootNode());
   return ChangedPRE || ChangedCSE;
 }

This is an archive of the discontinued LLVM Phabricator instance.

[MachineCSE] Add an option to enable global CSE
Abandoned · Public
