This is an archive of the discontinued LLVM Phabricator instance.

Differential D20276

[MBP] Reduce code size by running tail merging in MBP
ClosedPublic

Authored by haicheng on May 15 2016, 2:22 PM.

Details

Reviewers

deadalnix
mssimpso
mcrosier

Commits

rG77ea344786ab: [MBP] Reduce code size by running tail merging in MBP.
rL271925: [MBP] Reduce code size by running tail merging in MBP.

Summary

The code layout that TailMerging (inside BranchFolding) works on is not the final layout optimized based on branch probability. Generally, after BlockPlacement, many new merging opportunities emerge. My motivating example (also the test case) looks like this in AArch64 assembly:

     b       L5
L1:
     mov     w9, w8
     b       L5
L2:
     mov     w9, w8
     b       L5
L3:
     mov     w9, w8
     b       L5
L4:
     mov     w9, w8
L5:
      ldr     x8, [x21,#624]

L1-L5 can only be branched into. The example can be reduced to

     b       L5
L4:
     mov     w9, w8
L5:
   ldr     x8, [x21,#624]

The predecessors of L1-L4 now all branch into L4. Branch Folding should be able to simplify the code in its tail-merge phase, but it fails. In this example, when the Branch Folding pass runs, the tiny MBBs (L1-L4) are each placed as the fallthrough of their individual predecessors. Merging L1-L4 in Branch Folding would require inserting extra unconditional branches, which makes Tail Merging give up. After MBP, L1-L4 are no longer fallthroughs and can be easily merged as shown in the example.
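To make the merging idea concrete, here is a minimal toy sketch, not the BranchFolding implementation. It assumes a block can be modeled as a plain list of instruction strings and that the trailing `b L5` has already been stripped from L1-L3 (the pass comments in the diff describe temporarily removing such unconditional branches). The names `Block`, `commonTailLength`, `sharedTailLength`, and `instructionsSaved` are hypothetical and exist only for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model (assumption): a block is just a list of instruction strings.
using Block = std::vector<std::string>;

// Number of trailing instructions that two blocks have in common.
std::size_t commonTailLength(const Block &A, const Block &B) {
  std::size_t N = 0;
  while (N < A.size() && N < B.size() &&
         A[A.size() - 1 - N] == B[B.size() - 1 - N])
    ++N;
  return N;
}

// Length of the tail shared by every block in the candidate set.
std::size_t sharedTailLength(const std::vector<Block> &Blocks) {
  assert(!Blocks.empty() && "need at least one candidate block");
  std::size_t N = Blocks.front().size();
  for (const Block &B : Blocks)
    N = std::min(N, commonTailLength(Blocks.front(), B));
  return N;
}

// Non-branch instructions saved if all candidates share one surviving copy
// of the common tail.
std::size_t instructionsSaved(const std::vector<Block> &Blocks) {
  return sharedTailLength(Blocks) * (Blocks.size() - 1);
}
```

Modeling L1-L4 as four blocks that each hold only `mov w9, w8`, `sharedTailLength` reports a one-instruction common tail and `instructionsSaved` reports three removed copies, which matches the reduced listing where only L4 keeps the `mov`.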

This patch calls Tail Merging when it finds that MBP changes the branches, and calls MBP again if Tail Merging merges anything. Tail Merging updates MachineLoopInfo and MachineBlockFreqInfo so that MBP can use them later. The table below shows the number of instructions removed and the impact on performance on an AArch64 device (plus is improvement) when running SPEC2006.

benchmark	perf (%)	static instruction count
INT
astar	-0.49	-7
bzip2	0.40	-110
gcc	-0.11	-13,006
gobmk	1.48	-1,716
h264ref	0.47	-684
hmmer	-0.32	-391
libquantum	0.90	-4
mcf	-0.14	-4
omnetpp	-0.58	-1,980
perlbench	1.53	-4,176
sjeng	-0.77	-338
xalancbmk	-0.55	-4,183
FLOAT
soplex	-0.22	-395
dealII	0.34	-186
milc	-0.16	-34
namd	-0.18	-104
povray	2.07	-1,785
sphinx3	-0.11	-112

This patch also depends on three other trivial patches: D20177 (make it possible to optimize the branch directions after tail merging), D20184 (make it possible to use the updated MachineBlockFreqInfo in MBP), and D19955 (make it possible to know whether the branches are updated by MBP).

One test case (arm-and-tst-peephole.ll) is slightly rewritten by reversing one branch condition. The probabilities of both branch directions are 50/50, so I think the modification is okay.

Diff Detail

Repository: rL LLVM

Event Timeline

haicheng updated this revision to Diff 57307. May 15 2016, 2:22 PM

haicheng retitled this revision from to [MBP] Reduce code size by running tail merging in MBP.

haicheng updated this object.

haicheng set the repository for this revision to rL LLVM.

Herald added subscribers: mcrosier, aemerson. May 15 2016, 2:22 PM

haicheng added parent revisions: D20184: [BranchFolding] Replace MachineBlockFrequencyInfo with MBFIWrapper. NFC., D20177: [MBP] Factor out the optimizations on branch conditions and unanalyzable branches. NFCI., D19955: [MBB] Let MachineBasicBlock::updateTerminator() return if it updates the terminator or not. NFC.. May 15 2016, 2:26 PM

haicheng updated this object. May 15 2016, 2:42 PM

haicheng added reviewers: gberry, mssimpso, mcrosier.

haicheng added subscribers: hfinkel, iteratee, deadalnix and 4 others.

I've noticed this bad pattern in my code, thanks for tackling this.

lib/CodeGen/MachineBlockPlacement.cpp
1350 ↗	(On Diff #57307)	This is a bit confusing, as it can return false in cases where basic blocks are scrambled but no branch was updated. Not that it is incorrect, but it goes against conventions. Also, are we sure that this is the only case that can create an opportunity for the optimization to take place? Have you tried to just run the optimization every time? I'd be interested to know if there are any changes.
1467 ↗	(On Diff #57307)	Maybe you could use a unique pointer here.
1479 ↗	(On Diff #57307)	Out of curiosity, do you have an example of hardware doing this?
1496 ↗	(On Diff #57307)	Worth looping, maybe?
1518 ↗	(On Diff #57307)	Remove once you use unique ptr.

majnemer added a subscriber: majnemer. May 16 2016, 9:26 AM

majnemer added inline comments.

lib/CodeGen/MachineBlockPlacement.cpp
1479 ↗	(On Diff #57307)	WebAssembly IIRC.

haicheng marked an inline comment as done. May 16 2016, 12:55 PM

haicheng added inline comments.

lib/CodeGen/MachineBlockPlacement.cpp
1479 ↗	(On Diff #57307)	I know old AMD GPUs had this requirement. Their IL has instructions like "if_xxx" and "endif".

sunfish added a subscriber: sunfish. May 16 2016, 1:23 PM

sunfish added inline comments.

lib/CodeGen/MachineBlockPlacement.cpp
1479 ↗	(On Diff #57307)	WebAssembly has flexible control flow and is ok with tail merging and most other optimizations, so we're no longer using the requiresStructuredCFG() hook.

haicheng added inline comments. May 18 2016, 2:54 PM

lib/CodeGen/MachineBlockPlacement.cpp
1350 ↗	(On Diff #57307)	Thank you very much, Amaury. I did some research about this. In short, running the optimization every time can capture more cases, and I think I will do it in my next version. Below are the details of my findings.

Running the optimization every time captures five more cases in spec2000/2006. The common reason behind these cases is that the CFG has changed since the last time we called BranchFolding. These new opportunities exist before we start MBP.

You are right that it is possible to change the layout without changing any branch. I managed to create such a case, but there is no new tail merging opportunity in it. On average across spec2006, 74% of MFs need to update at least one branch to change the layout. The rest either do not need to change the layout or can change the layout without modifying any branch.

I think updateTerminator() can capture all the changes of fallthrough MBBs, which is the only thing that interests me. In D19955, if updateTerminator() finds that its layout successor is inconsistent with the current branch, it always returns true. So, if running tail merging every time is too much, my current approach can still find all the cases caused by MBP (I definitely need to make some changes to comply with the convention).

I will look into your looping suggestion next.

Removing myself as others seem to be handling review

Post Escha's comment.

lib/CodeGen/MachineBlockPlacement.cpp
1479 ↗	(On Diff #57307)	Ours has that requirement. We don’t generally allow any CFG modification after structurization, which takes place in the pre-ISel phase. Generic block placement is fine though. —escha

haicheng added inline comments. May 24 2016, 11:46 AM

lib/CodeGen/MachineBlockPlacement.cpp
1496 ↗	(On Diff #57307)	I did some research on spec2006 about looping. Nine benchmarks can remove further instructions; gcc, omnetpp, and perlbench can remove 1000+ instructions. 83.5% of MFs across spec2006 need only one iteration, and 15.3% need two. The largest iteration count is five, which is needed by 0.03% of MFs. It seems promising to pursue, but I think I will do the looping in the next patch, after this one has stayed in the tree for a while.
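The looping idea discussed here can be sketched as a small fixed-point driver. This is only a toy illustration under stated assumptions, not code from the patch (which runs tail merging once and redoes the layout once). `LoopResult`, `placeAndMergeUntilStable`, and the `Place`/`Merge` callbacks are hypothetical names; `Merge` is assumed to return true when it changed anything, and the iteration cap is an added safeguard that the review does not mention.

```cpp
#include <cassert>
#include <functional>

// Result of driving placement and tail merging to a fixed point.
struct LoopResult {
  unsigned Iterations; // number of times tail merging ran
  bool Converged;      // false if MaxIterations was reached first
};

// Hypothetical driver: Place() lays out the blocks, Merge() tail-merges and
// returns true if it changed anything.
LoopResult placeAndMergeUntilStable(const std::function<void()> &Place,
                                    const std::function<bool()> &Merge,
                                    unsigned MaxIterations) {
  assert(MaxIterations > 0 && "need at least one iteration");
  Place();
  unsigned Iter = 0;
  while (Iter < MaxIterations) {
    ++Iter;
    if (!Merge())
      return {Iter, true}; // nothing merged: the layout is final
    Place();               // the layout is stale after a merge, redo it
  }
  return {Iter, false};
}
```

In this sketch the iteration count includes the final merge attempt that finds nothing, so it is not directly comparable to the iteration numbers quoted above.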

Officially adding Amaury as a review.

Address Amaury's comments. Rebase, use unique_ptr, always call tail merging.

haicheng removed a parent revision: D19955: [MBB] Let MachineBasicBlock::updateTerminator() return if it updates the terminator or not. NFC.. May 25 2016, 1:52 PM

haicheng mentioned this in D19955: [MBB] Let MachineBasicBlock::updateTerminator() return if it updates the terminator or not. NFC.. May 25 2016, 3:25 PM

Looks to be in good shape.

test/CodeGen/AArch64/tailmerging_in_mbp.ll
1 ↗	(On Diff #58498)	Please add a space before RUN.

This revision is now accepted and ready to land. Jun 3 2016, 1:23 PM

deadalnix added inline comments. Jun 5 2016, 1:24 PM

lib/CodeGen/MachineBlockPlacement.cpp
1360 ↗	(On Diff #58498)	Alright, those are some very good findings. I don't mind if this is taken care of now or in subsequent diffs, but it seems like there is some more juice to squeeze on that one.

Closed by commit rL271925: [MBP] Reduce code size by running tail merging in MBP. (authored by haicheng). Jun 6 2016, 11:42 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/trunk/lib/CodeGen/BranchFolding.h	12 lines
llvm/trunk/lib/CodeGen/BranchFolding.cpp	66 lines
llvm/trunk/lib/CodeGen/MachineBlockPlacement.cpp	39 lines
llvm/trunk/test/CodeGen/AArch64/tailmerging_in_mbp.ll	63 lines
llvm/trunk/test/CodeGen/ARM/arm-and-tst-peephole.ll	2 lines

Diff 59756

llvm/trunk/lib/CodeGen/BranchFolding.h

Show All 14 Lines
#include "llvm/Support/BlockFrequency.h"		#include "llvm/Support/BlockFrequency.h"
#include <vector>		#include <vector>

namespace llvm {		namespace llvm {
class MachineBlockFrequencyInfo;		class MachineBlockFrequencyInfo;
class MachineBranchProbabilityInfo;		class MachineBranchProbabilityInfo;
class MachineFunction;		class MachineFunction;
class MachineModuleInfo;		class MachineModuleInfo;
		class MachineLoopInfo;
class RegScavenger;		class RegScavenger;
class TargetInstrInfo;		class TargetInstrInfo;
class TargetRegisterInfo;		class TargetRegisterInfo;

class LLVM_LIBRARY_VISIBILITY BranchFolder {		class LLVM_LIBRARY_VISIBILITY BranchFolder {
public:		public:
class MBFIWrapper;		class MBFIWrapper;

explicit BranchFolder(bool defaultEnableTailMerge, bool CommonHoist,		explicit BranchFolder(bool defaultEnableTailMerge, bool CommonHoist,
MBFIWrapper &MBFI,		MBFIWrapper &MBFI,
const MachineBranchProbabilityInfo &MBPI);		const MachineBranchProbabilityInfo &MBPI);

bool OptimizeFunction(MachineFunction &MF,		bool OptimizeFunction(MachineFunction &MF, const TargetInstrInfo *tii,
const TargetInstrInfo *tii,		const TargetRegisterInfo *tri, MachineModuleInfo *mmi,
const TargetRegisterInfo *tri,		MachineLoopInfo *mli = nullptr,
MachineModuleInfo *mmi);		bool AfterPlacement = false);

private:		private:
class MergePotentialsElt {		class MergePotentialsElt {
unsigned Hash;		unsigned Hash;
MachineBasicBlock *Block;		MachineBasicBlock *Block;
public:		public:
MergePotentialsElt(unsigned h, MachineBasicBlock *b)		MergePotentialsElt(unsigned h, MachineBasicBlock *b)
: Hash(h), Block(b) {}		: Hash(h), Block(b) {}

▲ Show 20 Lines • Show All 41 Lines • ▼ Show 20 Lines	public:
getMergePotentialsElt().setBlock(MBB);		getMergePotentialsElt().setBlock(MBB);
}		}
void setTailStartPos(MachineBasicBlock::iterator Pos) {		void setTailStartPos(MachineBasicBlock::iterator Pos) {
TailStartPos = Pos;		TailStartPos = Pos;
}		}
};		};
std::vector<SameTailElt> SameTails;		std::vector<SameTailElt> SameTails;

		bool AfterBlockPlacement;
bool EnableTailMerge;		bool EnableTailMerge;
bool EnableHoistCommonCode;		bool EnableHoistCommonCode;
const TargetInstrInfo *TII;		const TargetInstrInfo *TII;
const TargetRegisterInfo *TRI;		const TargetRegisterInfo *TRI;
MachineModuleInfo *MMI;		MachineModuleInfo *MMI;
		MachineLoopInfo *MLI;
RegScavenger *RS;		RegScavenger *RS;

public:		public:
/// \brief This class keeps track of branch frequencies of newly created		/// \brief This class keeps track of branch frequencies of newly created
/// blocks and tail-merged blocks.		/// blocks and tail-merged blocks.
class MBFIWrapper {		class MBFIWrapper {
public:		public:
MBFIWrapper(const MachineBlockFrequencyInfo &I) : MBFI(I) {}		MBFIWrapper(const MachineBlockFrequencyInfo &I) : MBFI(I) {}
▲ Show 20 Lines • Show All 48 Lines • Show Last 20 Lines

llvm/trunk/lib/CodeGen/BranchFolding.cpp

Show All 21 Lines
#include "llvm/ADT/SmallSet.h"		#include "llvm/ADT/SmallSet.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/CodeGen/Analysis.h"		#include "llvm/CodeGen/Analysis.h"
#include "llvm/CodeGen/MachineBlockFrequencyInfo.h"		#include "llvm/CodeGen/MachineBlockFrequencyInfo.h"
#include "llvm/CodeGen/MachineBranchProbabilityInfo.h"		#include "llvm/CodeGen/MachineBranchProbabilityInfo.h"
#include "llvm/CodeGen/MachineFunctionPass.h"		#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/MachineJumpTableInfo.h"		#include "llvm/CodeGen/MachineJumpTableInfo.h"
#include "llvm/CodeGen/MachineMemOperand.h"		#include "llvm/CodeGen/MachineMemOperand.h"
		#include "llvm/CodeGen/MachineLoopInfo.h"
#include "llvm/CodeGen/MachineModuleInfo.h"		#include "llvm/CodeGen/MachineModuleInfo.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"		#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/CodeGen/Passes.h"		#include "llvm/CodeGen/Passes.h"
#include "llvm/CodeGen/TargetPassConfig.h"		#include "llvm/CodeGen/TargetPassConfig.h"
#include "llvm/CodeGen/RegisterScavenging.h"		#include "llvm/CodeGen/RegisterScavenging.h"
#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
▲ Show 20 Lines • Show All 94 Lines • ▼ Show 20 Lines	while (!MBB->succ_empty())
MBB->removeSuccessor(MBB->succ_end()-1);		MBB->removeSuccessor(MBB->succ_end()-1);

// Avoid matching if this pointer gets reused.		// Avoid matching if this pointer gets reused.
TriedMerging.erase(MBB);		TriedMerging.erase(MBB);

// Remove the block.		// Remove the block.
MF->erase(MBB);		MF->erase(MBB);
FuncletMembership.erase(MBB);		FuncletMembership.erase(MBB);
		if (MLI)
		MLI->removeBlock(MBB);
}		}

/// OptimizeImpDefsBlock - If a basic block is just a bunch of implicit_def		/// OptimizeImpDefsBlock - If a basic block is just a bunch of implicit_def
/// followed by terminators, and if the implicitly defined registers are not		/// followed by terminators, and if the implicitly defined registers are not
/// used by the terminators, remove those implicit_def's. e.g.		/// used by the terminators, remove those implicit_def's. e.g.
/// BB1:		/// BB1:
/// r0 = implicit_def		/// r0 = implicit_def
/// r1 = implicit_def		/// r1 = implicit_def
Show All 40 Lines	while (I != FirstTerm) {
++I;		++I;
MBB->erase(ImpDefMI);		MBB->erase(ImpDefMI);
}		}

return true;		return true;
}		}

/// OptimizeFunction - Perhaps branch folding, tail merging and other		/// OptimizeFunction - Perhaps branch folding, tail merging and other
/// CFG optimizations on the given function.		/// CFG optimizations on the given function. Block placement changes the layout
		/// and may create new tail merging opportunities.
bool BranchFolder::OptimizeFunction(MachineFunction &MF,		bool BranchFolder::OptimizeFunction(MachineFunction &MF,
const TargetInstrInfo *tii,		const TargetInstrInfo *tii,
const TargetRegisterInfo *tri,		const TargetRegisterInfo *tri,
MachineModuleInfo *mmi) {		MachineModuleInfo *mmi,
		MachineLoopInfo *mli, bool AfterPlacement) {
if (!tii) return false;		if (!tii) return false;

TriedMerging.clear();		TriedMerging.clear();

		AfterBlockPlacement = AfterPlacement;
TII = tii;		TII = tii;
TRI = tri;		TRI = tri;
MMI = mmi;		MMI = mmi;
		MLI = mli;
RS = nullptr;		RS = nullptr;

// Use a RegScavenger to help update liveness when required.		// Use a RegScavenger to help update liveness when required.
MachineRegisterInfo &MRI = MF.getRegInfo();		MachineRegisterInfo &MRI = MF.getRegInfo();
if (MRI.tracksLiveness() && TRI->trackLivenessAfterRegAlloc(MF))		if (MRI.tracksLiveness() && TRI->trackLivenessAfterRegAlloc(MF))
RS = new RegScavenger();		RS = new RegScavenger();
else		else
MRI.invalidateLiveness();		MRI.invalidateLiveness();
Show All 9 Lines	bool BranchFolder::OptimizeFunction(MachineFunction &MF,
}		}

// Recalculate funclet membership.		// Recalculate funclet membership.
FuncletMembership = getFuncletMembership(MF);		FuncletMembership = getFuncletMembership(MF);

bool MadeChangeThisIteration = true;		bool MadeChangeThisIteration = true;
while (MadeChangeThisIteration) {		while (MadeChangeThisIteration) {
MadeChangeThisIteration = TailMergeBlocks(MF);		MadeChangeThisIteration = TailMergeBlocks(MF);
		// No need to clean up if tail merging does not change anything after the
		// block placement.
		if (!AfterBlockPlacement || MadeChangeThisIteration)
MadeChangeThisIteration |= OptimizeBranches(MF);		MadeChangeThisIteration |= OptimizeBranches(MF);
if (EnableHoistCommonCode)		if (EnableHoistCommonCode)
MadeChangeThisIteration |= HoistCommonCode(MF);		MadeChangeThisIteration |= HoistCommonCode(MF);
MadeChange |= MadeChangeThisIteration;		MadeChange |= MadeChangeThisIteration;
}		}

// See if any jump tables have become dead as the code generator		// See if any jump tables have become dead as the code generator
// did its thing.		// did its thing.
MachineJumpTableInfo *JTI = MF.getJumpTableInfo();		MachineJumpTableInfo *JTI = MF.getJumpTableInfo();
▲ Show 20 Lines • Show All 200 Lines • ▼ Show 20 Lines	MachineBasicBlock *BranchFolder::SplitMBBAt(MachineBasicBlock &CurMBB,
NewMBB->transferSuccessors(&CurMBB);		NewMBB->transferSuccessors(&CurMBB);

// Add an edge from CurMBB to NewMBB for the fall-through.		// Add an edge from CurMBB to NewMBB for the fall-through.
CurMBB.addSuccessor(NewMBB);		CurMBB.addSuccessor(NewMBB);

// Splice the code over.		// Splice the code over.
NewMBB->splice(NewMBB->end(), &CurMBB, BBI1, CurMBB.end());		NewMBB->splice(NewMBB->end(), &CurMBB, BBI1, CurMBB.end());

		// NewMBB belongs to the same loop as CurMBB.
		if (MLI)
		if (MachineLoop *ML = MLI->getLoopFor(&CurMBB))
		ML->addBasicBlockToLoop(NewMBB, MLI->getBase());

// NewMBB inherits CurMBB's block frequency.		// NewMBB inherits CurMBB's block frequency.
MBBFreqInfo.setBlockFreq(NewMBB, MBBFreqInfo.getBlockFreq(&CurMBB));		MBBFreqInfo.setBlockFreq(NewMBB, MBBFreqInfo.getBlockFreq(&CurMBB));

// For targets that use the register scavenger, we must maintain LiveIns.		// For targets that use the register scavenger, we must maintain LiveIns.
MaintainLiveIns(&CurMBB, NewMBB);		MaintainLiveIns(&CurMBB, NewMBB);

// Add the new block to the funclet.		// Add the new block to the funclet.
const auto &FuncletI = FuncletMembership.find(&CurMBB);		const auto &FuncletI = FuncletMembership.find(&CurMBB);
▲ Show 20 Lines • Show All 471 Lines • ▼ Show 20 Lines	bool BranchFolder::TryTailMergeBlocks(MachineBasicBlock *SuccBB,
return MadeChange;		return MadeChange;
}		}

bool BranchFolder::TailMergeBlocks(MachineFunction &MF) {		bool BranchFolder::TailMergeBlocks(MachineFunction &MF) {
bool MadeChange = false;		bool MadeChange = false;
if (!EnableTailMerge) return MadeChange;		if (!EnableTailMerge) return MadeChange;

// First find blocks with no successors.		// First find blocks with no successors.
		// Block placement does not create new tail merging opportunities for these
		// blocks.
		if (!AfterBlockPlacement) {
MergePotentials.clear();		MergePotentials.clear();
for (MachineBasicBlock &MBB : MF) {		for (MachineBasicBlock &MBB : MF) {
if (MergePotentials.size() == TailMergeThreshold)		if (MergePotentials.size() == TailMergeThreshold)
break;		break;
if (!TriedMerging.count(&MBB) && MBB.succ_empty())		if (!TriedMerging.count(&MBB) && MBB.succ_empty())
MergePotentials.push_back(MergePotentialsElt(HashEndOfMBB(MBB), &MBB));		MergePotentials.push_back(MergePotentialsElt(HashEndOfMBB(MBB), &MBB));
}		}

// If this is a large problem, avoid visiting the same basic blocks		// If this is a large problem, avoid visiting the same basic blocks
// multiple times.		// multiple times.
if (MergePotentials.size() == TailMergeThreshold)		if (MergePotentials.size() == TailMergeThreshold)
for (unsigned i = 0, e = MergePotentials.size(); i != e; ++i)		for (unsigned i = 0, e = MergePotentials.size(); i != e; ++i)
TriedMerging.insert(MergePotentials[i].getBlock());		TriedMerging.insert(MergePotentials[i].getBlock());

// See if we can do any tail merging on those.		// See if we can do any tail merging on those.
if (MergePotentials.size() >= 2)		if (MergePotentials.size() >= 2)
MadeChange |= TryTailMergeBlocks(nullptr, nullptr);		MadeChange |= TryTailMergeBlocks(nullptr, nullptr);
		}

// Look at blocks (IBB) with multiple predecessors (PBB).		// Look at blocks (IBB) with multiple predecessors (PBB).
// We change each predecessor to a canonical form, by		// We change each predecessor to a canonical form, by
// (1) temporarily removing any unconditional branch from the predecessor		// (1) temporarily removing any unconditional branch from the predecessor
// to IBB, and		// to IBB, and
// (2) alter conditional branches so they branch to the other block		// (2) alter conditional branches so they branch to the other block
// not IBB; this may require adding back an unconditional branch to IBB		// not IBB; this may require adding back an unconditional branch to IBB
// later, where there wasn't one coming in. E.g.		// later, where there wasn't one coming in. E.g.
Show All 30 Lines	for (MachineBasicBlock *PBB : I->predecessors()) {
// Visit each predecessor only once.		// Visit each predecessor only once.
if (!UniquePreds.insert(PBB).second)		if (!UniquePreds.insert(PBB).second)
continue;		continue;

// Skip blocks which may jump to a landing pad. Can't tail merge these.		// Skip blocks which may jump to a landing pad. Can't tail merge these.
if (PBB->hasEHPadSuccessor())		if (PBB->hasEHPadSuccessor())
continue;		continue;

		// Bail out if the loop header (IBB) is not the top of the loop chain
		// after the block placement. Otherwise, the common tail of IBB's
		// predecessors may become the loop top if block placement is called again
		// and the predecessors may branch to this common tail.
		// FIXME: Relaxed this check if the algorithm of finding loop top is
		// changed in MBP.
		if (AfterBlockPlacement && MLI)
		if (MachineLoop *ML = MLI->getLoopFor(IBB))
		if (IBB == ML->getHeader() && ML == MLI->getLoopFor(PBB))
		continue;

MachineBasicBlock *TBB = nullptr, *FBB = nullptr;		MachineBasicBlock *TBB = nullptr, *FBB = nullptr;
SmallVector<MachineOperand, 4> Cond;		SmallVector<MachineOperand, 4> Cond;
if (!TII->AnalyzeBranch(*PBB, TBB, FBB, Cond, true)) {		if (!TII->AnalyzeBranch(*PBB, TBB, FBB, Cond, true)) {
// Failing case: IBB is the target of a cbr, and we cannot reverse the		// Failing case: IBB is the target of a cbr, and we cannot reverse the
// branch.		// branch.
SmallVector<MachineOperand, 4> NewCond(Cond);		SmallVector<MachineOperand, 4> NewCond(Cond);
if (!Cond.empty() && TBB == IBB) {		if (!Cond.empty() && TBB == IBB) {
if (TII->ReverseBranchCondition(NewCond))		if (TII->ReverseBranchCondition(NewCond))
▲ Show 20 Lines • Show All 882 Lines • Show Last 20 Lines

llvm/trunk/lib/CodeGen/MachineBlockPlacement.cpp

Show All 20 Lines
// sequential chains where allowed by the CFG (or demanded by heavy		// sequential chains where allowed by the CFG (or demanded by heavy
// probabilities). Finally, it walks the blocks in topological order, and the		// probabilities). Finally, it walks the blocks in topological order, and the
// first time it reaches a chain of basic blocks, it schedules them in the		// first time it reaches a chain of basic blocks, it schedules them in the
// function in-order.		// function in-order.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/CodeGen/Passes.h"		#include "llvm/CodeGen/Passes.h"
		#include "llvm/CodeGen/TargetPassConfig.h"
		#include "BranchFolding.h"
#include "llvm/ADT/DenseMap.h"		#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/SmallPtrSet.h"		#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/CodeGen/MachineBasicBlock.h"		#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineBlockFrequencyInfo.h"		#include "llvm/CodeGen/MachineBlockFrequencyInfo.h"
#include "llvm/CodeGen/MachineBranchProbabilityInfo.h"		#include "llvm/CodeGen/MachineBranchProbabilityInfo.h"
#include "llvm/CodeGen/MachineDominators.h"		#include "llvm/CodeGen/MachineDominators.h"
▲ Show 20 Lines • Show All 74 Lines • ▼ Show 20 Lines	cl::desc("Cost that models the probablistic risk of an instruction "
"misfetch due to a jump comparing to falling through, whose cost "		"misfetch due to a jump comparing to falling through, whose cost "
"is zero."),		"is zero."),
cl::init(1), cl::Hidden);		cl::init(1), cl::Hidden);

static cl::opt<unsigned> JumpInstCost("jump-inst-cost",		static cl::opt<unsigned> JumpInstCost("jump-inst-cost",
cl::desc("Cost of jump instructions."),		cl::desc("Cost of jump instructions."),
cl::init(1), cl::Hidden);		cl::init(1), cl::Hidden);

		static cl::opt<bool>
		BranchFoldPlacement("branch-fold-placement",
		cl::desc("Perform branch folding during placement. "
		"Reduces code size."),
		cl::init(true), cl::Hidden);

extern cl::opt<unsigned> StaticLikelyProb;		extern cl::opt<unsigned> StaticLikelyProb;

namespace {		namespace {
class BlockChain;		class BlockChain;
/// \brief Type for our function-wide basic block -> block chain mapping.		/// \brief Type for our function-wide basic block -> block chain mapping.
typedef DenseMap<MachineBasicBlock *, BlockChain *> BlockToChainMapType;		typedef DenseMap<MachineBasicBlock *, BlockChain *> BlockToChainMapType;
}		}

▲ Show 20 Lines • Show All 100 Lines • ▼ Show 20 Lines
class MachineBlockPlacement : public MachineFunctionPass {		class MachineBlockPlacement : public MachineFunctionPass {
/// \brief A typedef for a block filter set.		/// \brief A typedef for a block filter set.
typedef SmallPtrSet<MachineBasicBlock *, 16> BlockFilterSet;		typedef SmallPtrSet<MachineBasicBlock *, 16> BlockFilterSet;

/// \brief A handle to the branch probability pass.		/// \brief A handle to the branch probability pass.
const MachineBranchProbabilityInfo *MBPI;		const MachineBranchProbabilityInfo *MBPI;

/// \brief A handle to the function-wide block frequency pass.		/// \brief A handle to the function-wide block frequency pass.
const MachineBlockFrequencyInfo *MBFI;		std::unique_ptr<BranchFolder::MBFIWrapper> MBFI;

/// \brief A handle to the loop info.		/// \brief A handle to the loop info.
const MachineLoopInfo *MLI;		MachineLoopInfo *MLI;

/// \brief A handle to the target's instruction info.		/// \brief A handle to the target's instruction info.
const TargetInstrInfo *TII;		const TargetInstrInfo *TII;

/// \brief A handle to the target's lowering info.		/// \brief A handle to the target's lowering info.
const TargetLoweringBase *TLI;		const TargetLoweringBase *TLI;

/// \brief A handle to the post dominator tree.		/// \brief A handle to the post dominator tree.
▲ Show 20 Lines • Show All 71 Lines • ▼ Show 20 Lines	public:

bool runOnMachineFunction(MachineFunction &F) override;		bool runOnMachineFunction(MachineFunction &F) override;

void getAnalysisUsage(AnalysisUsage &AU) const override {		void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.addRequired<MachineBranchProbabilityInfo>();		AU.addRequired<MachineBranchProbabilityInfo>();
AU.addRequired<MachineBlockFrequencyInfo>();		AU.addRequired<MachineBlockFrequencyInfo>();
AU.addRequired<MachineDominatorTree>();		AU.addRequired<MachineDominatorTree>();
AU.addRequired<MachineLoopInfo>();		AU.addRequired<MachineLoopInfo>();
		AU.addRequired<TargetPassConfig>();
MachineFunctionPass::getAnalysisUsage(AU);		MachineFunctionPass::getAnalysisUsage(AU);
}		}
};		};
}		}

char MachineBlockPlacement::ID = 0;		char MachineBlockPlacement::ID = 0;
char &llvm::MachineBlockPlacementID = MachineBlockPlacement::ID;		char &llvm::MachineBlockPlacementID = MachineBlockPlacement::ID;
INITIALIZE_PASS_BEGIN(MachineBlockPlacement, "block-placement",		INITIALIZE_PASS_BEGIN(MachineBlockPlacement, "block-placement",
▲ Show 20 Lines • Show All 1,123 Lines • ▼ Show 20 Lines	bool MachineBlockPlacement::runOnMachineFunction(MachineFunction &F) {
if (skipFunction(*F.getFunction()))		if (skipFunction(*F.getFunction()))
return false;		return false;

// Check for single-block functions and skip them.		// Check for single-block functions and skip them.
if (std::next(F.begin()) == F.end())		if (std::next(F.begin()) == F.end())
return false;		return false;

MBPI = &getAnalysis<MachineBranchProbabilityInfo>();		MBPI = &getAnalysis<MachineBranchProbabilityInfo>();
MBFI = &getAnalysis<MachineBlockFrequencyInfo>();		MBFI = llvm::make_unique<BranchFolder::MBFIWrapper>(
		getAnalysis<MachineBlockFrequencyInfo>());
MLI = &getAnalysis<MachineLoopInfo>();		MLI = &getAnalysis<MachineLoopInfo>();
TII = F.getSubtarget().getInstrInfo();		TII = F.getSubtarget().getInstrInfo();
TLI = F.getSubtarget().getTargetLowering();		TLI = F.getSubtarget().getTargetLowering();
MDT = &getAnalysis<MachineDominatorTree>();		MDT = &getAnalysis<MachineDominatorTree>();
assert(BlockToChain.empty());		assert(BlockToChain.empty());

buildCFGChains(F);		buildCFGChains(F);

		// Changing the layout can create new tail merging opportunities.
		TargetPassConfig *PassConfig = &getAnalysis<TargetPassConfig>();
		// TailMerge can create jump into if branches that make CFG irreducible for
		// HW that requires structurized CFG.
		bool EnableTailMerge = !F.getTarget().requiresStructuredCFG() &&
		PassConfig->getEnableTailMerge() &&
		BranchFoldPlacement;
		// No tail merging opportunities if the block number is less than four.
		if (F.size() > 3 && EnableTailMerge) {
		BranchFolder BF(/*EnableTailMerge=*/true, /*CommonHoist=*/false, *MBFI,
		*MBPI);

		if (BF.OptimizeFunction(F, TII, F.getSubtarget().getRegisterInfo(),
		getAnalysisIfAvailable<MachineModuleInfo>(), MLI,
		/*AfterBlockPlacement=*/true)) {
		// Redo the layout if tail merging creates/removes/moves blocks.
		BlockToChain.clear();
		ChainAllocator.DestroyAll();
		buildCFGChains(F);
		}
		}

optimizeBranches(F);		optimizeBranches(F);
alignBlocks(F);		alignBlocks(F);

BlockToChain.clear();		BlockToChain.clear();
ChainAllocator.DestroyAll();		ChainAllocator.DestroyAll();

if (AlignAllBlock)		if (AlignAllBlock)
// Align all of the blocks in the function to a specific alignment.		// Align all of the blocks in the function to a specific alignment.
▲ Show 20 Lines • Show All 85 Lines • Show Last 20 Lines

llvm/trunk/test/CodeGen/AArch64/tailmerging_in_mbp.ll

				; RUN: llc <%s -march=aarch64 | FileCheck %s

				; CHECK-LABEL: test:
				; CHECK: .LBB0_7
				; CHECK: b.hi .LBB0_2
				; CHECK-NEXT: b .LBB0_9
				; CHECK-NEXT: .LBB0_8
				; CHECK-NEXT: mov x8, x9
				; CHECK-NEXT: .LBB0_9
				define i64 @test(i64 %n, i64* %a, i64* %b, i64* %c, i64* %d, i64* %e, i64* %f) {
				entry:
				%cmp28 = icmp sgt i64 %n, 1
				br i1 %cmp28, label %for.body, label %for.end

				for.body: ; preds = %for.body.lr.ph, %if.end
				%j = phi i64 [ %n, %entry ], [ %div, %if.end ]
				%div = lshr i64 %j, 1
				%a.arrayidx = getelementptr inbounds i64, i64* %a, i64 %div
				%a.j = load i64, i64* %a.arrayidx
				%b.arrayidx = getelementptr inbounds i64, i64* %b, i64 %div
				%b.j = load i64, i64* %b.arrayidx
				%cmp.i = icmp slt i64 %a.j, %b.j
				br i1 %cmp.i, label %for.end.loopexit, label %cond.false.i

				cond.false.i: ; preds = %for.body
				%cmp4.i = icmp sgt i64 %a.j, %b.j
				br i1 %cmp4.i, label %if.end, label %cond.false6.i

				cond.false6.i: ; preds = %cond.false.i
				%c.arrayidx = getelementptr inbounds i64, i64* %c, i64 %div
				%c.j = load i64, i64* %c.arrayidx
				%d.arrayidx = getelementptr inbounds i64, i64* %d, i64 %div
				%d.j = load i64, i64* %d.arrayidx
				%cmp9.i = icmp slt i64 %c.j, %d.j
				br i1 %cmp9.i, label %for.end.loopexit, label %cond.false11.i

				cond.false11.i: ; preds = %cond.false6.i
				%cmp14.i = icmp sgt i64 %c.j, %d.j
				br i1 %cmp14.i, label %if.end, label %cond.false12.i

				cond.false12.i: ; preds = %cond.false11.i
				%e.arrayidx = getelementptr inbounds i64, i64* %e, i64 %div
				%e.j = load i64, i64* %e.arrayidx
				%f.arrayidx = getelementptr inbounds i64, i64* %f, i64 %div
				%f.j = load i64, i64* %f.arrayidx
				%cmp19.i = icmp sgt i64 %e.j, %f.j
				br i1 %cmp19.i, label %if.end, label %for.end.loopexit

				if.end: ; preds = %cond.false12.i, %cond.false11.i, %cond.false.i
				%cmp = icmp ugt i64 %j, 3
				br i1 %cmp, label %for.body, label %for.end.loopexit

				for.end.loopexit: ; preds = %cond.false12.i, %cond.false6.i, %for.body, %if.end
				%j.0.lcssa.ph = phi i64 [ %j, %cond.false12.i ], [ %j, %cond.false6.i ], [ %j, %for.body ], [ %div, %if.end ]
				br label %for.end

				for.end: ; preds = %for.end.loopexit, %entry
				%j.0.lcssa = phi i64 [ %n, %entry ], [ %j.0.lcssa.ph, %for.end.loopexit ]
				%j.2 = add i64 %j.0.lcssa, %n
				%j.3 = mul i64 %j.2, %n
				%j.4 = add i64 %j.3, 10
				ret i64 %j.4
				}

llvm/trunk/test/CodeGen/ARM/arm-and-tst-peephole.ll

	Show First 20 Lines • Show All 43 Lines • ▼ Show 20 Lines
	; V8-LABEL: %tailrecurse.switch			; V8-LABEL: %tailrecurse.switch
	; V8: cmp			; V8: cmp
	; V8-NEXT: beq			; V8-NEXT: beq
	; V8-NEXT: %tailrecurse.switch			; V8-NEXT: %tailrecurse.switch
	; V8: cmp			; V8: cmp
	; V8-NEXT: beq			; V8-NEXT: beq
	; V8-NEXT: %tailrecurse.switch			; V8-NEXT: %tailrecurse.switch
	; V8: cmp			; V8: cmp
	; V8-NEXT: bne			; V8-NEXT: beq
	; V8-NEXT: b			; V8-NEXT: b
	; The trailing space in the last line checks that the branch is unconditional			; The trailing space in the last line checks that the branch is unconditional
	switch i32 %and, label %sw.epilog [			switch i32 %and, label %sw.epilog [
	i32 1, label %sw.bb			i32 1, label %sw.bb
	i32 3, label %sw.bb6			i32 3, label %sw.bb6
	i32 2, label %sw.bb8			i32 2, label %sw.bb8
	], !prof !1			], !prof !1

	▲ Show 20 Lines • Show All 100 Lines • Show Last 20 Lines