This is an archive of the discontinued LLVM Phabricator instance.

Differential D17235

[Greedy Regalloc] Reg splitting based on loops
Needs Review, Public

Authored by wmi on Feb 12 2016, 9:40 PM.


Details

Reviewers

qcolombet

Summary

This patch adds register splitting based on loops, especially hot loops. It supplements the current region splitting.

While analyzing PR26597, PR24278 and PR24348, we found some typical scenarios that current region splitting does not handle well.

Vreg1 has references inside loop1 and lives through loop2. Vreg2 is both defined and used inside loop2. Vreg1 gets a physical register before vreg2 is allocated. Loop2 has high register pressure, so vreg2 cannot find an available physical register. Vreg1 has a higher weight than vreg2 because it is used in loop1 (the hotness of loop1 and loop2 is comparable), so vreg2 cannot evict vreg1 and can only be spilled. However, it is better to split vreg1 around loop2. If vreg1 is split around loop2, the vreg1' generated around loop2 will only have a def and a use at the loop preheader and exit block (cold compared with the loop body), which means it has a low weight and can be evicted by vreg2. After vreg2 evicts vreg1', vreg2 gets a physical register and there is less spilling in loop2. Current region splitting will not try splitting vreg1 when allocating for vreg2; it will only try splitting vreg2. PR26597 belongs to this case.

Both vreg1 and vreg2 have references inside a loop. The reference of vreg2 in the loop can be rematerialized but that of vreg1 cannot, which suggests vreg2 should have a lower weight than vreg1. In reality, vreg2 still has a higher weight than vreg1 because it has a shorter live range, which is another factor affecting live range weight. After vreg2 gets a physical register, vreg1 cannot evict vreg2 because it has a lower weight. Vreg1 cannot be rematerialized, so a spill will be inserted in the loop. For this case, it is also better to split vreg1 around the loop, so that the vreg1' generated by the split has a higher weight than vreg2 because it is shorter. Vreg1' will evict vreg2, and the vreg2 reference in the loop will be rematerialized. There will be less spilling in the loop. Current region splitting will also split vreg1, but the split point is inside the loop. That is because vreg1 has references inside the loop that do not interfere with a physical register. To let as many of vreg1's references inside the loop as possible be allocated to a physical register, region-based splitting chooses to split inside the loop. Although the split vreg1' inside the loop evicts vreg2 and gets a physical register in the end, extra register moves are left inside the loop. PR24278 belongs to this case.

Vreg1 has only 1 reference in loop1 but vreg2 has 4 references in loop1. Vreg1 also has many references in loop2, so overall vreg1 has a higher weight than vreg2. Loop1 has high register pressure and only one physical register is left for vreg1 and vreg2. In the end vreg1 gets the physical register because it has the higher weight. However, this is suboptimal for loop1 because vreg2 actually has more references in loop1. For this case, it is better to split both vreg1 and vreg2 around loop1, so that the weights of vreg1' and vreg2' around the loop after the split are computed entirely from the references inside loop1; vreg2' will then have a higher weight than vreg1' and will get the physical register. Current region splitting only tries to find holes in the live ranges of the physical registers being used. It will not try the splitting here. This is a hypothetical case. I did not try to find such a test case in the test suites, but I think it probably exists.

From the case analysis above, to get the best register allocation for a hot loop with high register pressure, we may want to split all the virtregs that live across the loop at the loop boundary. That way, the hot loop becomes an independent regalloc region isolated from regalloc outside the region. The patch here implements such splitting.

The logic for splitting around a loop has two major steps. First, if a vreg lives across a hot loop and has a real reference inside the loop, we split only that vreg around the loop. If the new vreg generated around the loop still cannot get a physical register, we try the second step: split all the vregs living across the loop at once. The intuition is that if the new vreg around the loop still cannot get a physical register after splitting, the loop definitely has high register pressure, and only then do we try the second step.

Another necessary part is splitting the critical edges at loop entry and exit so that the splitting can happen at the loop boundary. A separate pass is added to do that. I find it also helps phi-elimination and shrink-wrapping.

Performance on x86-64:
0.5% improvement on average for Google internal benchmarks.
Some improvements on the LLVM test suite, with no apparent regressions observed.

SingleSource/Benchmarks/Shootout/matrix: 10.54%
SingleSource/Benchmarks/Adobe-C++/stepanov_vector: 3.41%
MultiSource/Benchmarks/sim/sim: 7.31%
MultiSource/Applications/siod/siod: 9.59%
(The above four are improved because of reg splitting around loop)
SingleSource/Benchmarks/Shootout-C++/ackermann: 21.3%
SingleSource/Benchmarks/BenchmarkGame/recursive: 11.02%
(These two are improved because splitting critical edges enables more shrinkwrapping)
MultiSource/Benchmarks/ASC_Sequoia/CrystalMk/CrystalMk: 1.73%
(This testcase is improved because after splitting critical edges, phi-elimination can generate a copy outside loop)

Compile time:
There is a 1.12% compile time increase across all SPEC2006 C/C++ benchmarks. For one file I looked at, the increased compile time is mostly spent in propagateSiblingValues. When I use -split-spill-mode=size for both base and comparison, the compile time increase drops to 0.8%. I will work on it and try to reduce the compile time increase further.

Problems to solve: better hint handling.
Loop splitting introduces more vreg copies at the loop boundary. It relies on register allocation to coalesce such copies using hints. However, the way hints are currently set and used in regalloc is not optimal.
A typical case we found is this: vreg2 and vreg3 are split from vreg1. Vreg1 has a hint of R8, so vreg2 and vreg3 both have a hint of R8. However, vreg2 ends up being assigned R10, and then vreg3 also gets R10 because its hint changes from R8 to R10. At this point, the hint of vreg2 is still R8. Taking R10 away from vreg2 is not regarded as a BrokenHint, so vreg2 easily loses R10. As a result, the copy between vreg2 and vreg3 cannot be removed.
The case we saw was at the loop boundary in a cold outer loop, so it did not cause a performance regression, but it may have happened elsewhere without my noticing.
The hint handling problem exposed here is a general issue and may be worth some effort to improve.

Diff Detail

Repository: rL LLVM

Event Timeline

wmi updated this revision to Diff 47896. Feb 12 2016, 9:40 PM

wmi retitled this revision from to [Greedy Regalloc] Reg splitting based on loops.

wmi updated this object.

wmi added a reviewer: qcolombet.

wmi set the repository for this revision to rL LLVM.

wmi added subscribers: llvm-commits, davidxl.

Herald added subscribers: qcolombet, MatzeB. Feb 12 2016, 9:40 PM

Hi Wei,

Thanks for looking into this.
I haven't looked at the details of the patch, but here are a few comments:

The PHIElimination pass already has an option to SplitAllCriticalEdges. Please check how we can use this instead of introducing a new pass.
The split around regions should already consider loops. Why is this not doing the right thing? Could you work toward improving the heuristic there instead of introducing another splitting scheme?

Cheers,
-Quentin

Thanks for taking a look at it.

The PHIElimination pass already has an option to SplitAllCriticalEdges. Please check how we can use this instead of introducing a new pass.

Thanks, I will check it.

The split around regions should already consider loops. Why is this not doing the right thing? Could you work toward improving the heuristic there instead of introducing another splitting scheme?

Splitting around regions splits the current vreg into sub-vregs to fit holes left unused by physical registers. It considers loops only to avoid putting the split point inside a loop when that is unnecessary. The motivation of the splitting is to fill holes.

For the first case listed above, the current vreg must be split around the loop so that the sub-vreg used inside the loop can evict other vregs that occupy a physical register but have no use inside the loop. The splitting there serves to better evict other unimportant vregs, so the motivation is notably different from that of region splitting.

As for where to split, the current region splitting algorithm uses Hopfield networks to decide. Loop splitting does not need that because the split point is already determined, i.e., the loop boundary.

Another difference is that current region splitting is limited to splitting the current vreg, while the loop splitting proposed here can split multiple vregs at the same time, regardless of whether those vregs are still in the queue or have already been assigned physical registers.

For these reasons I feel it is not easy to fit the loop splitting requirements into the current region splitting algorithm. However, the logic to decide when and how to do loop splitting is relatively simple, and the existing splitting transformation kit and live interval update are reused, so the patch does not actually add much new machinery.

Thanks,
Wei.

Thanks Wei for the additional inputs.
That’ll help me when I get to the review.
That being said, the fact that we consider several live-ranges when doing the splitting worries me complexity-wise. I’ll see if my concerns are real or not.

Cheers,
-Quentin

LoopBase<BlockT, LoopT>::getExitBlocks needs to iterate over all the blocks inside the loop every time it is called. Because getExitBlocks gets called many times in loop-based reg splitting, it is better to cache the exit blocks for every loop (using a map, LoopExitBlocks, in RAGreedy).

With this change, the compile time increase drops from 0.8% to 0.43% when -split-spill-mode=size is on.

jonpa added a subscriber: jonpa. May 23 2017, 10:16 AM

myatsina added a subscriber: myatsina. Jun 4 2017, 6:13 AM

lkail added a subscriber: lkail. Apr 19 2022, 1:20 AM

Herald added a project: Restricted Project. Apr 19 2022, 1:20 AM

Herald added subscribers: pengfei, mgorny.

Revision Contents

Path	Size
include/llvm/CodeGen/Passes.h	3 lines
include/llvm/InitializePasses.h	1 line
lib/CodeGen/ (file names missing in the archive)	1 line, 1 line, 6 lines, 324 lines, 222 lines, 5 lines, 104 lines

Diff 48236

include/llvm/CodeGen/Passes.h

Show First 20 Lines • Show All 436 Lines • ▼ Show 20 Lines	/// MachineDominanaceFrontier - This pass is a machine dominators analysis pass.

/// PostMachineScheduler - This pass schedules machine instructions postRA.		/// PostMachineScheduler - This pass schedules machine instructions postRA.
extern char &PostMachineSchedulerID;		extern char &PostMachineSchedulerID;

/// SpillPlacement analysis. Suggest optimal placement of spill code between		/// SpillPlacement analysis. Suggest optimal placement of spill code between
/// basic blocks.		/// basic blocks.
extern char &SpillPlacementID;		extern char &SpillPlacementID;

		/// Split critical edges for loops pass.
		extern char &SplitCritEdgesID;

/// ShrinkWrap pass. Look for the best place to insert save and restore		/// ShrinkWrap pass. Look for the best place to insert save and restore
// instruction and update the MachineFunctionInfo with that information.		// instruction and update the MachineFunctionInfo with that information.
extern char &ShrinkWrapID;		extern char &ShrinkWrapID;

/// VirtRegRewriter pass. Rewrite virtual registers to physical registers as		/// VirtRegRewriter pass. Rewrite virtual registers to physical registers as
/// assigned in VirtRegMap.		/// assigned in VirtRegMap.
extern char &VirtRegRewriterID;		extern char &VirtRegRewriterID;

▲ Show 20 Lines • Show All 233 Lines • Show Last 20 Lines

include/llvm/InitializePasses.h

	Show First 20 Lines • Show All 243 Lines • ▼ Show 20 Lines
	void initializeSafeStackPass(PassRegistry&);			void initializeSafeStackPass(PassRegistry&);
	void initializeSCCPPass(PassRegistry&);			void initializeSCCPPass(PassRegistry&);
	void initializeSROALegacyPassPass(PassRegistry&);			void initializeSROALegacyPassPass(PassRegistry&);
	void initializeSROA_DTPass(PassRegistry&);			void initializeSROA_DTPass(PassRegistry&);
	void initializeSROA_SSAUpPass(PassRegistry&);			void initializeSROA_SSAUpPass(PassRegistry&);
	void initializeSCEVAAWrapperPassPass(PassRegistry&);			void initializeSCEVAAWrapperPassPass(PassRegistry&);
	void initializeScalarEvolutionWrapperPassPass(PassRegistry&);			void initializeScalarEvolutionWrapperPassPass(PassRegistry&);
	void initializeShrinkWrapPass(PassRegistry &);			void initializeShrinkWrapPass(PassRegistry &);
				void initializeSplitCritEdgesPass(PassRegistry &);
	void initializeSimpleInlinerPass(PassRegistry&);			void initializeSimpleInlinerPass(PassRegistry&);
	void initializeShadowStackGCLoweringPass(PassRegistry&);			void initializeShadowStackGCLoweringPass(PassRegistry&);
	void initializeRegisterCoalescerPass(PassRegistry&);			void initializeRegisterCoalescerPass(PassRegistry&);
	void initializeSingleLoopExtractorPass(PassRegistry&);			void initializeSingleLoopExtractorPass(PassRegistry&);
	void initializeSinkingPass(PassRegistry&);			void initializeSinkingPass(PassRegistry&);
	void initializeSeparateConstOffsetFromGEPPass(PassRegistry &);			void initializeSeparateConstOffsetFromGEPPass(PassRegistry &);
	void initializeSlotIndexesPass(PassRegistry&);			void initializeSlotIndexesPass(PassRegistry&);
	void initializeSpillPlacementPass(PassRegistry&);			void initializeSpillPlacementPass(PassRegistry&);
	▲ Show 20 Lines • Show All 53 Lines • Show Last 20 Lines

lib/CodeGen/CMakeLists.txt

Show First 20 Lines • Show All 126 Lines • ▼ Show 20 Lines	add_llvm_library(LLVMCodeGen
TargetLoweringObjectFileImpl.cpp		TargetLoweringObjectFileImpl.cpp
TargetOptionsImpl.cpp		TargetOptionsImpl.cpp
TargetRegisterInfo.cpp		TargetRegisterInfo.cpp
TargetSchedule.cpp		TargetSchedule.cpp
TwoAddressInstructionPass.cpp		TwoAddressInstructionPass.cpp
UnreachableBlockElim.cpp		UnreachableBlockElim.cpp
VirtRegMap.cpp		VirtRegMap.cpp
WinEHPrepare.cpp		WinEHPrepare.cpp
		SplitCritEdges.cpp

ADDITIONAL_HEADER_DIRS		ADDITIONAL_HEADER_DIRS
${LLVM_MAIN_INCLUDE_DIR}/llvm/CodeGen		${LLVM_MAIN_INCLUDE_DIR}/llvm/CodeGen
${LLVM_MAIN_INCLUDE_DIR}/llvm/CodeGen/PBQP		${LLVM_MAIN_INCLUDE_DIR}/llvm/CodeGen/PBQP

LINK_LIBS ${system_libs}		LINK_LIBS ${system_libs}
)		)

add_dependencies(LLVMCodeGen intrinsics_gen)		add_dependencies(LLVMCodeGen intrinsics_gen)

add_subdirectory(SelectionDAG)		add_subdirectory(SelectionDAG)
add_subdirectory(AsmPrinter)		add_subdirectory(AsmPrinter)
add_subdirectory(MIRParser)		add_subdirectory(MIRParser)

lib/CodeGen/CodeGen.cpp

Show First 20 Lines • Show All 59 Lines • ▼ Show 20 Lines	void llvm::initializeCodeGen(PassRegistry &Registry) {
initializePHIEliminationPass(Registry);		initializePHIEliminationPass(Registry);
initializePeepholeOptimizerPass(Registry);		initializePeepholeOptimizerPass(Registry);
initializePostMachineSchedulerPass(Registry);		initializePostMachineSchedulerPass(Registry);
initializePostRASchedulerPass(Registry);		initializePostRASchedulerPass(Registry);
initializeProcessImplicitDefsPass(Registry);		initializeProcessImplicitDefsPass(Registry);
initializeRegisterCoalescerPass(Registry);		initializeRegisterCoalescerPass(Registry);
initializeShrinkWrapPass(Registry);		initializeShrinkWrapPass(Registry);
initializeSlotIndexesPass(Registry);		initializeSlotIndexesPass(Registry);
		initializeSplitCritEdgesPass(Registry);
initializeStackColoringPass(Registry);		initializeStackColoringPass(Registry);
initializeStackMapLivenessPass(Registry);		initializeStackMapLivenessPass(Registry);
initializeLiveDebugValuesPass(Registry);		initializeLiveDebugValuesPass(Registry);
initializeStackProtectorPass(Registry);		initializeStackProtectorPass(Registry);
initializeStackSlotColoringPass(Registry);		initializeStackSlotColoringPass(Registry);
initializeTailDuplicatePassPass(Registry);		initializeTailDuplicatePassPass(Registry);
initializeTargetPassConfigPass(Registry);		initializeTargetPassConfigPass(Registry);
initializeTwoAddressInstructionPassPass(Registry);		initializeTwoAddressInstructionPassPass(Registry);
Show All 11 Lines

lib/CodeGen/Passes.cpp

Show First 20 Lines • Show All 68 Lines • ▼ Show 20 Lines
static cl::opt<bool> DisableConstantHoisting("disable-constant-hoisting",		static cl::opt<bool> DisableConstantHoisting("disable-constant-hoisting",
cl::Hidden, cl::desc("Disable ConstantHoisting"));		cl::Hidden, cl::desc("Disable ConstantHoisting"));
static cl::opt<bool> DisableCGP("disable-cgp", cl::Hidden,		static cl::opt<bool> DisableCGP("disable-cgp", cl::Hidden,
cl::desc("Disable Codegen Prepare"));		cl::desc("Disable Codegen Prepare"));
static cl::opt<bool> DisableCopyProp("disable-copyprop", cl::Hidden,		static cl::opt<bool> DisableCopyProp("disable-copyprop", cl::Hidden,
cl::desc("Disable Copy Propagation pass"));		cl::desc("Disable Copy Propagation pass"));
static cl::opt<bool> DisablePartialLibcallInlining("disable-partial-libcall-inlining",		static cl::opt<bool> DisablePartialLibcallInlining("disable-partial-libcall-inlining",
cl::Hidden, cl::desc("Disable Partial Libcall Inlining"));		cl::Hidden, cl::desc("Disable Partial Libcall Inlining"));
		static cl::opt<bool>
		DisableSCEdge("disable-scedge", cl::Hidden,
		cl::desc("Disable Split Critical Edge for loops pass"));
static cl::opt<bool> EnableImplicitNullChecks(		static cl::opt<bool> EnableImplicitNullChecks(
"enable-implicit-null-checks",		"enable-implicit-null-checks",
cl::desc("Fold null checks into faulting memory operations"),		cl::desc("Fold null checks into faulting memory operations"),
cl::init(false));		cl::init(false));
static cl::opt<bool> PrintLSR("print-lsr-output", cl::Hidden,		static cl::opt<bool> PrintLSR("print-lsr-output", cl::Hidden,
cl::desc("Print LLVM IR produced by the loop-reduce pass"));		cl::desc("Print LLVM IR produced by the loop-reduce pass"));
static cl::opt<bool> PrintISelInput("print-isel-input", cl::Hidden,		static cl::opt<bool> PrintISelInput("print-isel-input", cl::Hidden,
cl::desc("Print LLVM IR input to isel pass"));		cl::desc("Print LLVM IR input to isel pass"));
▲ Show 20 Lines • Show All 455 Lines • ▼ Show 20 Lines	if (getOptLevel() != CodeGenOpt::None) {
// If the target requests it, assign local variables to stack slots relative		// If the target requests it, assign local variables to stack slots relative
// to one another and simplify frame index references where possible.		// to one another and simplify frame index references where possible.
addPass(&LocalStackSlotAllocationID, false);		addPass(&LocalStackSlotAllocationID, false);
}		}

// Run pre-ra passes.		// Run pre-ra passes.
addPreRegAlloc();		addPreRegAlloc();

		if (!DisableSCEdge)
		addPass(&SplitCritEdgesID);

// Run register allocation and passes that are tightly coupled with it,		// Run register allocation and passes that are tightly coupled with it,
// including phi elimination and scheduling.		// including phi elimination and scheduling.
if (getOptimizeRegAlloc())		if (getOptimizeRegAlloc())
addOptimizedRegAlloc(createRegAllocPass(true));		addOptimizedRegAlloc(createRegAllocPass(true));
else		else
addFastRegAlloc(createRegAllocPass(false));		addFastRegAlloc(createRegAllocPass(false));

// Run post-ra passes.		// Run post-ra passes.
▲ Show 20 Lines • Show All 262 Lines • Show Last 20 Lines

lib/CodeGen/RegAllocGreedy.cpp

Show First 20 Lines • Show All 98 Lines • ▼ Show 20 Lines
CSRFirstTimeCost("regalloc-csr-first-time-cost",		CSRFirstTimeCost("regalloc-csr-first-time-cost",
cl::desc("Cost for first time use of callee-saved register."),		cl::desc("Cost for first time use of callee-saved register."),
cl::init(0), cl::Hidden);		cl::init(0), cl::Hidden);

static RegisterRegAlloc greedyRegAlloc("greedy", "greedy register allocator",		static RegisterRegAlloc greedyRegAlloc("greedy", "greedy register allocator",
createGreedyRegisterAllocator);		createGreedyRegisterAllocator);

namespace {		namespace {
		typedef DenseMap<MachineLoop *, SmallPtrSet<MachineBasicBlock *, 8>>
		LoopExitBlocksMap;

class RAGreedy : public MachineFunctionPass,		class RAGreedy : public MachineFunctionPass,
public RegAllocBase,		public RegAllocBase,
private LiveRangeEdit::Delegate {		private LiveRangeEdit::Delegate {
// Convenient shortcuts.		// Convenient shortcuts.
typedef std::priority_queue<std::pair<unsigned, unsigned> > PQueue;		typedef std::priority_queue<std::pair<unsigned, unsigned> > PQueue;
typedef SmallPtrSet<LiveInterval *, 4> SmallLISet;		typedef SmallPtrSet<LiveInterval *, 4> SmallLISet;
typedef SmallSet<unsigned, 16> SmallVirtRegSet;		typedef SmallSet<unsigned, 16> SmallVirtRegSet;

Show All 37 Lines	enum LiveRangeStage {
RS_New,		RS_New,

/// Only attempt assignment and eviction. Then requeue as RS_Split.		/// Only attempt assignment and eviction. Then requeue as RS_Split.
RS_Assign,		RS_Assign,

/// Attempt live range splitting if assignment is impossible.		/// Attempt live range splitting if assignment is impossible.
RS_Split,		RS_Split,

		/// Attempt live range splitting around loops.
		RS_LoopSplit,

/// Attempt more aggressive live range splitting that is guaranteed to make		/// Attempt more aggressive live range splitting that is guaranteed to make
/// progress. This is used for split products that may not be making		/// progress. This is used for split products that may not be making
/// progress.		/// progress.
RS_Split2,		RS_Split2,

/// Live range will be spilled. No more splitting will be attempted.		/// Live range will be spilled. No more splitting will be attempted.
RS_Spill,		RS_Spill,


/// Live range is in memory. Because of other evictions, it might get moved		/// Live range is in memory. Because of other evictions, it might get moved
/// in a register in the end.		/// in a register in the end.
RS_Memory,		RS_Memory,

/// There is nothing more we can do to this live range. Abort compilation		/// There is nothing more we can do to this live range. Abort compilation
/// if it can't be assigned.		/// if it can't be assigned.
RS_Done		RS_Done
};		};
▲ Show 20 Lines • Show All 130 Lines • ▼ Show 20 Lines	#endif

/// Run or not the local reassignment heuristic. This information is		/// Run or not the local reassignment heuristic. This information is
/// obtained from the TargetSubtargetInfo.		/// obtained from the TargetSubtargetInfo.
bool EnableLocalReassign;		bool EnableLocalReassign;

/// Set of broken hints that may be reconciled later because of eviction.		/// Set of broken hints that may be reconciled later because of eviction.
SmallSetVector<LiveInterval *, 8> SetOfBrokenHints;		SmallSetVector<LiveInterval *, 8> SetOfBrokenHints;

		/// Set of Loops for which all the vregs crossing its entry or exit have been
		/// split.
		SmallPtrSet<MachineLoop *, 16> SplittedLoops;

		/// Set of LiveInterval which should be removed from Queue.
		SmallPtrSet<LiveInterval *, 32> RemovedFromQ;

		/// Map from Loop to its ExitBlock set.
		LoopExitBlocksMap LoopExitBlocks;

public:		public:
RAGreedy();		RAGreedy();

/// Return the pass name.		/// Return the pass name.
const char* getPassName() const override {		const char* getPassName() const override {
return "Greedy Register Allocator";		return "Greedy Register Allocator";
}		}

Show All 32 Lines	private:
unsigned canReassign(LiveInterval &VirtReg, unsigned PhysReg);		unsigned canReassign(LiveInterval &VirtReg, unsigned PhysReg);
bool shouldEvict(LiveInterval &A, bool, LiveInterval &B, bool);		bool shouldEvict(LiveInterval &A, bool, LiveInterval &B, bool);
bool canEvictInterference(LiveInterval&, unsigned, bool, EvictionCost&);		bool canEvictInterference(LiveInterval&, unsigned, bool, EvictionCost&);
void evictInterference(LiveInterval&, unsigned,		void evictInterference(LiveInterval&, unsigned,
SmallVectorImpl<unsigned>&);		SmallVectorImpl<unsigned>&);
bool mayRecolorAllInterferences(unsigned PhysReg, LiveInterval &VirtReg,		bool mayRecolorAllInterferences(unsigned PhysReg, LiveInterval &VirtReg,
SmallLISet &RecoloringCandidates,		SmallLISet &RecoloringCandidates,
const SmallVirtRegSet &FixedRegisters);		const SmallVirtRegSet &FixedRegisters);
		void splitAllAroundLoop(MachineLoop *Loop,
		SmallVectorImpl<unsigned> &NewVRegs);

unsigned tryAssign(LiveInterval&, AllocationOrder&,		unsigned tryAssign(LiveInterval&, AllocationOrder&,
SmallVectorImpl<unsigned>&);		SmallVectorImpl<unsigned>&);
unsigned tryEvict(LiveInterval&, AllocationOrder&,		unsigned tryEvict(LiveInterval&, AllocationOrder&,
SmallVectorImpl<unsigned>&, unsigned = ~0u);		SmallVectorImpl<unsigned>&, unsigned = ~0u);
		bool trySplitRegAroundLoop(LiveInterval &VirtReg,
		SmallVectorImpl<unsigned> &NewVRegs);
		bool trySplitLoopInAll(LiveInterval &VirtReg,
		SmallVectorImpl<unsigned> &NewVRegs);
unsigned tryRegionSplit(LiveInterval&, AllocationOrder&,		unsigned tryRegionSplit(LiveInterval&, AllocationOrder&,
SmallVectorImpl<unsigned>&);		SmallVectorImpl<unsigned>&);
/// Calculate cost of region splitting.		/// Calculate cost of region splitting.
unsigned calculateRegionSplitCost(LiveInterval &VirtReg,		unsigned calculateRegionSplitCost(LiveInterval &VirtReg,
AllocationOrder &Order,		AllocationOrder &Order,
BlockFrequency &BestCost,		BlockFrequency &BestCost,
unsigned &NumCands, bool IgnoreCSR);		unsigned &NumCands, bool IgnoreCSR);
/// Perform region splitting.		/// Perform region splitting.
▲ Show 20 Lines • Show All 41 Lines • ▼ Show 20 Lines	private:
bool isUnusedCalleeSavedReg(unsigned PhysReg) const;		bool isUnusedCalleeSavedReg(unsigned PhysReg) const;
};		};
} // end anonymous namespace		} // end anonymous namespace

char RAGreedy::ID = 0;		char RAGreedy::ID = 0;

#ifndef NDEBUG		#ifndef NDEBUG
const char *const RAGreedy::StageName[] = {		const char *const RAGreedy::StageName[] = {
"RS_New",		"RS_New", "RS_Assign", "RS_Split", "RS_LoopSplit",
"RS_Assign",		"RS_Split2", "RS_Spill", "RS_Memory", "RS_Done"};
"RS_Split",
"RS_Split2",
"RS_Spill",
"RS_Memory",
"RS_Done"
};
#endif		#endif

// Hysteresis to use when comparing floats.		// Hysteresis to use when comparing floats.
// This helps stabilize decisions based on float comparisons.		// This helps stabilize decisions based on float comparisons.
const float Hysteresis = (2007 / 2048.0f); // 0.97998046875		const float Hysteresis = (2007 / 2048.0f); // 0.97998046875


FunctionPass* llvm::createGreedyRegisterAllocator() {		FunctionPass* llvm::createGreedyRegisterAllocator() {
▲ Show 20 Lines • Show All 154 Lines • ▼ Show 20 Lines	void RAGreedy::enqueue(PQueue &CurQueue, LiveInterval *LI) {
// The virtual register number is a tie breaker for same-sized ranges.		// The virtual register number is a tie breaker for same-sized ranges.
// Give lower vreg numbers higher priority to assign them first.		// Give lower vreg numbers higher priority to assign them first.
CurQueue.push(std::make_pair(Prio, ~Reg));		CurQueue.push(std::make_pair(Prio, ~Reg));
}		}

LiveInterval *RAGreedy::dequeue() { return dequeue(Queue); }		LiveInterval *RAGreedy::dequeue() { return dequeue(Queue); }

LiveInterval *RAGreedy::dequeue(PQueue &CurQueue) {		LiveInterval *RAGreedy::dequeue(PQueue &CurQueue) {
if (CurQueue.empty())		while (!CurQueue.empty()) {
return nullptr;
LiveInterval *LI = &LIS->getInterval(~CurQueue.top().second);		LiveInterval *LI = &LIS->getInterval(~CurQueue.top().second);
CurQueue.pop();		CurQueue.pop();
		if (!RemovedFromQ.count(LI))
return LI;		return LI;
}		}
		return nullptr;
		}


//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Direct Assignment		// Direct Assignment
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

/// tryAssign - Try to assign VirtReg to an available register.		/// tryAssign - Try to assign VirtReg to an available register.
unsigned RAGreedy::tryAssign(LiveInterval &VirtReg,		unsigned RAGreedy::tryAssign(LiveInterval &VirtReg,
▲ Show 20 Lines • Show All 720 Lines • ▼ Show 20 Lines	for (unsigned i = 0, e = LREdit.size(); i != e; ++i) {
// Other intervals are treated as new. This includes local intervals created		// Other intervals are treated as new. This includes local intervals created
// for blocks with multiple uses, and anything created by DCE.		// for blocks with multiple uses, and anything created by DCE.
}		}

if (VerifyEnabled)		if (VerifyEnabled)
MF->verify(this, "After splitting live range around region");		MF->verify(this, "After splitting live range around region");
}		}

		/// isHotLoop - the loop is regarded as hot if the freq of its header is much
		/// larger than its preheader.
		static bool isHotLoop(MachineLoop *Loop, MachineBlockFrequencyInfo *MBFI) {
		if (Loop->getNumBlocks() > 50)
		return false;
		MachineBasicBlock *Preheader = Loop->getLoopPreheader();
		if (!Preheader)
		return false;
		MachineBasicBlock *Header = Loop->getHeader();
		unsigned Ratio = MBFI->getBlockFreq(Header).getFrequency() /
		MBFI->getBlockFreq(Preheader).getFrequency();
		return Ratio >= 10;
		}

		/// noCriticalEdge - The loop has no critical edges if the entry and exit edges
		/// are not critical edges.
		static bool noCriticalEdge(MachineLoop *Loop,
		LoopExitBlocksMap &LoopExitBlocks) {
		MachineBasicBlock *Preheader = Loop->getLoopPreheader();
		if (!Preheader)
		return false;

		SmallVector<MachineBasicBlock *, 8> ExitBlocks;
		if (!LoopExitBlocks.count(Loop)) {
		Loop->getExitBlocks(ExitBlocks);
		LoopExitBlocks[Loop].insert(ExitBlocks.begin(), ExitBlocks.end());
		}
		for (auto ExitBlock : LoopExitBlocks[Loop]) {
		if (ExitBlock->pred_size() > 1)
		return false;
		}
		return true;
		}

		/// crossLoopBoundary - Literally, if the VirtReg is livein at the entry of
		/// an exit block or liveout at the exit of the preheader block, we say it
		/// crosses the loop boundary.
		/// However, a VirtReg that has been split for a loop still crosses the loop
		/// boundary. To avoid recursively splitting the same VirtReg, the
		/// concept of crossLoopBoundary is extended a little bit: If the VirtReg
		/// lives through the entry of an exit block or lives through the exit of
		/// the preheader block, we say it crosses the loop boundary.
		static bool crossLoopBoundary(LiveInterval &VirtReg, MachineLoop *Loop,
		LiveIntervals *LIS,
		LoopExitBlocksMap &LoopExitBlocks) {
		// VirtReg crosses loop entry boundary if it lives through the loop
		// preheader block.
		bool Cross = LIS->isLiveInToMBB(VirtReg, Loop->getLoopPreheader()) &&
		LIS->isLiveOutOfMBB(VirtReg, Loop->getLoopPreheader());

		// VirtReg crosses loop exit boundary if it lives through any loop
		// exit block.
		SmallVector<MachineBasicBlock *, 8> ExitBlocks;
		if (!LoopExitBlocks.count(Loop)) {
		Loop->getExitBlocks(ExitBlocks);
		LoopExitBlocks[Loop].insert(ExitBlocks.begin(), ExitBlocks.end());
		}
		for (auto ExitBlock : LoopExitBlocks[Loop])
		Cross = Cross || (LIS->isLiveOutOfMBB(VirtReg, ExitBlock) &&
		LIS->isLiveInToMBB(VirtReg, ExitBlock));
		return Cross;
		}

		/// trySplitLoopInAll - For the innermost hot loop covering the live range of
		/// VirtReg, Split all the virtregs which live across the loop boundary.
		/// Covering here means VirtReg is totally inside the loop, or its live range
		/// starts at the loop Preheader and ends inside the loop, or its live range
		/// starts inside the loop and ends at the loop ExitBlock.
		/// The fact that VirtReg didn't get a physical register suggests that the
		/// loop covering it has high register pressure. Splitting all the crossing
		/// virtregs at the loop boundary gives the virtreg references inside the
		/// loop the best chance of being allocated.
		bool RAGreedy::trySplitLoopInAll(LiveInterval &VirtReg,
		SmallVectorImpl<unsigned> &NewVRegs) {
		MachineLoop *CandLoop = nullptr;
		ArrayRef<SplitAnalysis::BlockInfo> UseBlocks = SA->getUseBlocks();
		for (unsigned i = 0; i != UseBlocks.size(); ++i) {
		const SplitAnalysis::BlockInfo &BI = UseBlocks[i];
		const MachineBasicBlock *MBB = BI.MBB;
		// If MBB is not inside any loop, skip it.
		if (!Loops->getLoopDepth(MBB))
		continue;

		// Find the loop which the live range of VirtReg is inside or just
		// covers.
		MachineLoop *Loop = Loops->getLoopFor(MBB);
		while (Loop) {
		if (noCriticalEdge(Loop, LoopExitBlocks) && isHotLoop(Loop, MBFI) &&
		!crossLoopBoundary(VirtReg, Loop, LIS, LoopExitBlocks))
		break;
		Loop = Loop->getParentLoop();
		}
		if (!SplittedLoops.count(Loop)) {
		CandLoop = Loop;
		break;
		}
		}

		if (!CandLoop)
		return false;

		#ifndef NDEBUG
		// Assume there is no separate live range outside CandLoop.
		for (unsigned i = 0; i != UseBlocks.size(); ++i) {
		const SplitAnalysis::BlockInfo &BI = UseBlocks[i];
		const MachineBasicBlock *MBB = BI.MBB;
		if (CandLoop && CandLoop->contains(MBB))
		continue;
		if (MBB == CandLoop->getLoopPreheader())
		continue;

		SmallVector<MachineBasicBlock *, 8> ExitBlocks;
		if (!LoopExitBlocks.count(CandLoop)) {
		CandLoop->getExitBlocks(ExitBlocks);
		LoopExitBlocks[CandLoop].insert(ExitBlocks.begin(), ExitBlocks.end());
		}
		bool found = false;
		for (auto ExitBlock : LoopExitBlocks[CandLoop]) {
		if (MBB == ExitBlock) {
		found = true;
		break;
		}
		}
		assert(found && "Separate live ranges from the same vreg");
		}
		#endif

		DEBUG(dbgs() << "Split all vregs for " << *CandLoop);
		splitAllAroundLoop(CandLoop, NewVRegs);
		return true;
		}

		/// trySplitRegAroundLoop - VirtReg didn't get a physical register assigned
		/// before splitting. If it lives across a loop and is referenced inside the
		/// loop, find the outermost such loop and split VirtReg around it. The new
		/// VirtReg, with a shorter live range just around the loop, has a better
		/// chance of getting a physical register, and its references inside the
		/// loop may not need to be spilled.
		/// Another benefit of splitting VirtReg around the loop: if the new VirtReg
		/// still cannot get a physical register, the loop has high enough register
		/// pressure that we go one step further and split all the virtregs crossing
		/// the loop's boundary by calling splitAllAroundLoop in the next round of
		/// selectOrSplit.
		bool RAGreedy::trySplitRegAroundLoop(LiveInterval &VirtReg,
		SmallVectorImpl<unsigned> &NewVRegs) {
		LiveRangeEdit LREdit(&VirtReg, NewVRegs, MF, LIS, VRM, this);
		// Only SM_Partition can be used here: the other SplitSpillModes hoist
		// back-copies, which can keep generating new VRegs living across the loop
		// and cause an infinite loop.
		SE->reset(LREdit, SplitEditor::SM_Partition);

		SmallPtrSet<MachineLoop *, 4> CandLoopSet;
		// Search each reference of VirtReg inside a hot loop.
		ArrayRef<SplitAnalysis::BlockInfo> UseBlocks = SA->getUseBlocks();
		for (unsigned i = 0; i != UseBlocks.size(); ++i) {
		const SplitAnalysis::BlockInfo &BI = UseBlocks[i];
		const MachineBasicBlock *MBB = BI.MBB;
		// If MBB is not inside any loop, skip it.
		if (!Loops->getLoopDepth(MBB))
		continue;
		// If MBB is already inside a CandLoop, don't try to find candidate
		// loop for it.
		bool InCandLoop = false;
		for (auto CandLoop : CandLoopSet)
		if (CandLoop->contains(MBB))
		InCandLoop = true;
		if (InCandLoop)
		continue;

		// Search the outermost hot loop which VirtReg lives across.
		MachineLoop *CandLoop = nullptr;
		MachineLoop *Loop = Loops->getLoopFor(MBB);
		while (Loop) {
		if (noCriticalEdge(Loop, LoopExitBlocks) && isHotLoop(Loop, MBFI) &&
		crossLoopBoundary(VirtReg, Loop, LIS, LoopExitBlocks))
		CandLoop = Loop;
		Loop = Loop->getParentLoop();
		}
		if (CandLoop)
		CandLoopSet.insert(CandLoop);
		}

		if (CandLoopSet.empty())
		return false;

		for (auto CandLoop : CandLoopSet) {
		DEBUG(dbgs() << "Split " << PrintReg(VirtReg.reg, TRI) << " for "
		<< *CandLoop);
		SE->splitAroundLoop(VirtReg, CandLoop, true);
		}
		SmallVector<unsigned, 8> IntvMap;
		SE->finish(&IntvMap);
		return true;
		}

		/// splitAllAroundLoop - Split all the virtregs crossing Loop at the loop
		/// boundary, including those that already have a physical register assigned.
		void RAGreedy::splitAllAroundLoop(MachineLoop *Loop,
		SmallVectorImpl<unsigned> &NewVRegs) {
		SmallVector<unsigned, 4> LSNewVRegs;
		for (unsigned i = 0, e = MRI->getNumVirtRegs(); i != e; ++i) {
		unsigned Reg = TargetRegisterInfo::index2VirtReg(i);
		if (MRI->reg_nodbg_empty(Reg))
		continue;

		LiveInterval &LI = LIS->getInterval(Reg);
		if (crossLoopBoundary(LI, Loop, LIS, LoopExitBlocks)) {
		DEBUG(dbgs() << PrintReg(Reg, TRI));
		// For VirtRegs that already have a physical register assigned, unassign
		// the original VirtReg, split it, and assign the same physical register
		// to the new VirtRegs.
		// VirtRegs that have not been assigned yet need to be removed from
		// Queue after the splitting.
		unsigned PhysReg = 0;
		bool hasPhys = VRM->hasPhys(Reg);
		if (hasPhys) {
		PhysReg = VRM->getPhys(Reg);
		DEBUG(dbgs() << " with [" << PrintReg(PhysReg, TRI) << "] is splitted");
		Matrix->unassign(LI);
		} else {
		DEBUG(dbgs() << " is splitted:\n");
		}

		LSNewVRegs.clear();
		LiveRangeEdit LREdit(&LI, LSNewVRegs, MF, LIS, VRM, this);
		// When splitting a VirtReg that has a physical register assigned, only
		// SM_Partition can be used because the other SplitSpillModes may hoist
		// back-copies and produce overlapping live ranges.
		// When splitting a VirtReg that has not been assigned, SM_Partition is
		// still required because hoisting back-copies can keep generating new
		// VRegs living across the loop and cause an infinite loop.
		SE->reset(LREdit, SplitEditor::SM_Partition);
		SE->splitAroundLoop(LI, Loop, !hasPhys);

		SmallVector<unsigned, 8> IntvMap;
		SE->finish(&IntvMap);
		for (auto NewVReg : LSNewVRegs) {
		if (hasPhys)
		Matrix->assign(LIS->getInterval(NewVReg), PhysReg);
		else {
		// Remove the original VirtReg from Queue.
		RemovedFromQ.insert(&LI);
		NewVRegs.push_back(NewVReg);
		}
		}
		}
		}
		SplittedLoops.insert(Loop);
		}

		unsigned RAGreedy::tryRegionSplit(LiveInterval &VirtReg, AllocationOrder &Order,
		SmallVectorImpl<unsigned> &NewVRegs) {
		unsigned NumCands = 0;
		BlockFrequency BestCost;

		// Check if we can split this live range around a compact region.
		bool HasCompact = calcCompactRegion(GlobalCand.front());
		if (HasCompact) {
		[...]
		unsigned RAGreedy::trySplit(LiveInterval &VirtReg, AllocationOrder &Order,
		[...]
		// an assertion when the coalescer is fixed.
		if (SA->didRepairRange()) {
		// VirtReg has changed, so all cached queries are invalid.
		Matrix->invalidateVirtRegs();
		if (unsigned PhysReg = tryAssign(VirtReg, Order, NewVRegs))
		return PhysReg;
		}

		// Loop splitting gives references inside hot loops their best chance of
		// getting a physical register. It does so by splitting virtregs at the
		// loop boundary so that live ranges outside the loop do not affect the
		// coloring of live ranges inside the loop.
		//
		// This is achieved in two steps. For a VirtReg with a long live range
		// across a loop and a reference inside the loop, call
		// trySplitRegAroundLoop to split the VirtReg at the loop boundary. In the
		// next round of selectOrSplit, if the new VirtReg produced by the split
		// gets a physical register, that is a good result. If it still cannot get
		// one, the loop itself has high register pressure, and trySplitLoopInAll
		// is called to split all the other virtregs living across the loop,
		// making the loop a relatively independent register allocation region.
		if (getStage(VirtReg) < RS_LoopSplit) {
		bool LoopSplitted = trySplitLoopInAll(VirtReg, NewVRegs);
		bool RegSplitted = trySplitRegAroundLoop(VirtReg, NewVRegs);
		if (LoopSplitted || RegSplitted) {
		if (LoopSplitted && !RegSplitted) {
		NewVRegs.push_back(VirtReg.reg);
		setStage(VirtReg, RS_LoopSplit);
		}
		return 0;
		}
		}

		// First try to split around a region spanning multiple blocks. RS_Split2
		// ranges already made dubious progress with region splitting, so they go
		// straight to single block splitting.
		if (getStage(VirtReg) < RS_Split2) {
		unsigned PhysReg = tryRegionSplit(VirtReg, Order, NewVRegs);
		if (PhysReg || !NewVRegs.empty())
		return PhysReg;
		}
		[...]
		bool RAGreedy::runOnMachineFunction(MachineFunction &mf) {
		[...]
		SA.reset(new SplitAnalysis(*VRM, *LIS, *Loops));
		SE.reset(new SplitEditor(*SA, *LIS, *VRM, *DomTree, *MBFI));
		ExtraRegInfo.clear();
		ExtraRegInfo.resize(MRI->getNumVirtRegs());
		NextCascade = 1;
		IntfCache.init(MF, Matrix->getLiveUnions(), Indexes, LIS, TRI);
		GlobalCand.resize(32); // This will grow as needed.
		SetOfBrokenHints.clear();
		RemovedFromQ.clear();
		LoopExitBlocks.clear();

		allocatePhysRegs();
		tryHintsRecoloring();
		releaseMemory();
		return true;
		}
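
The two candidate-loop searches in RegAllocGreedy.cpp walk the loop nest in opposite senses: trySplitRegAroundLoop keeps the outermost eligible loop that VirtReg crosses, while trySplitLoopInAll stops at the innermost eligible loop that VirtReg does not cross. A minimal standalone sketch of that difference is given here. ToyLoop and both function names are hypothetical, and the three boolean fields stand in for the patch's noCriticalEdge, isHotLoop and crossLoopBoundary queries; it is an illustration under those assumptions, not code from the patch.

```cpp
#include <cassert>

// Hypothetical stand-in for MachineLoop. The flags replace the patch's
// noCriticalEdge(), isHotLoop() and crossLoopBoundary() queries.
struct ToyLoop {
  const ToyLoop *Parent; // enclosing loop, or nullptr for a top-level loop
  bool NoCriticalEdge;
  bool Hot;
  bool VRegCrosses;      // does the vreg live through the preheader or an exit?
};

// Mirrors the search in trySplitRegAroundLoop: keep overwriting the candidate
// while walking outwards, so the outermost eligible loop wins.
const ToyLoop *outermostCrossedLoop(const ToyLoop *L) {
  const ToyLoop *Cand = nullptr;
  for (; L; L = L->Parent)
    if (L->NoCriticalEdge && L->Hot && L->VRegCrosses)
      Cand = L;
  return Cand;
}

// Mirrors the search in trySplitLoopInAll: stop at the first (innermost)
// eligible loop whose boundary the vreg does not cross.
const ToyLoop *innermostCoveringLoop(const ToyLoop *L) {
  for (; L; L = L->Parent)
    if (L->NoCriticalEdge && L->Hot && !L->VRegCrosses)
      return L;
  return nullptr;
}
```

For a three-deep nest in which the vreg crosses the two inner loops but not the outermost one, outermostCrossedLoop returns the middle loop and innermostCoveringLoop returns the outermost loop.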

lib/CodeGen/SplitCritEdges.cpp

				//=== SplitCritEdges.cpp - Split critical edges for loop entry and exit ---===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// This pass performs critical edges splitting for edges at the loop entry or
				// exits. It also creates a loop preheader if the loop header has multiple
				// predecessors. It is necessary to enable register splitting around loops,
				// and it is also helpful for better register coalescing and shrinkwrapping.
				//
				//===----------------------------------------------------------------------===//
				#include "llvm/ADT/Statistic.h"
				#include "llvm/CodeGen/MachineDominators.h"
				#include "llvm/CodeGen/MachineFunctionPass.h"
				#include "llvm/CodeGen/MachineInstrBuilder.h"
				#include "llvm/CodeGen/MachineLoopInfo.h"
				#include "llvm/CodeGen/MachineRegisterInfo.h"
				#include "llvm/CodeGen/Passes.h"
				#include "llvm/Target/TargetInstrInfo.h"
				#include "llvm/Target/TargetSubtargetInfo.h"

				#define DEBUG_TYPE "critedges-split"

				using namespace llvm;

				namespace {
				class SplitCritEdges : public MachineFunctionPass {
				MachineDominatorTree *MDT;
				MachineLoopInfo *MLI;

				public:
				static char ID;

				SplitCritEdges() : MachineFunctionPass(ID) {
				initializeSplitCritEdgesPass(*PassRegistry::getPassRegistry());
				}

				void getAnalysisUsage(AnalysisUsage &AU) const override {
				AU.addRequired<MachineDominatorTree>();
				AU.addRequired<MachineLoopInfo>();
				AU.addPreserved<MachineDominatorTree>();
				AU.addPreserved<MachineLoopInfo>();
				MachineFunctionPass::getAnalysisUsage(AU);
				}

				const char *getPassName() const override {
				return "Split Critical Edges for RegAlloc";
				}
				bool createLoopPreheader(MachineFunction &MF, MachineLoop *Loop);
				bool runOnMachineFunction(MachineFunction &MF) override;
				};
				} // End anonymous namespace.

				char SplitCritEdges::ID = 0;
				char &llvm::SplitCritEdgesID = SplitCritEdges::ID;

				INITIALIZE_PASS_BEGIN(SplitCritEdges, "critedges-split",
				"Critical Edges Split Pass", false, false)
				INITIALIZE_PASS_DEPENDENCY(MachineDominatorTree)
				INITIALIZE_PASS_DEPENDENCY(MachineLoopInfo)
				INITIALIZE_PASS_END(SplitCritEdges, "critedges-split",
				"Critical Edges Split Pass", false, false)

				// If the header of the Loop has multiple predecessors outside of the loop,
				// a new Loop preheader will be inserted and it will become the only predecessor
				// of the header block outside of the loop.
				bool SplitCritEdges::createLoopPreheader(MachineFunction &MF,
				MachineLoop *Loop) {
				MachineBasicBlock *Header = Loop->getHeader();
				MachineDomTreeNode *IDom = MDT->getNode(Header)->getIDom();

				// Predecessors of Header inside of loop.
				SmallPtrSet<MachineBasicBlock *, 8> PredsIn;
				// Predecessors of Header outside of loop.
				SmallPtrSet<MachineBasicBlock *, 8> PredsOut;
				MachineBasicBlock *TBB, *FBB;
				SmallVector<MachineOperand, 4> Cond;
				const TargetInstrInfo *TII = MF.getSubtarget().getInstrInfo();
				for (auto Pred : Header->predecessors()) {
				TBB = nullptr;
				FBB = nullptr;
				Cond.clear();
				if (TII->AnalyzeBranch(*Pred, TBB, FBB, Cond))
				return false;
				if (!Loop->contains(Pred))
				PredsOut.insert(Pred);
				else
				PredsIn.insert(Pred);
				}

				MachineBasicBlock *NMBB = MF.CreateMachineBasicBlock();
				MF.insert(MachineFunction::iterator(Header), NMBB);

				// Update CFG and terminator for predecessor blocks outside loop.
				for (auto Pred : PredsOut) {
				Pred->ReplaceUsesOfBlockWith(Header, NMBB);
				Pred->updateTerminator();
				}
				// Update terminator for predecessor blocks inside loop.
				for (auto Pred : PredsIn)
				Pred->updateTerminator();
				// Update CFG and terminator for the new block.
				DebugLoc DL;
				NMBB->addSuccessor(Header);
				if (!NMBB->isLayoutSuccessor(Header)) {
				Cond.clear();
				TII->InsertBranch(*NMBB, Header, nullptr, Cond, DL);
				}

				// Update PHI nodes:
				// Original PHI node in Header looks like:
				// r1_0 = PHI(r1_1, PB_in, r1_2, PB_out1, r1_3, PB_out2)
				// PB_in is the Predecessor Block inside loop. PB_outX is the Xth
				// Predecessor Block outside loop.
				// After the update, a new PHI is generated in NMBB:
				// r1_4 = PHI(r1_2, PB_out1, r1_3, PB_out2)
				// And the old PHI in Header Block is changed to:
				// r1_0 = PHI(r1_1, PB_in, r1_4, NMBB)
				MachineRegisterInfo &MRI = MF.getRegInfo();
				for (MachineBasicBlock::instr_iterator I = Header->instr_begin(),
				E = Header->instr_end();
				I != E && I->isPHI(); ++I) {
				MachineInstr *OldPHI = &*I;
				MachineInstr *NewPHI = TII->duplicate(OldPHI, MF);
				MachineOperand *MO = &NewPHI->getOperand(0);
				assert(MO->isDef() && "NewPHI->Operand(0) should be a Def");
				unsigned Reg = MO->getReg();
				const TargetRegisterClass *RC = MRI.getRegClass(Reg);
				unsigned NewReg = MRI.createVirtualRegister(RC);
				MO->setReg(NewReg);
				// Remove operands of which the source BB is not in PredsOut from NewPHI.
				for (unsigned i = NewPHI->getNumOperands() - 1; i >= 2; i -= 2) {
				MO = &NewPHI->getOperand(i);
				if (!PredsOut.count(MO->getMBB())) {
				NewPHI->RemoveOperand(i);
				NewPHI->RemoveOperand(i - 1);
				}
				}
				// Remove operands of which the source BB is in PredsOut from OldPHI.
				for (unsigned i = OldPHI->getNumOperands() - 1; i >= 2; i -= 2) {
				MO = &OldPHI->getOperand(i);
				if (PredsOut.count(MO->getMBB())) {
				OldPHI->RemoveOperand(i);
				OldPHI->RemoveOperand(i - 1);
				}
				}
				MachineInstrBuilder MIB(MF, OldPHI);
				MIB.addReg(NewReg).addMBB(NMBB);
				NMBB->insert(NMBB->SkipPHIsAndLabels(NMBB->begin()), NewPHI);
				}

				// Update MDT. NMBB becomes the new immediate dominator node of Header.
				MDT->addNewBlock(NMBB, IDom->getBlock());
				MDT->changeImmediateDominator(Header, NMBB);

				// Update MLI.
				if (MachineLoop *P = Loop->getParentLoop())
				P->addBasicBlockToLoop(NMBB, MLI->getBase());
				return true;
				}

				bool SplitCritEdges::runOnMachineFunction(MachineFunction &MF) {
				MDT = &getAnalysis<MachineDominatorTree>();
				MLI = &getAnalysis<MachineLoopInfo>();

				SmallVector<MachineLoop *, 8> Worklist(MLI->begin(), MLI->end());
				while (!Worklist.empty()) {
				MachineLoop *CurLoop = Worklist.pop_back_val();

				MachineBasicBlock *Preheader = CurLoop->getLoopPreheader();
				if (!Preheader) {
				MachineBasicBlock *Pred = CurLoop->getLoopPredecessor();
				if (Pred && Pred->SplitCriticalEdge(CurLoop->getHeader(), this)) {
				DEBUG(dbgs() << "Split critical entry edge for " << *CurLoop);
				DEBUG(dbgs() << "BB#" << Pred->getNumber() << " --> "
				<< CurLoop->getHeader()->getNumber() << "\n");
				} else if (createLoopPreheader(MF, CurLoop)) {
				DEBUG(dbgs() << "Create a new loop preheader for " << *CurLoop);
				DEBUG(dbgs() << "new preheader BB#"
				<< CurLoop->getLoopPreheader()->getNumber() << "\n");
				}
				}

				SmallVector<MachineBasicBlock *, 8> ExitBlocks;
				SmallPtrSet<MachineBasicBlock *, 8> UniqExitBlocks;
				SmallPtrSet<MachineBasicBlock *, 8> PredBlocks;
				CurLoop->getExitBlocks(ExitBlocks);
				UniqExitBlocks.insert(ExitBlocks.begin(), ExitBlocks.end());
				for (auto ExitBlock : UniqExitBlocks) {
				bool ToSplit = false;
				for (auto PredBB : ExitBlock->predecessors()) {
				if (!CurLoop->contains(PredBB)) {
				ToSplit = true;
				break;
				}
				}
				if (ToSplit) {
				// When splitting critical edges, ExitBlock->predecessors() will
				// be changed, so save ExitBlock->predecessors() in PredBlocks
				// before doing any splitting.
				PredBlocks.insert(ExitBlock->predecessors().begin(),
				ExitBlock->predecessors().end());
				for (auto PredBB : PredBlocks) {
				if (CurLoop->contains(PredBB) &&
				PredBB->SplitCriticalEdge(ExitBlock, this)) {
				DEBUG(dbgs() << "Split critical exit edge for " << *CurLoop);
				DEBUG(dbgs() << "BB#" << PredBB->getNumber() << " --> "
				<< ExitBlock->getNumber() << "\n");
				}
				}
				PredBlocks.clear();
				}
				}

				Worklist.append(CurLoop->begin(), CurLoop->end());
				}
				return false;
				}
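
The PHI rewrite described in the createLoopPreheader comment can be modelled without any LLVM types. The sketch below is a toy under stated assumptions: ToyPhi and hoistOutsideOperands are hypothetical names, and plain integers stand in for virtual registers and basic blocks. It shows only the operand partitioning, not the MachineInstr manipulation in the patch.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Hypothetical model of a PHI: a result register plus (value, predecessor)
// pairs. Integers stand in for virtual registers and basic blocks.
struct ToyPhi {
  int Def;
  std::vector<std::pair<int, int>> Incoming; // (register, predecessor block)
};

// Splits Old the way the comment in createLoopPreheader describes: incoming
// values from out-of-loop predecessors move to a new PHI in the preheader,
// and Old receives a single extra operand fed by that new PHI.
ToyPhi hoistOutsideOperands(ToyPhi &Old, const std::set<int> &PredsOut,
                            int NewReg, int PreheaderBlock) {
  ToyPhi New{NewReg, {}};
  std::vector<std::pair<int, int>> Kept;
  for (const auto &In : Old.Incoming) {
    if (PredsOut.count(In.second))
      New.Incoming.push_back(In); // comes from outside the loop
    else
      Kept.push_back(In);         // comes from inside the loop
  }
  Kept.emplace_back(NewReg, PreheaderBlock);
  Old.Incoming = std::move(Kept);
  return New;
}
```

With the comment's example, r1_0 = PHI(r1_1, PB_in, r1_2, PB_out1, r1_3, PB_out2), the new PHI keeps the two PB_out operands and the old PHI is left with the PB_in operand plus one operand coming from the new preheader block.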

lib/CodeGen/SplitKit.h

		[...]
		namespace llvm {

		class ConnectedVNInfoEqClasses;
		class LiveInterval;
		class LiveIntervals;
		class LiveRangeEdit;
		class MachineBlockFrequencyInfo;
		class MachineInstr;
		class MachineLoop;
		class MachineLoopInfo;
		class MachineRegisterInfo;
		class TargetInstrInfo;
		class TargetRegisterInfo;
		class VirtRegMap;
		class VNInfo;
		class raw_ostream;

		[...]
		public:
		[...]
		unsigned openIntv();

		/// currentIntv - Return the current interval index.
		unsigned currentIntv() const { return OpenIdx; }

		/// selectIntv - Select a previously opened interval index.
		void selectIntv(unsigned Idx);

		/// splitAroundLoop - Split VirtReg around the loop.
		void splitAroundLoop(LiveInterval &VirtReg, MachineLoop *Loop, bool Tight);

		/// enterIntvBefore - Enter the open interval before the instruction at Idx.
		/// If the parent interval is not live before Idx, a COPY is not inserted.
		/// Return the beginning of the new live range.
		SlotIndex enterIntvBefore(SlotIndex Idx);

		/// enterIntvAfter - Enter the open interval after the instruction at Idx.
		/// Return the beginning of the new live range.
		SlotIndex enterIntvAfter(SlotIndex Idx);
		[...]
		public:
		[...]
		/// leaveIntvBefore - Leave the open interval before the instruction at Idx.
		/// Return the end of the live range.
		SlotIndex leaveIntvBefore(SlotIndex Idx);

		/// leaveIntvAtTop - Leave the interval at the top of MBB.
		/// Add liveness from the MBB top to the copy.
		/// Return the end of the live range.
		SlotIndex leaveIntvAtTop(MachineBasicBlock &MBB);
		SlotIndex leaveIntvAtEnd(MachineBasicBlock &MBB);

		/// overlapIntv - Indicate that all instructions in range should use the open
		/// interval, but also let the complement interval be live.
		///
		/// This doubles the register pressure, but is sometimes required to deal with
		/// register uses after the last valid split point.
		///
		/// The Start index should be a return value from a leaveIntv* call, and End
		[...]

lib/CodeGen/SplitKit.cpp

		[...]

		void SplitEditor::selectIntv(unsigned Idx) {
		assert(Idx != 0 && "Cannot select the complement interval");
		assert(Idx < Edit->size() && "Can only select previously opened interval");
		DEBUG(dbgs() << " selectIntv " << OpenIdx << " -> " << Idx << '\n');
		OpenIdx = Idx;
		}

		/// Split the VirtReg in the Preheader Block or ExitBlock of the loop.
		///
		/// If Tight is true, VirtReg is split at the bottom of the Preheader Block
		/// or at the top of the ExitBlock. If Tight is false, VirtReg is split at
		/// the top of the Preheader Block or at the bottom of the ExitBlock. If the
		/// VirtReg to be split hasn't been assigned a physical register yet, we
		/// want to split it tightly because the new VirtReg after the split will
		/// have less interference with other VirtRegs inside the Preheader or
		/// ExitBlock. If the VirtReg to be split already has a physical register
		/// assigned, we want to split it loosely because if the new VirtReg after
		/// the split is evicted, the physical register can be used by other new
		/// VirtRegs that were also generated by loop splitting and sit tightly
		/// around the loop.
		void SplitEditor::splitAroundLoop(LiveInterval &VirtReg, MachineLoop *Loop,
		bool Tight) {
		openIntv();
		// Add copy to Preheader if VirtReg is live at the entry of Preheader.
		// The copy will be from the remainder interval to the new interval.
		MachineBasicBlock *Preheader = Loop->getLoopPreheader();
		if (LIS.isLiveInToMBB(VirtReg, Preheader) &&
		LIS.isLiveOutOfMBB(VirtReg, Preheader)) {
		if (Tight) {
		// Split the loop tightly, i.e., split at the bottom of preheader BB.
		MachineBasicBlock::const_iterator FirstTerm =
		Preheader->getFirstTerminator();
		MachineBasicBlock::const_iterator EndIter = Preheader->end();
		if (FirstTerm != EndIter) {
		SlotIndex Idx = enterIntvBefore(LIS.getInstructionIndex(FirstTerm));
		SlotIndex Stop = LIS.getSlotIndexes()->getMBBEndIdx(Preheader);
		useIntv(Idx, Stop);
		} else {
		enterIntvAtEnd(*Preheader);
		}
		} else {
		// Split the loop loosely, i.e., split at the top of preheader BB.
		MachineBasicBlock::iterator FirstIt =
		(*Preheader->SkipPHIsAndLabels(Preheader->begin()));
		if (FirstIt != Preheader->end()) {
		SlotIndex FirstIdx = LIS.getInstructionIndex(FirstIt);
		SlotIndex Idx = enterIntvBefore(FirstIdx);
		SlotIndex Stop = LIS.getSlotIndexes()->getMBBEndIdx(Preheader);
		useIntv(Idx, Stop);
		} else {
		enterIntvAtEnd(*Preheader);
		}
		}
		} else if (LIS.isLiveOutOfMBB(VirtReg, Preheader)) {
		useIntv(*Preheader);
		}

		// Add copy to ExitBlock if VirtReg is live at the exit of the ExitBlock.
		// The copy will be from the new interval to the remainder interval.
		SmallVector<MachineBasicBlock *, 8> ExitBlocks;
		Loop->getExitBlocks(ExitBlocks);
		SmallPtrSet<MachineBasicBlock *, 8> UniqExitBlocks;
		UniqExitBlocks.insert(ExitBlocks.begin(), ExitBlocks.end());
		for (MachineBasicBlock *ExitBlock : UniqExitBlocks) {
		if (LIS.isLiveOutOfMBB(VirtReg, ExitBlock) &&
		LIS.isLiveInToMBB(VirtReg, ExitBlock)) {
		if (Tight) {
		// Split the loop tightly at the exit, i.e., split at the top of
		// ExitBlock.
		leaveIntvAtTop(*ExitBlock);
		} else {
		// Split the loop loosely at the exit, i.e., split at the bottom of
		// ExitBlock.
		MachineBasicBlock::const_iterator FirstTerm =
		ExitBlock->getFirstTerminator();
		MachineBasicBlock::const_iterator EndIter = ExitBlock->end();
		if (FirstTerm != EndIter) {
		SlotIndex Idx = leaveIntvBefore(LIS.getInstructionIndex(FirstTerm));
		SlotIndex Start = LIS.getSlotIndexes()->getMBBStartIdx(ExitBlock);
		useIntv(Start, Idx);
		} else {
		leaveIntvAtEnd(*ExitBlock);
		}
		}
		} else if (LIS.isLiveInToMBB(VirtReg, ExitBlock)) {
		useIntv(*ExitBlock);
		}
		}

		// For the Block inside loop, use the new interval.
		for (MachineBasicBlock *LoopBB : Loop->getBlocks())
		useIntv(*LoopBB);
		}

		SlotIndex SplitEditor::enterIntvBefore(SlotIndex Idx) {
		assert(OpenIdx && "openIntv not called before enterIntvBefore");
		DEBUG(dbgs() << " enterIntvBefore " << Idx);
		Idx = Idx.getBaseIndex();
		VNInfo *ParentVNI = Edit->getParent().getVNInfoAt(Idx);
		if (!ParentVNI) {
		DEBUG(dbgs() << ": not live\n");
		return Idx;
		[...]
		SlotIndex SplitEditor::leaveIntvAtTop(MachineBasicBlock &MBB) {
		[...]

		VNInfo *VNI = defFromParent(0, ParentVNI, Start, MBB,
		MBB.SkipPHIsAndLabels(MBB.begin()));
		RegAssign.insert(Start, VNI->def, OpenIdx);
		DEBUG(dump());
		return VNI->def;
		}

		SlotIndex SplitEditor::leaveIntvAtEnd(MachineBasicBlock &MBB) {
		assert(OpenIdx && "openIntv not called before leaveIntvAtEnd");
		SlotIndex Start = LIS.getMBBStartIdx(&MBB);
		SlotIndex Last = LIS.getMBBEndIdx(&MBB).getPrevSlot();
		DEBUG(dbgs() << " leaveIntvAtEnd BB#" << MBB.getNumber() << ", " << Last);
		VNInfo *ParentVNI = Edit->getParent().getVNInfoAt(Last);
		if (!ParentVNI) {
		DEBUG(dbgs() << ": not live\n");
		return Last;
		}
		DEBUG(dbgs() << ": valno " << ParentVNI->id);
		VNInfo *VNI =
		defFromParent(0, ParentVNI, Last, MBB, SA.getLastSplitPointIter(&MBB));
		RegAssign.insert(Start, VNI->def, OpenIdx);
		DEBUG(dump());
		return VNI->def;
		}

		void SplitEditor::overlapIntv(SlotIndex Start, SlotIndex End) {
		assert(OpenIdx && "openIntv not called before overlapIntv");
		const VNInfo *ParentVNI = Edit->getParent().getVNInfoAt(Start);
		assert(ParentVNI == Edit->getParent().getVNInfoBefore(End) &&
		"Parent changes value in extended range");
		assert(LIS.getMBBFromIndex(Start) == LIS.getMBBFromIndex(End) &&
		"Range cannot span basic blocks");

		[...]
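
The Tight flag of splitAroundLoop only changes where the copies are placed in the preheader and exit blocks. A small sketch of that choice, assuming a toy block model (Kind, ToyBlock and the four helper names are hypothetical and not part of SplitKit), might look like this:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical instruction kinds. A block is modelled as PHIs, then ordinary
// instructions, then terminators, in that order.
enum class Kind { Phi, Normal, Terminator };
using ToyBlock = std::vector<Kind>;

// Index of the first instruction that is not a PHI (the "top" split point).
std::size_t firstNonPhi(const ToyBlock &B) {
  std::size_t I = 0;
  while (I < B.size() && B[I] == Kind::Phi)
    ++I;
  return I;
}

// Index of the first terminator, or B.size() if the block falls through
// (the "bottom" split point).
std::size_t firstTerminator(const ToyBlock &B) {
  std::size_t I = 0;
  while (I < B.size() && B[I] != Kind::Terminator)
    ++I;
  return I;
}

// Where the copy into the loop interval goes in the preheader:
// tight means bottom of the block, loose means top.
std::size_t entryCopyPos(const ToyBlock &Preheader, bool Tight) {
  return Tight ? firstTerminator(Preheader) : firstNonPhi(Preheader);
}

// Where the copy back to the remainder interval goes in an exit block:
// tight means top of the block, loose means bottom.
std::size_t exitCopyPos(const ToyBlock &Exit, bool Tight) {
  return Tight ? firstNonPhi(Exit) : firstTerminator(Exit);
}
```

In both blocks the tight position is the one closest to the loop, so the new interval covers as little of the preheader and exit block as possible; the loose position covers almost the whole block.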

Revision Contents

Diff 48236

include/llvm/CodeGen/Passes.h

include/llvm/InitializePasses.h

lib/CodeGen/CMakeLists.txt

lib/CodeGen/CodeGen.cpp

lib/CodeGen/Passes.cpp

lib/CodeGen/RegAllocGreedy.cpp

lib/CodeGen/SplitCritEdges.cpp

lib/CodeGen/SplitKit.h

lib/CodeGen/SplitKit.cpp
