This is an archive of the discontinued LLVM Phabricator instance.

Differential D49519

[RegisterCoalescer] Delay live interval update work until the rematerialization for all the uses from the same def is done
ClosedPublic

Authored by wmi on Jul 18 2018, 4:30 PM.

Details

Reviewers

qcolombet
MatzeB

Commits

rG3c1c088500c9: [RegisterCoalescer] Delay live interval update work until the rematerialization…
rL339035: [RegisterCoalescer] Delay live interval update work until the rematerialization

Summary

We ran into a compile-time problem with flex-generated code combined with -fno-jump-tables. The cause is that MachineLICM hoists many invariants out of a big loop, which drastically increases the compile time of global register splitting and copy coalescing. https://reviews.llvm.org/D49353 relieves the problem in global splitting. This patch handles the problem in copy coalescing.

The problem in copy coalescing arises as follows. After MachineLICM, we have several defs outside of a big loop, each with hundreds or thousands of uses inside the loop. Rematerialization in copy coalescing happens for each use, and every time a rematerialization is done, shrinkToUses is called to update the huge live interval. Because a def has 'n' uses and each live interval update has at least 'n' complexity, the total update work is at least n^2.
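To make the quadratic growth concrete, here is a minimal sketch of that cost model. It is not LLVM code: the function names are hypothetical, and it assumes each live interval update costs exactly n units, whereas the summary only says "at least n".

```cpp
#include <cstdint>

// Toy cost model (hypothetical, not LLVM code): a def has NumUses uses and
// one live interval update is assumed to cost NumUses units of work.

// Eager scheme: one full update after every rematerialized use.
std::uint64_t eagerUpdateCost(std::uint64_t NumUses) {
  std::uint64_t Cost = 0;
  for (std::uint64_t Use = 0; Use < NumUses; ++Use)
    Cost += NumUses; // each update walks the whole interval again
  return Cost;
}

// Delayed scheme: a single update once all uses have been rematerialized.
std::uint64_t delayedUpdateCost(std::uint64_t NumUses) {
  return NumUses;
}
```

Under this assumption, a def with 1000 uses costs 1,000,000 units with eager updates and 1000 units with one delayed update.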

To fix the problem, we do the live interval update work collectively. If the number of copy-like uses of a def reaches a threshold, then each time one of those uses is rematerialized we do not update the live interval immediately; instead we delay that work until rematerialization of all those uses is complete, so the live interval update only has to be done once.

Delaying the live interval update could potentially change the copy coalescing result, so we hope to limit that change to those defs with many (like above a hundred) copylike uses.
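The deferral logic can be sketched with a toy model. This is an illustration only: `ToyCoalescer` and its members are hypothetical stand-ins, registers are plain integers, and a counter replaces the real shrinkToUses call. The control flow follows the hunk in reMaterializeTrivialDef and lateLiveIntervalUpdate from this diff.

```cpp
#include <cstddef>
#include <map>
#include <set>

// Hypothetical toy model of the delayed-update bookkeeping; not LLVM code.
struct ToyCoalescer {
  unsigned Threshold = 100;              // stands in for LateRematUpdateThreshold
  std::map<unsigned, unsigned> CopyUses; // reg -> remaining copy-like uses
  std::set<unsigned> ToBeUpdated;        // regs whose update is pending
  unsigned NumUpdates = 0;               // how many interval updates ran

  // Called after one copy of Reg has been rematerialized.
  void rematerializeOneUse(unsigned Reg) {
    unsigned &Uses = CopyUses[Reg];
    if (Uses > 0)
      --Uses; // the rematerialized copy no longer uses Reg
    if (ToBeUpdated.count(Reg))
      return; // update already scheduled for later
    if (Uses < Threshold)
      ++NumUpdates; // few copy uses left: update immediately
    else
      ToBeUpdated.insert(Reg); // many copy uses left: defer
  }

  // Runs every pending update once, then forgets them.
  void lateUpdate() {
    NumUpdates += static_cast<unsigned>(ToBeUpdated.size());
    ToBeUpdated.clear();
  }
};
```

With `Threshold = 1` and three copy uses of one register, the model performs three rematerializations but only one update, which is the behaviour the late-remat-update.mir test checks for through the statistics counters.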

I am running internal performance testing at the same time. I also ran stress testing by bootstrapping clang with the threshold set to 0, i.e., delaying all the live interval update work after rematerialization, and it passed.

Diff Detail

Repository: rL LLVM

Event Timeline

wmi created this revision. Jul 18 2018, 4:30 PM

Nice approach!

I'd love to know if the old pattern has any real advantages. It would seem much simpler to just always defer the interval update until all the copies have been coalesced. Maybe it's worth collecting perf data there too? If we can get away with the simpler (non-heuristic) model, it seems better.

Having invalid intermediate live ranges seems scary, and I'm not naturally convinced this is fine. How much have you tested this change yet?

In D49519#1167444, @MatzeB wrote:

Having invalid intermediate live ranges seems scary, and I'm not naturally convinced this is fine. How much have you tested this change yet?

The live ranges are only invalid in the sense that shrinking them is postponed, so the approach is conservative and shouldn't affect correctness, right? As for testing, I set LateRematUpdateThreshold to 0 by default, bootstrapped clang, and then ran our internal performance tests. No correctness regressions were found.
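The "conservative" argument can be illustrated with a toy interference check. This is only a sketch with hypothetical names, and it models interference queries alone: a stale range that still covers the shrunk one can report extra overlaps but cannot miss a real one. It says nothing about other consumers of live intervals.

```cpp
// Toy half-open live range [Start, End); not LLVM's LiveRange class.
struct ToyRange {
  unsigned Start;
  unsigned End;
};

// Two half-open ranges interfere when each starts before the other ends.
bool overlaps(ToyRange A, ToyRange B) {
  return A.Start < B.End && B.Start < A.End;
}

// True when Outer covers every point of Inner, as a stale (not yet shrunk)
// range covers the range it would eventually shrink to.
bool covers(ToyRange Outer, ToyRange Inner) {
  return Outer.Start <= Inner.Start && Inner.End <= Outer.End;
}
```

For example, a stale range [0, 100) reports interference with [50, 60) even though the shrunk range [0, 10) would not. That can block a coalescing opportunity but does not hide a conflict, which matches the performance perturbation discussed in this thread.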

There is some perturbation in the performance tests, but that is within our expectation, because failing to shrink a live range in time could hurt performance. That is why I chose to apply the change with a threshold.

Herald added a subscriber: tpr. Jul 19 2018, 8:40 AM

Ran two large server tests with LateRematUpdateThreshold set to 0. No correctness issues found.

Ping.

I think I hit the compile time problem mentioned here too. Some files in our project can take 5-10 minutes to compile (depending on the machine) and 80-85% of that time is spent in shrinkToUses in the RegisterCoalescer.

One thing I'm interested in is whether you know how this change affects, for example, resolveConflicts() and mapValue(). I don't have much experience with the RegisterCoalescer code, but from what I see those functions use LiveInterval information to check, for example, whether values overlap. I was wondering whether we are sure that none of that is affected from a correctness standpoint. From what I understand, the delay is applied only if the rematerialization happens on a value used by a lot of coalesceable copies.
Have you tried running all your tests with the delay threshold set to the minimum, so that the update is always delayed even if the value is used by only a few copies?

Oh, I didn’t see the message where you said you tried. In that case, disregard my question about testing with a lower threshold.

I think we can go ahead and commit to see how it fares in tree.

This revision is now accepted and ready to land. Aug 2 2018, 4:54 PM

Closed by commit rL339035: [RegisterCoalescer] Delay live interval update work until the rematerialization (authored by wmi). Aug 6 2018, 10:31 AM

This revision was automatically updated to reflect the committed changes.

Hello @wmi, in our local fuzz testing we hit the following assertion:
src/include/llvm/ADT/IntervalMap.h:630: unsigned int llvm::IntervalMapImpl::LeafNode<KeyT, ValT, N, Traits>::insertFrom(unsigned int&, unsigned int, KeyT, KeyT, ValT) [with KeyT = llvm::SlotIndex; ValT = unsigned int; unsigned int N = 9u; Traits = llvm::IntervalMapInfo<llvm::SlotIndex>]: Assertion `!Traits::stopLess(b, a) && "Invalid interval"' failed.

Triage points to this patch. Reverting this patch eliminates the assertion. The test also passes if I specify -late-remat-update-threshold=100000.
Unfortunately, at the moment I cannot create an LLVM reproducer for this bug. The bug is also not reproducible on current trunk; however, I suspect it is just hidden, not fixed.
I am continuing the investigation.

However, if you have any insights or suggestions as to what it could be, or which other commits could have fixed the bug, could you please share them? It would probably save me a lot of time narrowing it down.

The stack trace of the crash looks like:
#0 0x00007ffff700c5d7 in raise () from /lib64/libc.so.6
#1 0x00007ffff700dcc8 in abort () from /lib64/libc.so.6
#2 0x00007ffff7005546 in assert_fail_base () from /lib64/libc.so.6
#3 0x00007ffff70055f2 in assert_fail () from /lib64/libc.so.6
#4 0x00007ffff13b334f in llvm::IntervalMapImpl::LeafNode<llvm::SlotIndex, unsigned int, 9u, llvm::IntervalMapInfo<llvm::SlotIndex> >::insertFrom (this=this@entry=0x7fff4811c0b8, Pos=@0x7fff5d8d4920: 2, Size=2, a=..., a@entry=..., b=..., y=<optimized out>) at include/llvm/ADT/IntervalMap.h:630
#5 0x00007ffff13bd56d in llvm::IntervalMap<llvm::SlotIndex, unsigned int, 9u, llvm::IntervalMapInfo<llvm::SlotIndex> >::insert (this=this@entry=0x7fff4811c0b8, a=a@entry=..., b=..., b@entry=..., y=<optimized out>) at include/llvm/ADT/IntervalMap.h:1092
#6 0x00007ffff13bd8ef in llvm::SplitEditor::useIntv (this=this@entry=0x7fff4811bff0, Start=Start@entry=..., End=...) at lib/CodeGen/SplitKit.cpp:754
#7 0x00007ffff13beeee in llvm::SplitEditor::splitSingleBlock (this=0x7fff4811bff0, BI=...) at lib/CodeGen/SplitKit.cpp:1580
#8 0x00007ffff13297c7 in splitAroundRegion (UsedCands=..., LREdit=..., this=0x7fff482fd010) at lib/CodeGen/RegAllocGreedy.cpp:1685
#9 (anonymous namespace)::RAGreedy::doRegionSplit (this=0x7fff482fd010, VirtReg=..., BestCand=6, HasCompact=<optimized out>, NewVRegs=...) at lib/CodeGen/RegAllocGreedy.cpp:1968
#10 0x00007ffff1335b49 in tryRegionSplit (NewVRegs=..., Order=..., VirtReg=..., this=<optimized out>) at lib/CodeGen/RegAllocGreedy.cpp:1832
#11 trySplit (NewVRegs=..., Order=..., VirtReg=..., this=<optimized out>) at lib/CodeGen/RegAllocGreedy.cpp:2453
#12 (anonymous namespace)::RAGreedy::selectOrSplitImpl (this=0x7fff482fd010, VirtReg=..., NewVRegs=..., FixedRegisters=..., Depth=1, Depth@entry=0) at lib/CodeGen/RegAllocGreedy.cpp:3052
#13 0x00007ffff1335f77 in (anonymous namespace)::RAGreedy::selectOrSplit (this=0x7fff482fd010, VirtReg=..., NewVRegs=...) at lib/CodeGen/RegAllocGreedy.cpp:2732
#14 0x00007ffff131905e in llvm::RegAllocBase::allocatePhysRegs (this=0x7fff482fd078, this@entry=0x7ffff3e19006 <llvm::MachineModuleInfo::ID>) at lib/CodeGen/RegAllocBase.cpp:113
#15 0x00007ffff132f7b1 in (anonymous namespace)::RAGreedy::runOnMachineFunction (this=0x7fff482fd010, mf=...) at lib/CodeGen/RegAllocGreedy.cpp:3207
#16 0x00007ffff123b6a5 in llvm::MachineFunctionPass::runOnFunction (this=0x7fff482fd010, F=...) at lib/CodeGen/MachineFunctionPass.cpp:61
#17 0x00007ffff1061957 in llvm::FPPassManager::runOnFunction (this=0x7fff480bd530, F=...) at lib/IR/LegacyPassManager.cpp:1586
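For readers unfamiliar with the failing assertion: `!Traits::stopLess(b, a)` guards against inserting an interval whose stop does not come after its start. The sketch below is a hypothetical stand-in, not the IntervalMap implementation, and it assumes half-open [a, b) semantics for SlotIndex keys. Under that assumption, an empty or inverted range handed to SplitEditor::useIntv, for example one derived from an out-of-date live interval, would trip the check.

```cpp
// Hypothetical stand-in for a slot index; real SlotIndex values are opaque.
using ToySlot = unsigned;

// Assumed half-open trait: the stop is "less" when it is at or before X.
bool toyStopLess(ToySlot Stop, ToySlot X) { return Stop <= X; }

// Mirrors the shape of the asserted condition: [Start, Stop) is valid only
// when the stop lies strictly after the start.
bool isValidInterval(ToySlot Start, ToySlot Stop) {
  return !toyStopLess(Stop, Start);
}
```

In this model `isValidInterval(2, 5)` holds, while both the empty range (5, 5) and the inverted range (7, 5) are rejected.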

I was able to create a pure LLVM reproducer for this issue and filed a bug https://bugs.llvm.org/show_bug.cgi?id=40061.
Could you please take a look at it?

Revision Contents

Path	Size
llvm/trunk/lib/CodeGen/RegisterCoalescer.cpp	63 lines
llvm/trunk/test/CodeGen/X86/late-remat-update.mir	118 lines

Diff 159334

llvm/trunk/lib/CodeGen/RegisterCoalescer.cpp

Show All 10 Lines
// is used as the common interface used by all clients and		// is used as the common interface used by all clients and
// implementations of register coalescing.		// implementations of register coalescing.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "RegisterCoalescer.h"		#include "RegisterCoalescer.h"
#include "llvm/ADT/ArrayRef.h"		#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/BitVector.h"		#include "llvm/ADT/BitVector.h"
		#include "llvm/ADT/DenseSet.h"
#include "llvm/ADT/STLExtras.h"		#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallPtrSet.h"		#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/Analysis/AliasAnalysis.h"		#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/CodeGen/LiveInterval.h"		#include "llvm/CodeGen/LiveInterval.h"
#include "llvm/CodeGen/LiveIntervals.h"		#include "llvm/CodeGen/LiveIntervals.h"
#include "llvm/CodeGen/LiveRangeEdit.h"		#include "llvm/CodeGen/LiveRangeEdit.h"
Show All 37 Lines
STATISTIC(numJoins , "Number of interval joins performed");		STATISTIC(numJoins , "Number of interval joins performed");
STATISTIC(numCrossRCs , "Number of cross class joins performed");		STATISTIC(numCrossRCs , "Number of cross class joins performed");
STATISTIC(numCommutes , "Number of instruction commuting performed");		STATISTIC(numCommutes , "Number of instruction commuting performed");
STATISTIC(numExtends , "Number of copies extended");		STATISTIC(numExtends , "Number of copies extended");
STATISTIC(NumReMats , "Number of instructions re-materialized");		STATISTIC(NumReMats , "Number of instructions re-materialized");
STATISTIC(NumInflated , "Number of register classes inflated");		STATISTIC(NumInflated , "Number of register classes inflated");
STATISTIC(NumLaneConflicts, "Number of dead lane conflicts tested");		STATISTIC(NumLaneConflicts, "Number of dead lane conflicts tested");
STATISTIC(NumLaneResolves, "Number of dead lane conflicts resolved");		STATISTIC(NumLaneResolves, "Number of dead lane conflicts resolved");
		STATISTIC(NumShrinkToUses, "Number of shrinkToUses called");

static cl::opt<bool> EnableJoining("join-liveintervals",		static cl::opt<bool> EnableJoining("join-liveintervals",
cl::desc("Coalesce copies (default=true)"),		cl::desc("Coalesce copies (default=true)"),
cl::init(true), cl::Hidden);		cl::init(true), cl::Hidden);

static cl::opt<bool> UseTerminalRule("terminal-rule",		static cl::opt<bool> UseTerminalRule("terminal-rule",
cl::desc("Apply the terminal rule"),		cl::desc("Apply the terminal rule"),
cl::init(false), cl::Hidden);		cl::init(false), cl::Hidden);
Show All 9 Lines	EnableGlobalCopies("join-globalcopies",
cl::desc("Coalesce copies that span blocks (default=subtarget)"),		cl::desc("Coalesce copies that span blocks (default=subtarget)"),
cl::init(cl::BOU_UNSET), cl::Hidden);		cl::init(cl::BOU_UNSET), cl::Hidden);

static cl::opt<bool>		static cl::opt<bool>
VerifyCoalescing("verify-coalescing",		VerifyCoalescing("verify-coalescing",
cl::desc("Verify machine instrs before and after register coalescing"),		cl::desc("Verify machine instrs before and after register coalescing"),
cl::Hidden);		cl::Hidden);

		static cl::opt<unsigned> LateRematUpdateThreshold(
		"late-remat-update-threshold", cl::Hidden,
		cl::desc("During rematerialization for a copy, if the def instruction has "
		"many other copy uses to be rematerialized, delay the multiple "
		"separate live interval update work and do them all at once after "
		"all those rematerialization are done. It will save a lot of "
		"repeated work. "),
		cl::init(100));

namespace {		namespace {

class RegisterCoalescer : public MachineFunctionPass,		class RegisterCoalescer : public MachineFunctionPass,
private LiveRangeEdit::Delegate {		private LiveRangeEdit::Delegate {
MachineFunction* MF;		MachineFunction* MF;
MachineRegisterInfo* MRI;		MachineRegisterInfo* MRI;
const TargetRegisterInfo* TRI;		const TargetRegisterInfo* TRI;
const TargetInstrInfo* TII;		const TargetInstrInfo* TII;
Show All 27 Lines	class RegisterCoalescer : public MachineFunctionPass,
SmallPtrSet<MachineInstr*, 8> ErasedInstrs;		SmallPtrSet<MachineInstr*, 8> ErasedInstrs;

/// Dead instructions that are about to be deleted.		/// Dead instructions that are about to be deleted.
SmallVector<MachineInstr*, 8> DeadDefs;		SmallVector<MachineInstr*, 8> DeadDefs;

/// Virtual registers to be considered for register class inflation.		/// Virtual registers to be considered for register class inflation.
SmallVector<unsigned, 8> InflateRegs;		SmallVector<unsigned, 8> InflateRegs;

		/// The collection of live intervals which should have been updated
		/// immediately after rematerialization but delayed until
		/// lateLiveIntervalUpdate is called.
		DenseSet<unsigned> ToBeUpdated;

/// Recursively eliminate dead defs in DeadDefs.		/// Recursively eliminate dead defs in DeadDefs.
void eliminateDeadDefs();		void eliminateDeadDefs();

/// LiveRangeEdit callback for eliminateDeadDefs().		/// LiveRangeEdit callback for eliminateDeadDefs().
void LRE_WillEraseInstruction(MachineInstr *MI) override;		void LRE_WillEraseInstruction(MachineInstr *MI) override;

/// Coalesce the LocalWorkList.		/// Coalesce the LocalWorkList.
void coalesceLocals();		void coalesceLocals();

/// Join compatible live intervals		/// Join compatible live intervals
void joinAllIntervals();		void joinAllIntervals();

/// Coalesce copies in the specified MBB, putting		/// Coalesce copies in the specified MBB, putting
/// copies that cannot yet be coalesced into WorkList.		/// copies that cannot yet be coalesced into WorkList.
void copyCoalesceInMBB(MachineBasicBlock *MBB);		void copyCoalesceInMBB(MachineBasicBlock *MBB);

/// Tries to coalesce all copies in CurrList. Returns true if any progress		/// Tries to coalesce all copies in CurrList. Returns true if any progress
/// was made.		/// was made.
bool copyCoalesceWorkList(MutableArrayRef<MachineInstr*> CurrList);		bool copyCoalesceWorkList(MutableArrayRef<MachineInstr*> CurrList);

		/// If one def has many copy like uses, and those copy uses are all
		/// rematerialized, the live interval update needed for those
		/// rematerializations will be delayed and done all at once instead
		/// of being done multiple times. This is to save compile cost because
		/// live interval update is costly.
		void lateLiveIntervalUpdate();

/// Attempt to join intervals corresponding to SrcReg/DstReg, which are the		/// Attempt to join intervals corresponding to SrcReg/DstReg, which are the
/// src/dst of the copy instruction CopyMI. This returns true if the copy		/// src/dst of the copy instruction CopyMI. This returns true if the copy
/// was successfully coalesced away. If it is not currently possible to		/// was successfully coalesced away. If it is not currently possible to
/// coalesce this interval, but it may be possible if other things get		/// coalesce this interval, but it may be possible if other things get
/// coalesced, then it returns true by reference in 'Again'.		/// coalesced, then it returns true by reference in 'Again'.
bool joinCopy(MachineInstr *CopyMI, bool &Again);		bool joinCopy(MachineInstr *CopyMI, bool &Again);

/// Attempt to join these two intervals. On failure, this		/// Attempt to join these two intervals. On failure, this
▲ Show 20 Lines • Show All 85 Lines • ▼ Show 20 Lines	class RegisterCoalescer : public MachineFunctionPass,
/// Dst, we can drop \p Copy.		/// Dst, we can drop \p Copy.
bool applyTerminalRule(const MachineInstr &Copy) const;		bool applyTerminalRule(const MachineInstr &Copy) const;

/// Wrapper method for \see LiveIntervals::shrinkToUses.		/// Wrapper method for \see LiveIntervals::shrinkToUses.
/// This method does the proper fixing of the live-ranges when the afore		/// This method does the proper fixing of the live-ranges when the afore
/// mentioned method returns true.		/// mentioned method returns true.
void shrinkToUses(LiveInterval *LI,		void shrinkToUses(LiveInterval *LI,
SmallVectorImpl<MachineInstr * > *Dead = nullptr) {		SmallVectorImpl<MachineInstr * > *Dead = nullptr) {
		NumShrinkToUses++;
if (LIS->shrinkToUses(LI, Dead)) {		if (LIS->shrinkToUses(LI, Dead)) {
/// Check whether or not \p LI is composed by multiple connected		/// Check whether or not \p LI is composed by multiple connected
/// components and if that is the case, fix that.		/// components and if that is the case, fix that.
SmallVector<LiveInterval*, 8> SplitLIs;		SmallVector<LiveInterval*, 8> SplitLIs;
LIS->splitSeparateComponents(*LI, SplitLIs);		LIS->splitSeparateComponents(*LI, SplitLIs);
}		}
}		}

▲ Show 20 Lines • Show All 1,091 Lines • ▼ Show 20 Lines	for (unsigned i = 0, e = NewMIImplDefs.size(); i != e; ++i) {
for (MCRegUnitIterator Units(Reg, TRI); Units.isValid(); ++Units)		for (MCRegUnitIterator Units(Reg, TRI); Units.isValid(); ++Units)
if (LiveRange *LR = LIS->getCachedRegUnit(*Units))		if (LiveRange *LR = LIS->getCachedRegUnit(*Units))
LR->createDeadDef(NewMIIdx.getRegSlot(), LIS->getVNInfoAllocator());		LR->createDeadDef(NewMIIdx.getRegSlot(), LIS->getVNInfoAllocator());
}		}

LLVM_DEBUG(dbgs() << "Remat: " << NewMI);		LLVM_DEBUG(dbgs() << "Remat: " << NewMI);
++NumReMats;		++NumReMats;

// The source interval can become smaller because we removed a use.
shrinkToUses(&SrcInt, &DeadDefs);
if (!DeadDefs.empty()) {
// If the virtual SrcReg is completely eliminated, update all DBG_VALUEs		// If the virtual SrcReg is completely eliminated, update all DBG_VALUEs
// to describe DstReg instead.		// to describe DstReg instead.
		if (MRI->use_nodbg_empty(SrcReg)) {
for (MachineOperand &UseMO : MRI->use_operands(SrcReg)) {		for (MachineOperand &UseMO : MRI->use_operands(SrcReg)) {
MachineInstr *UseMI = UseMO.getParent();		MachineInstr *UseMI = UseMO.getParent();
if (UseMI->isDebugValue()) {		if (UseMI->isDebugValue()) {
UseMO.setReg(DstReg);		UseMO.setReg(DstReg);
// Move the debug value directly after the def of the rematerialized		// Move the debug value directly after the def of the rematerialized
// value in DstReg.		// value in DstReg.
MBB->splice(std::next(NewMI.getIterator()), UseMI->getParent(), UseMI);		MBB->splice(std::next(NewMI.getIterator()), UseMI->getParent(), UseMI);
LLVM_DEBUG(dbgs() << "\t\tupdated: " << *UseMI);		LLVM_DEBUG(dbgs() << "\t\tupdated: " << *UseMI);
}		}
}		}
eliminateDeadDefs();
}		}

		if (ToBeUpdated.count(SrcReg))
		return true;

		long NumCopyUses = 0;
		for (MachineOperand &UseMO : MRI->use_nodbg_operands(SrcReg)) {
		if (UseMO.getParent()->isCopyLike())
		NumCopyUses++;
		}
		if (NumCopyUses < LateRematUpdateThreshold) {
		// The source interval can become smaller because we removed a use.
		shrinkToUses(&SrcInt, &DeadDefs);
		if (!DeadDefs.empty())
		eliminateDeadDefs();
		} else {
		ToBeUpdated.insert(SrcReg);
		}
return true;		return true;
}		}

MachineInstr *RegisterCoalescer::eliminateUndefCopy(MachineInstr *CopyMI) {		MachineInstr *RegisterCoalescer::eliminateUndefCopy(MachineInstr *CopyMI) {
// ProcessImplicitDefs may leave some copies of <undef> values, it only		// ProcessImplicitDefs may leave some copies of <undef> values, it only
// removes local variables. When we have a copy like:		// removes local variables. When we have a copy like:
//		//
// %1 = COPY undef %2		// %1 = COPY undef %2
▲ Show 20 Lines • Show All 1,891 Lines • ▼ Show 20 Lines	static bool isLocalCopy(MachineInstr *Copy, const LiveIntervals *LIS) {
if (TargetRegisterInfo::isPhysicalRegister(SrcReg)		if (TargetRegisterInfo::isPhysicalRegister(SrcReg)
|| TargetRegisterInfo::isPhysicalRegister(DstReg))		|| TargetRegisterInfo::isPhysicalRegister(DstReg))
return false;		return false;

return LIS->intervalIsInOneMBB(LIS->getInterval(SrcReg))		return LIS->intervalIsInOneMBB(LIS->getInterval(SrcReg))
|| LIS->intervalIsInOneMBB(LIS->getInterval(DstReg));		|| LIS->intervalIsInOneMBB(LIS->getInterval(DstReg));
}		}

		void RegisterCoalescer::lateLiveIntervalUpdate() {
		for (unsigned reg : ToBeUpdated) {
		if (!LIS->hasInterval(reg))
		continue;
		LiveInterval &LI = LIS->getInterval(reg);
		shrinkToUses(&LI, &DeadDefs);
		if (!DeadDefs.empty())
		eliminateDeadDefs();
		}
		ToBeUpdated.clear();
		}

bool RegisterCoalescer::		bool RegisterCoalescer::
copyCoalesceWorkList(MutableArrayRef<MachineInstr*> CurrList) {		copyCoalesceWorkList(MutableArrayRef<MachineInstr*> CurrList) {
bool Progress = false;		bool Progress = false;
for (unsigned i = 0, e = CurrList.size(); i != e; ++i) {		for (unsigned i = 0, e = CurrList.size(); i != e; ++i) {
if (!CurrList[i])		if (!CurrList[i])
continue;		continue;
// Skip instruction pointers that have already been erased, for example by		// Skip instruction pointers that have already been erased, for example by
// dead code elimination.		// dead code elimination.
▲ Show 20 Lines • Show All 153 Lines • ▼ Show 20 Lines	void RegisterCoalescer::joinAllIntervals() {
for (unsigned i = 0, e = MBBs.size(); i != e; ++i) {		for (unsigned i = 0, e = MBBs.size(); i != e; ++i) {
// Try coalescing the collected local copies for deeper loops.		// Try coalescing the collected local copies for deeper loops.
if (JoinGlobalCopies && MBBs[i].Depth < CurrDepth) {		if (JoinGlobalCopies && MBBs[i].Depth < CurrDepth) {
coalesceLocals();		coalesceLocals();
CurrDepth = MBBs[i].Depth;		CurrDepth = MBBs[i].Depth;
}		}
copyCoalesceInMBB(MBBs[i].MBB);		copyCoalesceInMBB(MBBs[i].MBB);
}		}
		lateLiveIntervalUpdate();
coalesceLocals();		coalesceLocals();

// Joining intervals can allow other intervals to be joined. Iteratively join		// Joining intervals can allow other intervals to be joined. Iteratively join
// until we make no progress.		// until we make no progress.
while (copyCoalesceWorkList(WorkList))		while (copyCoalesceWorkList(WorkList))
/* empty */ ;		/* empty */ ;
		lateLiveIntervalUpdate();
}		}

void RegisterCoalescer::releaseMemory() {		void RegisterCoalescer::releaseMemory() {
ErasedInstrs.clear();		ErasedInstrs.clear();
WorkList.clear();		WorkList.clear();
DeadDefs.clear();		DeadDefs.clear();
InflateRegs.clear();		InflateRegs.clear();
}		}
▲ Show 20 Lines • Show All 78 Lines • Show Last 20 Lines

llvm/trunk/test/CodeGen/X86/late-remat-update.mir

				# REQUIRES: asserts
				# RUN: llc -mtriple=x86_64-- -run-pass=simple-register-coalescing -late-remat-update-threshold=1 -stats %s -o /dev/null 2>&1 | FileCheck %s
				# Check the test will rematerialize for three copies, but will call shrinkToUses
				# only once to update live range because of late rematerialization update.
				# CHECK: 3 regalloc - Number of instructions re-materialized
				# CHECK: 1 regalloc - Number of shrinkToUses called
				--- |
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				; Function Attrs: noreturn uwtable
				define void @_Z3fooi(i32 %value) local_unnamed_addr #0 {
				entry:
				br label %do.body

				do.body: ; preds = %do.body, %sw.bb2, %entry
				tail call void asm sideeffect "", "~{r10},~{r11},~{r12},~{r13},~{r14},~{r15},~{dirflag},~{fpsr},~{flags}"() #2, !srcloc !3
				switch i32 %value, label %do.body [
				i32 0, label %sw.bb
				i32 1, label %sw.bb1
				i32 2, label %sw.bb2
				]

				sw.bb: ; preds = %do.body
				tail call void @_Z3gooi(i32 2122)
				br label %sw.bb1

				sw.bb1: ; preds = %sw.bb, %do.body
				tail call void @_Z3gooi(i32 2122)
				br label %sw.bb2

				sw.bb2: ; preds = %sw.bb1, %do.body
				tail call void @_Z3gooi(i32 2122)
				br label %do.body
				}

				declare void @_Z3gooi(i32) local_unnamed_addr #1

				; Function Attrs: nounwind
				declare void @llvm.stackprotector(i8*, i8**) #2

				attributes #0 = { noreturn uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
				attributes #1 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
				attributes #2 = { nounwind }

				!llvm.module.flags = !{!0, !1}
				!llvm.ident = !{!2}

				!0 = !{i32 1, !"wchar_size", i32 4}
				!1 = !{i32 7, !"PIC Level", i32 2}
				!2 = !{!"clang version 7.0.0 (trunk 335057)"}
				!3 = !{i32 82}

				...
				---
				name: _Z3fooi
				alignment: 4
				tracksRegLiveness: true
				registers:
				- { id: 0, class: gr32 }
				- { id: 1, class: gr32 }
				- { id: 2, class: gr32 }
				- { id: 3, class: gr32 }
				- { id: 4, class: gr32 }
				- { id: 5, class: gr32 }
				liveins:
				- { reg: '$edi', virtual-reg: '%0' }
				frameInfo:
				hasCalls: true
				body: |
				bb.0.entry:
				liveins: $edi

				%0:gr32 = COPY killed $edi
				%5:gr32 = MOV32ri 2122

				bb.1.do.body:
				successors: %bb.6(0x15555555), %bb.2(0x6aaaaaab)

				INLINEASM &"", 1, 12, implicit-def dead early-clobber $r10, 12, implicit-def dead early-clobber $r11, 12, implicit-def dead early-clobber $r12, 12, implicit-def dead early-clobber $r13, 12, implicit-def dead early-clobber $r14, 12, implicit-def dead early-clobber $r15, 12, implicit-def dead early-clobber $eflags, !3
				CMP32ri8 %0, 2, implicit-def $eflags
				JE_1 %bb.6, implicit killed $eflags
				JMP_1 %bb.2

				bb.2.do.body:
				successors: %bb.5(0x19999999), %bb.3(0x66666667)

				CMP32ri8 %0, 1, implicit-def $eflags
				JE_1 %bb.5, implicit killed $eflags
				JMP_1 %bb.3

				bb.3.do.body:
				successors: %bb.4(0x20000000), %bb.1(0x60000000)

				TEST32rr %0, %0, implicit-def $eflags
				JNE_1 %bb.1, implicit killed $eflags
				JMP_1 %bb.4

				bb.4.sw.bb:
				ADJCALLSTACKDOWN64 0, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
				$edi = COPY %5
				CALL64pcrel32 @_Z3gooi, csr_64, implicit $rsp, implicit $ssp, implicit killed $edi, implicit-def $rsp, implicit-def $ssp
				ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp

				bb.5.sw.bb1:
				ADJCALLSTACKDOWN64 0, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
				$edi = COPY %5
				CALL64pcrel32 @_Z3gooi, csr_64, implicit $rsp, implicit $ssp, implicit killed $edi, implicit-def $rsp, implicit-def $ssp
				ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp

				bb.6.sw.bb2:
				ADJCALLSTACKDOWN64 0, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
				$edi = COPY %5
				CALL64pcrel32 @_Z3gooi, csr_64, implicit $rsp, implicit $ssp, implicit killed $edi, implicit-def $rsp, implicit-def $ssp
				ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
				JMP_1 %bb.1

				...