This is an archive of the discontinued LLVM Phabricator instance.

Paths

- lib/CodeGen/StackColoring.cpp (6 inline comments)
- test/CodeGen/X86/StackColoring.ll (1 inline comment)

Differential D31583

StackColoring: smarter check for slot overlap
ClosedPublic

Authored by arielb1 on Apr 2 2017, 9:40 AM.


Details

Reviewers

thanm
nagisa
llvm-commits
efriedma
rnk

Commits

rG14d61436c010: StackColoring: smarter check for slot overlap
rL305193: StackColoring: smarter check for slot overlap

Summary

The old check for slot overlap treated 2 slots S and T as
overlapping if there existed a CFG node in which both of the slots could
possibly be active. That is overly conservative and caused stack blowups
in Rust programs. Instead, check whether there is a single CFG node in
which both of the slots are possibly active *together*.

Fixes PR32488.

Diff Detail

Event Timeline

arielb1 created this revision. Apr 2 2017, 9:40 AM

Hi Ariel,

I am thinking that maybe there is something wrong with the patch on phab-- I downloaded it and it would not compile:

StackColoring.cpp:1050:5: error: use of undeclared identifier 'UI'
    UI->getNextValue(Indexes->getZeroIndex(), VNInfoAllocator);

I fixed that up in the obvious way.

Aside from that, I'd say that so far I don't completely understand how your strategy works. Part of this may simply be due to terminology and the names you've given to the new things.

For example, I don't get why you've chosen the name "Use" for the new set in BlockInfo. The fact that gets recorded in this set is not really a "use" of the stack slot. Consider these two blocks:

BB1:
  lifetime_end slot 5
  branch

BB2:
  lifetime_end slot 5
  lifetime_start slot 5
  use of slot 5
  branch

The way the code works now, "Use" will be set for BB1 and not for BB2, even though BB2 has an actual use of (reference to) the slot and BB1 does not. This is very confusing for people reading the code.

Similar confusion over your use of the term "active" in the comment ("active" is also used at various other places in the file comments, and I don't think your definition of "active" is the same as that used in the other places). Ditto for "IntervalStarts" -- why is it named this way?

Another point of confusion is over the use of the term "interference graph" in the comments-- the data structure being used to capture interferences is not really a graph, there are no explicit edges to speak of. A real interference graph would be an improvement over what's there (could be more precise in a number of cases), but that's not what exists in the code today.

A final question is what sort of testing you've done so far to make sure this works for all the odd corner cases. LNT and bootstrap build are two things that come to mind.

With that said, please keep working on this, I think it's worth pursuing. I am OOO all this week and possibly next, so I won't have a lot of time to look at it, but I'll get there eventually.

Regards,

Than

Typofix UI -> SI

In D31583#718444, @thanm wrote:
Hi Ariel,

I am thinking that maybe there is something wrong with the patch on phab-- I downloaded it and it would not compile:
StackColoring.cpp:1050:5: error: use of undeclared identifier 'UI'
    UI->getNextValue(Indexes->getZeroIndex(), VNInfoAllocator);
I fixed that up in the obvious way.

Fixed that.

Aside from that, I'd say that so far I don't completely understand how your strategy works. Part of this may simply be due to terminology and the names you've given to the new things.

For example, I don't get why you've chosen the name "Use" for the new set in BlockInfo. The fact that gets recorded in this set is not really a "use" of the stack slot. Consider these two blocks:
BB1:
  lifetime_end slot 5
  branch

BB2:
  lifetime_end slot 5
  lifetime_start slot 5
  use of slot 5
  branch
The way the code works now, "Use" will be set for BB1 and not for BB2, even though BB2 has an actual use of (reference to) the slot and BB1 does not. This is very confusing for people reading the code.

BLI.Use refers, as it says, to a slot that begins and ends within the same block. Maybe I should have called it "BeginAndEnd" or something? Maybe I should have Use also include Begin (actually, let's do that)?
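Here is a minimal sketch that makes the Begin/Use/End distinction concrete. It computes the per-block summary the way the diff below does, under the assumption that a block has been reduced to an ordered list of lifetime markers. `MarkerKind`, `Marker`, `BlockSummary`, and `summarizeBlock` are hypothetical names, and `std::vector<bool>` stands in for `llvm::BitVector`. The sketch models the logic in Diff 94237 only, not any later revision.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a lifetime marker; the real pass inspects
// MachineInstrs (LIFETIME_START / LIFETIME_END and, in some modes, uses).
enum class MarkerKind { Start, End };

struct Marker {
  MarkerKind Kind;
  unsigned Slot;
};

// Mirrors the BlockLifetimeInfo bit sets in Diff 94237.
struct BlockSummary {
  std::vector<bool> Begin; // starts in this block and survives to its end
  std::vector<bool> Use;   // starts and ends within this block
  std::vector<bool> End;   // ends in this block and is not restarted after
};

BlockSummary summarizeBlock(const std::vector<Marker> &Markers,
                            unsigned NumSlots) {
  BlockSummary S{std::vector<bool>(NumSlots), std::vector<bool>(NumSlots),
                 std::vector<bool>(NumSlots)};
  for (const Marker &M : Markers) {
    assert(M.Slot < NumSlots && "slot out of range");
    if (M.Kind == MarkerKind::End) {
      // An END after a START in the same block turns Begin into Use.
      if (S.Begin[M.Slot]) {
        S.Begin[M.Slot] = false;
        S.Use[M.Slot] = true;
      }
      S.End[M.Slot] = true;
    } else {
      // A START cancels any earlier END/Use: the slot is live at block exit.
      S.End[M.Slot] = false;
      S.Use[M.Slot] = false;
      S.Begin[M.Slot] = true;
    }
  }
  return S;
}
```

For example, a block containing `start 0; end 0` yields `Use[0]` and `End[0]` set with `Begin[0]` clear. The sequence `end 0; start 0` yields only `Begin[0]`.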

Similar confusion over your use of the term "active" in the comment ("active" is also used at various other places in the file comments, and I don't think your definition of "active" is the same as that used in the other places). Ditto for "IntervalStarts" -- why is it named this way?

I thought "active" was a fresh term. I informally referred to it as "live", but unfortunately "live" has another meaning. Is there a better term than "active"?

Another point of confusion is over the use of the term "interference graph" in the comments-- the data structure being used to capture interferences is not really a graph, there are no explicit edges to speak of. A real interference graph would be an improvement over what's there (could be more precise in a number of cases), but that's not what exists in the code today.

I was thinking about things theoretically.

A final question is what sort of testing you've done so far to make sure this works for all the odd corner cases. LNT and bootstrap build are two things that come to mind.

I bootstrapped rustc and passed all of its tests. I don't know how to bootstrap LLVM, so I didn't do that.

With that said, please keep working on this, I think it's worth pursuing. I am OOO all this week and possibly next, so I won't have a lot of time to look at it, but I'll get there eventually.

Thanks.

Regards,

Than

To try to echo a bit what Than is saying: I think it would be helpful if you could write a few paragraphs describing what you think the high-level design of this new part is.

I.e., something like:
we create a dataflow problem that tells us x
we do this by computing these attributes on a per-block basis, and then finding a maximal fixpoint.
We then use x to prune y.

or whatever.

I can think of a lot of approaches different from the one stack coloring currently takes (e.g., you could do real liveness analysis, as Than mentions, or you could build interference graphs and simplify them), but they are more fundamental changes than what I see here.

Change names and comment to not mention interference graph.

In D31583#719207, @dberlin wrote:

To try to echo a bit what Than is saying: I think it would be helpful if you could write a few paragraphs describing what you think the high-level design of this new part is.

I.e., something like:
we create a dataflow problem that tells us x
we do this by computing these attributes on a per-block basis, and then finding a maximal fixpoint.
We then use x to prune y.

or whatever.

I can think of a lot of approaches different from the one stack coloring currently takes (e.g., you could do real liveness analysis, as Than mentions, or you could build interference graphs and simplify them), but they are more fundamental changes than what I see here.

I have a big comment in the source code that is supposed to be that explanation. Is there anything missing?

dberlin added inline comments. Apr 5 2017, 4:32 PM

lib/CodeGen/StackColoring.cpp
Line 1125: Or you could just augment the standard dataflow problem with more info? You also just kind of assert it doesn't give very good results, but it's pretty widely used with good results :)
Line 1163: Errr, what? It is very possible to do this in N time with N dataflow facts.

arielb1 added inline comments. Apr 6 2017, 1:53 AM

lib/CodeGen/StackColoring.cpp
Line 1125: What I said is that what the old code was doing is overconservative at merge points. The old code propagated the N `S active` facts across the CFG and then said that 2 slots interfere if they are both potentially active at some point. Merging this PR produced 16x stack usage reductions in real-world Rust compiler code. My algorithm also propagates the N `S active` facts, but uses them along with non-propagated `S active starts` facts to compute `S active & T active` more precisely. Liveness could make it smarter, but aliasing makes liveness annoying to compute (and impossible to compute if there are opaque function calls in the merge BB).
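The propagated `S active` facts can be illustrated with a small fixed-point solver. This is a sketch under simplifying assumptions, not the pass's `calculateLocalLiveness()`. Blocks are indexed in a plain vector, `Block` and `solveLiveness` are hypothetical names, and `std::vector<bool>` stands in for `llvm::BitVector`. The transfer function `LiveOut = (LiveIn - End) | Begin` is an assumption chosen to match the per-block summaries.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CFG node: predecessor indices plus per-slot bit sets.
struct Block {
  std::vector<std::size_t> Preds;
  std::vector<bool> Begin, End;     // per-block summaries (inputs)
  std::vector<bool> LiveIn, LiveOut; // "possibly active" facts (outputs)
};

// Iterates to a fixed point:
//   LiveIn(B)  = union of LiveOut(P) over all predecessors P
//   LiveOut(B) = (LiveIn(B) - End(B)) | Begin(B)
// A set bit in LiveIn means the slot is active on at least one incoming
// path, not necessarily on all of them.
unsigned solveLiveness(std::vector<Block> &Blocks, std::size_t NumSlots) {
  for (Block &B : Blocks) {
    assert(B.Begin.size() == NumSlots && B.End.size() == NumSlots);
    B.LiveIn.assign(NumSlots, false);
    B.LiveOut.assign(NumSlots, false);
  }
  unsigned Iterations = 0;
  bool Changed = true;
  while (Changed) {
    Changed = false;
    ++Iterations;
    for (Block &B : Blocks) {
      std::vector<bool> In(NumSlots, false);
      for (std::size_t P : B.Preds)
        for (std::size_t S = 0; S < NumSlots; ++S)
          if (Blocks[P].LiveOut[S])
            In[S] = true;
      std::vector<bool> Out(NumSlots, false);
      for (std::size_t S = 0; S < NumSlots; ++S)
        Out[S] = (In[S] && !B.End[S]) || B.Begin[S];
      if (In != B.LiveIn || Out != B.LiveOut) {
        B.LiveIn = In;
        B.LiveOut = Out;
        Changed = true;
      }
    }
  }
  return Iterations;
}
```

Consider a diamond where one arm starts slot 0 and the other starts slot 1. Both slots end up in the merge block's `LiveIn`, although no single path carries both. That is the merge-point imprecision described above.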

dotdash added a subscriber: dotdash. Apr 15 2017, 12:11 PM

dotdash added inline comments. Apr 16 2017, 8:31 AM

test/CodeGen/X86/StackColoring.ll
Line 539: I think you want to use `%bar` here.

Clean up code to LLVM standards - thanks @doener. Also add support for multiple live ranges.

Since I had a hand in changing the logic for the live interval calculation, I
feel like I should take some time to explain the approach that was taken here.

Originally, the stack coloring pass collected the lifetime markers and assumed
that there were at most two such markers per MBB, in one of two orders. The
order start -> end created a single segment in the live interval. The order
end -> start created two segments in the live interval, with the slot dead in
the middle of the MBB.

Later the code was adjusted to handle more than two markers per block. Because
the collected markers were in no particular order, the code did not form
multiple segments. Instead, it created a single segment and extended it as
needed. Unfortunately, this broke the logic for handling two markers in the
order end -> start: the LiveIn and LiveOut bits would then also be set, and the
segment would be extended to cover the whole MBB.

By now, we're iterating over the instructions in the MBB in order anyway, so we
can actually handle multiple start/end markers properly and create multiple
segments in the live interval even within a single block.

The semantics are as follows:

A slot for which there are no lifetime markers is always live.

For slots that have lifetime markers:

In the entry MBB, the slot starts out as dead. In other MBBs, the slot starts
out as live if the dataflow analysis determined it to be live coming into this
block, otherwise it starts as dead.

Iterating over the instructions in the MBB in sequential order:

A dead slot becomes live when it encounters a START (which could be a use), and
stays live until it encounters an END. Any START encountered while a slot is
live has no effect. This is necessary, because the START could be a plain use
rather than a lifetime_start and may thus not shorten the live interval.

A live slot becomes dead when it encounters an END. At this point, the slot's
live interval gets a new segment that starts at the index where the slot became
live and ends at the index where the END was encountered. Any END encountered
while a slot is dead has no effect.
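The rules above translate almost directly into a per-block loop for a single slot. The sketch below assumes that slot's markers in the block have already been extracted as (index, kind) pairs in instruction order. `Event`, `Segment`, and `buildSegments` are hypothetical names, and plain integers stand in for `SlotIndex`. Closing a still-live slot's segment at the block end is an assumption about how live-out slots are handled.

```cpp
#include <cassert>
#include <utility>
#include <vector>

enum class Event { Start, End }; // a Start may also be a plain use of the slot

// Half-open [first, second) range of instruction indices.
using Segment = std::pair<unsigned, unsigned>;

// Builds the live segments of one slot within one block.
// LiveOnEntry is false for the entry block; otherwise it is the LiveIn bit
// computed by the dataflow analysis.
std::vector<Segment>
buildSegments(const std::vector<std::pair<unsigned, Event>> &Events,
              unsigned BlockBegin, unsigned BlockEnd, bool LiveOnEntry) {
  std::vector<Segment> Segments;
  bool Live = LiveOnEntry;
  unsigned LiveSince = BlockBegin;
  for (const auto &E : Events) {
    unsigned Index = E.first;
    assert(Index >= BlockBegin && Index < BlockEnd && "event outside block");
    if (E.second == Event::Start) {
      if (!Live) { // START on a dead slot opens a new segment...
        Live = true;
        LiveSince = Index;
      }            // ...and is a no-op on a live slot.
    } else if (Live) { // END on a live slot closes the current segment...
      Segments.push_back({LiveSince, Index});
      Live = false;
    }                  // ...and is a no-op on a dead slot.
  }
  if (Live) // Assumption: a slot still live at block exit runs to the end.
    Segments.push_back({LiveSince, BlockEnd});
  return Segments;
}
```

With `end` at index 2 and `start` at index 5 in a block spanning [0, 10) where the slot is live on entry, the sketch produces the two segments [0, 2) and [5, 10). This is the case the single-segment approach got wrong.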

Building upon that, the improvement this patch was initially motivated by is
described below.

Because a lifetime end on a dead slot has no effect, a frontend may choose to
combine certain code paths to produce fewer BBs. Consider this pseudocode:

A = alloca TYPE
B = alloca TYPE
C = alloca TYPE_WITH_DTOR

main:
  LT_START(C)

  if COND {
    LT_START(A)
    INVOKE func UNWIND cleanup_A
    LT_END(A)
  } else {
    LT_START(B)
    INVOKE func UNWIND cleanup_B
    LT_END(B)
  }

  DTOR(C)
  LT_END(C)

  RETURN

cleanup_A:
  LP
  LT_END(A)
  br cleanup;

cleanup_B:
  LP
  LT_END(B)
  br cleanup;

cleanup:
  DTOR(C)
  LT_END(C)
  RESUME

Here we need two distinct cleanup paths and landing pads just to ensure that
the old stack coloring code can merge slots A and B. But assuming that a
lifetime end on a dead slot is a no-op, we could also write:

A = alloca TYPE
B = alloca TYPE
C = alloca TYPE_WITH_DTOR

main:
  LT_START(C)

  if COND {
    LT_START(A)
    INVOKE func UNWIND cleanup_A
    LT_END(A)
  } else {
    LT_START(B)
    INVOKE func UNWIND cleanup_B
    LT_END(B)
  }

  DTOR(C)
  LT_END(C)

  RETURN

cleanup:
  LP
  LT_END(A)
  LT_END(B)
  DTOR(C)
  LT_END(C)
  RESUME

Now the problem is that both A and B are only "possibly live" coming into
"cleanup". "Possibly live" meaning that a slot is live on one but not
necessarily all incoming edges. Using a plain overlap check on the live
intervals created using this information causes false positives, stopping the
slots from being merged.

To solve this, we can use an alternative approach to check whether two slots
are live at the same time. For two slots to be live at the same time, one of
them needs to become live when the other is live as well. We can check this by
keeping track of the points at which a slot becomes live. As these points tell
us that a slot is "definitely" live, we get more accurate results.
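Below is a minimal sketch of the start-based test next to a plain interval-overlap test. It assumes half-open integer segments instead of `LiveInterval`, and a plain vector of start indices instead of the patch's `IntervalStarts`. `SlotLiveness`, `liveAt`, `overlapsPlain`, and `overlapsByStarts` are hypothetical names.

```cpp
#include <cassert>
#include <utility>
#include <vector>

using Segment = std::pair<unsigned, unsigned>; // half-open [first, second)

// Hypothetical per-slot record: where the slot is possibly live, and the
// points at which it definitely becomes live (its START markers).
struct SlotLiveness {
  std::vector<Segment> Segments;
  std::vector<unsigned> Starts;
};

static bool liveAt(const SlotLiveness &S, unsigned Index) {
  for (const Segment &Seg : S.Segments) {
    assert(Seg.first <= Seg.second && "malformed segment");
    if (Seg.first <= Index && Index < Seg.second)
      return true;
  }
  return false;
}

// Plain check: any overlap between the possibly-live segments.
bool overlapsPlain(const SlotLiveness &A, const SlotLiveness &B) {
  for (const Segment &SA : A.Segments)
    for (const Segment &SB : B.Segments)
      if (SA.first < SB.second && SB.first < SA.second)
        return true;
  return false;
}

// Start-based check: two slots are live together only if one of them
// becomes live at a point where the other one is live.
bool overlapsByStarts(const SlotLiveness &A, const SlotLiveness &B) {
  for (unsigned Idx : A.Starts)
    if (liveAt(B, Idx))
      return true;
  for (unsigned Idx : B.Starts)
    if (liveAt(A, Idx))
      return true;
  return false;
}
```

The test data is modeled loosely on the `cleanup` example. Slot A has segments [10, 20) and [40, 41) and starts at 10. Slot B has segments [20, 30) and [40, 42) and starts at 20. Both slots are possibly live on entry to `cleanup`, so `overlapsPlain` reports a conflict. Neither slot becomes live while the other is live, so `overlapsByStarts` does not.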

I hope this explains the approach we've taken well enough.

Ready and just waiting for review.

dberlin removed a reviewer: dberlin. May 15 2017, 6:41 AM

rnk, can you review this? It has been stuck in the queue for almost a month.

In D31583#754945, @arielb1 wrote:

Ready and just waiting for review.

Thanks for the ping/reminder.

I read through Björn's April 17th comment. The explanation makes sense, but this is something that should be in the code as opposed to only in a Phab review. Please add something along these lines to the "Implementation Notes" comment section -- that is where other material exists relating to how the data flow analysis works.

My existing comments regarding naming still stand (overloading of the terms "active" and naming of the BB set "Use").

Ditto on my previous comment regarding testing. I recommend running a clang/llvm bootstrap build to make sure you haven't broken anything. You can find instructions on how to do bootstrap builds at http://llvm.org/docs/AdvancedBuilds.html. This is purely an advisory suggestion; you are free to ignore it (I mention it mainly because, when I made changes to stack coloring previously, it was helpful in flushing out problems that I hadn't anticipated).

Tests run locally.

Updated comment.

Works locally.

Ping. I think the new comment is good enough to go.

I pulled your patch and did a bootstrap build with it, no issues. Overall this seems like an improvement over the initial patch. Please see inline comments.

lib/CodeGen/StackColoring.cpp
Line 127: This is kind of a nit, but if I am reading this correctly, your use of "non-conservative" here refers to a slot that is not degenerate, which is not talked about until down on line 342. Maybe it would make sense to put in a forward reference?
Line 646: I see "if (!IsStart && ..." -- seems like the "!IsStart" is redundant at this point, no?
Line 1000: A couple of observations. First, relative to your first patch, I think I can make more sense out of what's going on. I can also see where you're getting the improvement, since doing the conflict check based on starts is inherently more precise. Second, there is a concern that in a large function with many packets overlapped together you could get into quadratic compile-time behavior (since in the worst case the interval has O(N) items and the LiveStarts vector has O(N) items). If you haven't seen any issues in practice, though, it's probably not worth worrying about too much.

This revision now requires changes to proceed. Jun 6 2017, 5:28 AM

Changed else/if nesting in that part.

I'm not too worried about any O(n^2) part - both the old and the new code do an O(# slots) * O(# slots) overlap check, and I don't remember it being a particularly expensive pass.

Use "degenerate" in all comments.

LGTM.

FWIW I tried a self-build of clang with and without the change and looked at the -stats output to see what sort of improvement there is. Looks pretty modest (<1%) but it is positive overall.

This revision is now accepted and ready to land. Jun 8 2017, 5:32 AM

Closed by commit rL305193: StackColoring: smarter check for slot overlap (authored by thanm). Jun 12 2017, 7:56 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                 Size
lib/CodeGen/StackColoring.cpp        102 lines
test/CodeGen/X86/StackColoring.ll    42 lines

Diff 94237

lib/CodeGen/StackColoring.cpp

@@ 118 lines not shown @@
 // slot 2: b5
 //
 // Achieving this allocation is tricky, however, due to the way
 // lifetime markers are inserted. Here is a simplified view of the
 // control flow graph for the code above:
 //
 // +------ block 0 -------+
 // 0| LIFETIME_START b1, b2 |
 // 1| <test 'if' condition> |
 // +-----------------------+
 //            ./  \.
 // +------ block 1 -------+   +------ block 2 -------+
 // 2| LIFETIME_START b3   |   5| LIFETIME_START b4, b5 |
 // 3| <uses of b1, b3>    |   6| <uses of b2, b4, b5>  |
 // 4| LIFETIME_END b3     |   7| LIFETIME_END b4, b5   |
 // +-----------------------+   +-----------------------+
 //            \.  /.
@@ 99 lines not shown @@
 // If in RPO ordering chosen to walk the CFG we happen to visit the b[k]
 // before visiting the memcpy block (which will contain the lifetime start
 // for "b" then it will appear that 'b' has a degenerate lifetime.
 //

 //===----------------------------------------------------------------------===//
 // StackColoring Pass
 //===----------------------------------------------------------------------===//

 namespace {
 /// StackColoring - A machine pass for merging disjoint stack allocations,
 /// marked by the LIFETIME_START and LIFETIME_END pseudo instructions.
 class StackColoring : public MachineFunctionPass {
   MachineFrameInfo *MFI;
   MachineFunction *MF;

   /// A class representing liveness information for a single basic block.
   /// Each bit in the BitVector represents the liveness property
   /// for a different stack slot.
   struct BlockLifetimeInfo {
-    /// Which slots BEGINs in each basic block.
+    /// Which slots BEGIN in this block and survive to its end.
     BitVector Begin;
-    /// Which slots ENDs in each basic block.
+    /// Which slots BEGIN and END in this block.
+    BitVector Use;
+    /// Which slots END in this block.
     BitVector End;
-    /// Which slots are marked as LIVE_IN, coming into each basic block.
+    /// Which slots are marked as LIVE_IN, coming into this block.
     BitVector LiveIn;
-    /// Which slots are marked as LIVE_OUT, coming out of each basic block.
+    /// Which slots are marked as LIVE_OUT, coming out of this block.
     BitVector LiveOut;
   };

   /// Maps active slots (per bit) for each basic block.
   typedef DenseMap<const MachineBasicBlock*, BlockLifetimeInfo> LivenessMap;
   LivenessMap BlockLiveness;

   /// Maps serial numbers to basic blocks.
   DenseMap<const MachineBasicBlock*, int> BasicBlocks;
   /// Maps basic blocks to a serial number.
   SmallVector<const MachineBasicBlock*, 8> BasicBlockNumbering;

-  /// Maps liveness intervals for each slot.
+  /// Maps slots to their activity interval. Outside of this interval, slots
+  /// values are either dead or `undef` and they will not be written to.
   SmallVector<std::unique_ptr<LiveInterval>, 16> Intervals;
+  /// Maps slots to the set of gen-points of their intervals.
+  SmallVector<std::unique_ptr<LiveInterval>, 16> IntervalStarts;
   /// VNInfo is used for the construction of LiveIntervals.
   VNInfo::Allocator VNInfoAllocator;
   /// SlotIndex analysis object.
   SlotIndexes *Indexes;
   /// The stack protector object.
   StackProtector *SP;

   /// The list of lifetime markers found. These markers are to be removed
@@ 73 lines not shown @@ private:
   /// This procedure checks all of the instructions in the function and
   /// invalidates lifetime ranges which do not contain all of the instructions
   /// which access that frame slot.
   void removeInvalidSlotRanges();

   /// Map entries which point to other entries to their destination.
   /// A->B->C becomes A->C.
   void expungeSlotMap(DenseMap<int, int> &SlotRemap, unsigned NumSlots);

   /// Used in collectMarkers
   typedef DenseMap<const MachineBasicBlock*, BitVector> BlockBitVecMap;
 };
 } // end anonymous namespace

 char StackColoring::ID = 0;
 char &llvm::StackColoringID = StackColoring::ID;

 INITIALIZE_PASS_BEGIN(StackColoring,
                    "stack-coloring", "Merge disjoint stack slots", false, false)
 INITIALIZE_PASS_DEPENDENCY(SlotIndexes)
 INITIALIZE_PASS_DEPENDENCY(StackProtector)
 INITIALIZE_PASS_END(StackColoring,
                    "stack-coloring", "Merge disjoint stack slots", false, false)

 void StackColoring::getAnalysisUsage(AnalysisUsage &AU) const {
   AU.addRequired<SlotIndexes>();
   AU.addRequired<StackProtector>();
   MachineFunctionPass::getAnalysisUsage(AU);
 }

 #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
 LLVM_DUMP_METHOD void StackColoring::dumpBV(const char *tag,
                                             const BitVector &BV) const {
   dbgs() << tag << " : { ";
   for (unsigned I = 0, E = BV.size(); I != E; ++I)
     dbgs() << BV.test(I) << " ";
   dbgs() << "}\n";
 }

 LLVM_DUMP_METHOD void StackColoring::dumpBB(MachineBasicBlock *MBB) const {
   LivenessMap::const_iterator BI = BlockLiveness.find(MBB);
   assert(BI != BlockLiveness.end() && "Block not found");
   const BlockLifetimeInfo &BlockInfo = BI->second;

   dumpBV("BEGIN", BlockInfo.Begin);
+  dumpBV("USE", BlockInfo.Use);
   dumpBV("END", BlockInfo.End);
   dumpBV("LIVE_IN", BlockInfo.LiveIn);
   dumpBV("LIVE_OUT", BlockInfo.LiveOut);
 }

 LLVM_DUMP_METHOD void StackColoring::dump() const {
   for (MachineBasicBlock *MBB : depth_first(MF)) {
     dbgs() << "Inspecting block #" << MBB->getNumber() << " ["
            << MBB->getName() << "]\n";
     dumpBB(MBB);
   }
 }

 LLVM_DUMP_METHOD void StackColoring::dumpIntervals() const {
   for (unsigned I = 0, E = Intervals.size(); I != E; ++I) {
     dbgs() << "Interval[" << I << "]:\n";
     Intervals[I]->dump();
+    dbgs() << "IntervalStarts[" << I << "]:\n";
+    IntervalStarts[I]->dump();
   }
 }
 #endif

 static inline int getStartOrEndSlot(const MachineInstr &MI)
 {
   assert((MI.getOpcode() == TargetOpcode::LIFETIME_START ||
           MI.getOpcode() == TargetOpcode::LIFETIME_END) &&
@@ 95 lines not shown @@ for (MachineInstr &MI : *MBB) {
         if (MI.getOpcode() == TargetOpcode::LIFETIME_START) {
           BetweenStartEnd.set(Slot);
           NumStartLifetimes[Slot] += 1;
         } else {
           BetweenStartEnd.reset(Slot);
           NumEndLifetimes[Slot] += 1;
         }
         const AllocaInst *Allocation = MFI->getObjectAllocation(Slot);
         if (Allocation) {
           DEBUG(dbgs() << "Found a lifetime ");
           DEBUG(dbgs() << (MI.getOpcode() == TargetOpcode::LIFETIME_START
                                ? "start"
                                : "end"));
           DEBUG(dbgs() << " marker for slot #" << Slot);
           DEBUG(dbgs() << " with allocation: " << Allocation->getName()
                        << "\n");
         }
         Markers.push_back(&MI);
         MarkersFound += 1;
       } else {
         for (const MachineOperand &MO : MI.operands()) {
           if (!MO.isFI())
             continue;
           int Slot = MO.getIndex();
           if (Slot < 0)
             continue;
           if (! BetweenStartEnd.test(Slot)) {
             ConservativeSlots.set(Slot);
           }
         }
       }
     }
     BitVector &SeenStart = SeenStartMap[MBB];
     SeenStart |= BetweenStartEnd;
   }
   if (!MarkersFound) {
     return 0;
   }

   // PR27903: slots with multiple start or end lifetime ops are not
   // safe to enable for "lifetime-start-on-first-use".
   for (unsigned slot = 0; slot < NumSlot; ++slot)
     if (NumStartLifetimes[slot] > 1 || NumEndLifetimes[slot] > 1)
       ConservativeSlots.set(slot);
   DEBUG(dumpBV("Conservative slots", ConservativeSlots));

   // Step 2: compute begin/end sets for each block

   // NOTE: We use a depth-first iteration to ensure that we obtain a
   // deterministic numbering.
   for (MachineBasicBlock *MBB : depth_first(MF)) {

     // Assign a serial number to this basic block.
     BasicBlocks[MBB] = BasicBlockNumbering.size();
     BasicBlockNumbering.push_back(MBB);

     // Keep a reference to avoid repeated lookups.
     BlockLifetimeInfo &BlockInfo = BlockLiveness[MBB];

     BlockInfo.Begin.resize(NumSlot);
+    BlockInfo.Use.resize(NumSlot);
     BlockInfo.End.resize(NumSlot);

     SmallVector<int, 4> slots;
     for (MachineInstr &MI : *MBB) {
       bool isStart = false;
       slots.clear();
       if (isLifetimeStartOrEnd(MI, slots, isStart)) {
         if (!isStart) {
           assert(slots.size() == 1 && "unexpected: MI ends multiple slots");
           int Slot = slots[0];
           if (BlockInfo.Begin.test(Slot)) {
             BlockInfo.Begin.reset(Slot);
+            BlockInfo.Use.set(Slot);
           }
           BlockInfo.End.set(Slot);
         } else {
           for (auto Slot : slots) {
             DEBUG(dbgs() << "Found a use of slot #" << Slot);
             DEBUG(dbgs() << " at BB#" << MBB->getNumber() << " index ");
             DEBUG(Indexes->getInstructionIndex(MI).print(dbgs()));
             const AllocaInst *Allocation = MFI->getObjectAllocation(Slot);
             if (Allocation) {
               DEBUG(dbgs() << " with allocation: "<< Allocation->getName());
             }
             DEBUG(dbgs() << "\n");
             if (BlockInfo.End.test(Slot)) {
               BlockInfo.End.reset(Slot);
             }
+            if (BlockInfo.Use.test(Slot)) {
+              BlockInfo.Use.reset(Slot);
+            }
             BlockInfo.Begin.set(Slot);
           }
         }
       }
     }
   }

Show All 9 Lines	void StackColoring::calculateLocalLiveness()
while (changed) {		while (changed) {
changed = false;		changed = false;
++NumIters;		++NumIters;

for (const MachineBasicBlock *BB : BasicBlockNumbering) {		for (const MachineBasicBlock *BB : BasicBlockNumbering) {

// Use an iterator to avoid repeated lookups.		// Use an iterator to avoid repeated lookups.
LivenessMap::iterator BI = BlockLiveness.find(BB);		LivenessMap::iterator BI = BlockLiveness.find(BB);
assert(BI != BlockLiveness.end() && "Block not found");		assert(BI != BlockLiveness.end() && "Block not found");
		thanm: I see "if (!IsStart && ..." -- seems like the "!IsStart" is redundant at this point, no?
BlockLifetimeInfo &BlockInfo = BI->second;		BlockLifetimeInfo &BlockInfo = BI->second;

// Compute LiveIn by unioning together the LiveOut sets of all preds.		// Compute LiveIn by unioning together the LiveOut sets of all preds.
BitVector LocalLiveIn;		BitVector LocalLiveIn;
for (MachineBasicBlock::const_pred_iterator PI = BB->pred_begin(),		for (MachineBasicBlock::const_pred_iterator PI = BB->pred_begin(),
PE = BB->pred_end(); PI != PE; ++PI) {		PE = BB->pred_end(); PI != PE; ++PI) {
LivenessMap::const_iterator I = BlockLiveness.find(*PI);		LivenessMap::const_iterator I = BlockLiveness.find(*PI);
assert(I != BlockLiveness.end() && "Predecessor not found");		assert(I != BlockLiveness.end() && "Predecessor not found");
(25 lines not shown)	while (changed) {
}		}
}// while changed.		}// while changed.

NumIterations = NumIters;		NumIterations = NumIters;
}		}

void StackColoring::calculateLiveIntervals(unsigned NumSlots) {		void StackColoring::calculateLiveIntervals(unsigned NumSlots) {
SmallVector<SlotIndex, 16> Starts;		SmallVector<SlotIndex, 16> Starts;
		SmallVector<SlotIndex, 16> Finishes;
SmallVector<SlotIndex, 16> Finishes;
		// For each block, find which slots are active within this block
// For each block, find which slots are active within this block		// and update the live intervals.
// and update the live intervals.		for (const MachineBasicBlock &MBB : *MF) {
for (const MachineBasicBlock &MBB : *MF) {		Starts.clear();
		Starts.resize(NumSlots);
		Finishes.clear();
		Finishes.resize(NumSlots);

Starts.clear();		// Create the interval for the basic blocks containing lifetime begin/end.
Starts.resize(NumSlots);		for (const MachineInstr &MI : MBB) {
Finishes.clear();
Finishes.resize(NumSlots);		SmallVector<int, 4> slots;
		bool IsStart = false;
// Create the interval for the basic blocks containing lifetime begin/end.		if (!isLifetimeStartOrEnd(MI, slots, IsStart))
for (const MachineInstr &MI : MBB) {		continue;
		SlotIndex ThisIndex = Indexes->getInstructionIndex(MI);
		for (auto Slot : slots) {
		if (IsStart) {
		if (!Starts[Slot].isValid() || Starts[Slot] > ThisIndex)
		Starts[Slot] = ThisIndex;
		} else {
SmallVector<int, 4> slots;		if (!Finishes[Slot].isValid() || Finishes[Slot] < ThisIndex)
bool IsStart = false;		Finishes[Slot] = ThisIndex;
if (!isLifetimeStartOrEnd(MI, slots, IsStart))		}
continue;		}
SlotIndex ThisIndex = Indexes->getInstructionIndex(MI);		}
for (auto Slot : slots) {
if (IsStart) {		// Create the interval of the blocks that we previously found to be 'alive'.
if (!Starts[Slot].isValid() || Starts[Slot] > ThisIndex)		BlockLifetimeInfo &MBBLiveness = BlockLiveness[&MBB];
Starts[Slot] = ThisIndex;		for (int pos = MBBLiveness.LiveIn.find_first(); pos != -1;
} else {		pos = MBBLiveness.LiveIn.find_next(pos)) {
if (!Finishes[Slot].isValid() || Finishes[Slot] < ThisIndex)		Starts[pos] = Indexes->getMBBStartIdx(&MBB);
Finishes[Slot] = ThisIndex;		}
}		for (int pos = MBBLiveness.LiveOut.find_first(); pos != -1;
}		pos = MBBLiveness.LiveOut.find_next(pos)) {
}		Finishes[pos] = Indexes->getMBBEndIdx(&MBB);
		}
// Create the interval of the blocks that we previously found to be 'alive'.
BlockLifetimeInfo &MBBLiveness = BlockLiveness[&MBB];		for (unsigned i = 0; i < NumSlots; ++i) {
for (int pos = MBBLiveness.LiveIn.find_first(); pos != -1;		//
pos = MBBLiveness.LiveIn.find_next(pos)) {		// When LifetimeStartOnFirstUse is turned on, data flow analysis
Starts[pos] = Indexes->getMBBStartIdx(&MBB);		// is forward (from starts to ends), not bidirectional. A
}		// consequence of this is that we can wind up in situations
for (int pos = MBBLiveness.LiveOut.find_first(); pos != -1;		// where Starts[i] is invalid but Finishes[i] is valid and vice
pos = MBBLiveness.LiveOut.find_next(pos)) {		// versa. Example:
Finishes[pos] = Indexes->getMBBEndIdx(&MBB);		//
}		// LIFETIME_START x
		// if (...) {
for (unsigned i = 0; i < NumSlots; ++i) {		// <use of x>
//		// throw ...;
// When LifetimeStartOnFirstUse is turned on, data flow analysis		// }
// is forward (from starts to ends), not bidirectional. A		// LIFETIME_END x
// consequence of this is that we can wind up in situations		// return 2;
// where Starts[i] is invalid but Finishes[i] is valid and vice		//
// versa. Example:		//
//		// Here the slot for "x" will not be live into the block
// LIFETIME_START x		// containing the "return 2" (since lifetimes start with first
// if (...) {		// use, not at the dominating LIFETIME_START marker).
// <use of x>		//
// throw ...;		if (Starts[i].isValid() && !Finishes[i].isValid()) {
// }		Finishes[i] = Indexes->getMBBEndIdx(&MBB);
// LIFETIME_END x		}
// return 2;		if (!Starts[i].isValid())
//		continue;
//
// Here the slot for "x" will not be live into the block		assert(Starts[i] && Finishes[i] && "Invalid interval");
// containing the "return 2" (since lifetimes start with first		VNInfo *ValNum = Intervals[i]->getValNumInfo(0);
// use, not at the dominating LIFETIME_START marker).		VNInfo *ValNumS = IntervalStarts[i]->getValNumInfo(0);
//		SlotIndex S = Starts[i];
if (Starts[i].isValid() && !Finishes[i].isValid()) {		SlotIndex F = Finishes[i];
Finishes[i] = Indexes->getMBBEndIdx(&MBB);		if (S < F) {
}		// We have a single consecutive region.
if (!Starts[i].isValid())		Intervals[i]->addSegment(LiveInterval::Segment(S, F, ValNum));
continue;		// FIXME: stop cargo culting
		if (MBBLiveness.Begin.test(i) || MBBLiveness.Use.test(i)) {
assert(Starts[i] && Finishes[i] && "Invalid interval");		IntervalStarts[i]->addSegment(LiveInterval::Segment(S, F, ValNumS));
VNInfo *ValNum = Intervals[i]->getValNumInfo(0);		}
SlotIndex S = Starts[i];		} else {
SlotIndex F = Finishes[i];		// We have two non-consecutive regions. This happens when
if (S < F) {		// LIFETIME_START appears after the LIFETIME_END marker.
// We have a single consecutive region.		SlotIndex NewStart = Indexes->getMBBStartIdx(&MBB);
Intervals[i]->addSegment(LiveInterval::Segment(S, F, ValNum));		SlotIndex NewFin = Indexes->getMBBEndIdx(&MBB);
} else {		Intervals[i]->addSegment(LiveInterval::Segment(NewStart, F, ValNum));
// We have two non-consecutive regions. This happens when		Intervals[i]->addSegment(LiveInterval::Segment(S, NewFin, ValNum));
// LIFETIME_START appears after the LIFETIME_END marker.		// FIXME: stop cargo culting
SlotIndex NewStart = Indexes->getMBBStartIdx(&MBB);		if (MBBLiveness.Begin.test(i) || MBBLiveness.Use.test(i)) {
SlotIndex NewFin = Indexes->getMBBEndIdx(&MBB);		IntervalStarts[i]->addSegment(LiveInterval::Segment(NewStart, F, ValNumS));
Intervals[i]->addSegment(LiveInterval::Segment(NewStart, F, ValNum));		IntervalStarts[i]->addSegment(LiveInterval::Segment(S, NewFin, ValNumS));
Intervals[i]->addSegment(LiveInterval::Segment(S, NewFin, ValNum));		}
}		}
}		}
}		}
}		}

bool StackColoring::removeAllMarkers() {		bool StackColoring::removeAllMarkers() {
unsigned Count = 0;		unsigned Count = 0;
for (MachineInstr *MI : Markers) {		for (MachineInstr *MI : Markers) {
(115 lines not shown)	for (MachineInstr &I : BB) {
// outside of the lifetime markers, or that the user has a bug.		// outside of the lifetime markers, or that the user has a bug.
// NOTE: Alloca address calculations which happen outside the lifetime		// NOTE: Alloca address calculations which happen outside the lifetime
// zone are okay, despite the fact that we don't have a good way		// zone are okay, despite the fact that we don't have a good way
// for validating all of the usages of the calculation.		// for validating all of the usages of the calculation.
#ifndef NDEBUG		#ifndef NDEBUG
bool TouchesMemory = I.mayLoad() || I.mayStore();		bool TouchesMemory = I.mayLoad() || I.mayStore();
// If we don't protect the user from escaped allocas, don't bother		// If we don't protect the user from escaped allocas, don't bother
// validating the instructions.		// validating the instructions.
		if (!I.isDebugValue() && TouchesMemory && ProtectFromEscapedAllocas) {
if (!I.isDebugValue() && TouchesMemory && ProtectFromEscapedAllocas) {		SlotIndex Index = Indexes->getInstructionIndex(I);
SlotIndex Index = Indexes->getInstructionIndex(I);		const LiveInterval *Interval = &*Intervals[FromSlot];
const LiveInterval *Interval = &*Intervals[FromSlot];		assert(Interval->find(Index) != Interval->end() &&
assert(Interval->find(Index) != Interval->end() &&		"Found instruction usage outside of live range.");
"Found instruction usage outside of live range.");		}
}		#endif
#endif
		// Fix the machine instructions.
// Fix the machine instructions.		int ToSlot = SlotRemap[FromSlot];
int ToSlot = SlotRemap[FromSlot];		MO.setIndex(ToSlot);
MO.setIndex(ToSlot);		FixedInstr++;
FixedInstr++;		}
}		}
}
		// Update the location of C++ catch objects for the MSVC personality routine.
// Update the location of C++ catch objects for the MSVC personality routine.		if (WinEHFuncInfo *EHInfo = MF->getWinEHFuncInfo())
if (WinEHFuncInfo *EHInfo = MF->getWinEHFuncInfo())		for (WinEHTryBlockMapEntry &TBME : EHInfo->TryBlockMap)
for (WinEHTryBlockMapEntry &TBME : EHInfo->TryBlockMap)		for (WinEHHandlerType &H : TBME.HandlerArray)
for (WinEHHandlerType &H : TBME.HandlerArray)		if (H.CatchObj.FrameIndex != INT_MAX &&
if (H.CatchObj.FrameIndex != INT_MAX &&		SlotRemap.count(H.CatchObj.FrameIndex))
SlotRemap.count(H.CatchObj.FrameIndex))		H.CatchObj.FrameIndex = SlotRemap[H.CatchObj.FrameIndex];
H.CatchObj.FrameIndex = SlotRemap[H.CatchObj.FrameIndex];
		DEBUG(dbgs()<<"Fixed "<<FixedMemOp<<" machine memory operands.\n");
DEBUG(dbgs()<<"Fixed "<<FixedMemOp<<" machine memory operands.\n");		DEBUG(dbgs()<<"Fixed "<<FixedDbg<<" debug locations.\n");
DEBUG(dbgs()<<"Fixed "<<FixedDbg<<" debug locations.\n");		DEBUG(dbgs()<<"Fixed "<<FixedInstr<<" machine instructions.\n");
DEBUG(dbgs()<<"Fixed "<<FixedInstr<<" machine instructions.\n");		}
}
		void StackColoring::removeInvalidSlotRanges() {
		for (MachineBasicBlock &BB : *MF)
		for (MachineInstr &I : BB) {
		if (I.getOpcode() == TargetOpcode::LIFETIME_START ||
		I.getOpcode() == TargetOpcode::LIFETIME_END || I.isDebugValue())
		continue;

void StackColoring::removeInvalidSlotRanges() {		// Some intervals are suspicious! In some cases we find address
for (MachineBasicBlock &BB : *MF)		// calculations outside of the lifetime zone, but not actual memory
for (MachineInstr &I : BB) {		// read or write. Memory accesses outside of the lifetime zone are a clear
if (I.getOpcode() == TargetOpcode::LIFETIME_START ||		// violation, but address calculations are okay. This can happen when
I.getOpcode() == TargetOpcode::LIFETIME_END || I.isDebugValue())		// GEPs are hoisted outside of the lifetime zone.
continue;		// So, in here we only check instructions which can read or write memory.
		if (!I.mayLoad() && !I.mayStore())
// Some intervals are suspicious! In some cases we find address		continue;
// calculations outside of the lifetime zone, but not actual memory
// read or write. Memory accesses outside of the lifetime zone are a clear		// Check all of the machine operands.
// violation, but address calculations are okay. This can happen when		for (const MachineOperand &MO : I.operands()) {
// GEPs are hoisted outside of the lifetime zone.		if (!MO.isFI())
// So, in here we only check instructions which can read or write memory.		continue;
if (!I.mayLoad() && !I.mayStore())
continue;		int Slot = MO.getIndex();

// Check all of the machine operands.		if (Slot<0)
for (const MachineOperand &MO : I.operands()) {		continue;
if (!MO.isFI())
continue;		if (Intervals[Slot]->empty())
		continue;
int Slot = MO.getIndex();
		// Check that the used slot is inside the calculated lifetime range.
if (Slot<0)		// If it is not, warn about it and invalidate the range.
continue;		LiveInterval Interval = &Intervals[Slot];
		SlotIndex Index = Indexes->getInstructionIndex(I);
if (Intervals[Slot]->empty())		if (Interval->find(Index) == Interval->end()) {
continue;		Interval->clear();
		DEBUG(dbgs()<<"Invalidating range #"<<Slot<<"\n");
// Check that the used slot is inside the calculated lifetime range.		EscapedAllocas++;
// If it is not, warn about it and invalidate the range.		}
LiveInterval *Interval = &*Intervals[Slot];		}
SlotIndex Index = Indexes->getInstructionIndex(I);		}
if (Interval->find(Index) == Interval->end()) {		}
Interval->clear();
DEBUG(dbgs()<<"Invalidating range #"<<Slot<<"\n");		void StackColoring::expungeSlotMap(DenseMap<int, int> &SlotRemap,
EscapedAllocas++;		unsigned NumSlots) {
}		// Expunge slot remap map.
}		for (unsigned i=0; i < NumSlots; ++i) {
}		// If we are remapping i
}		if (SlotRemap.count(i)) {
		int Target = SlotRemap[i];
void StackColoring::expungeSlotMap(DenseMap<int, int> &SlotRemap,		// As long as our target is mapped to something else, follow it.
unsigned NumSlots) {		while (SlotRemap.count(Target)) {
// Expunge slot remap map.		Target = SlotRemap[Target];
for (unsigned i=0; i < NumSlots; ++i) {		SlotRemap[i] = Target;
// If we are remapping i		}
if (SlotRemap.count(i)) {		}
int Target = SlotRemap[i];		}
		}
// As long as our target is mapped to something else, follow it.
		bool StackColoring::runOnMachineFunction(MachineFunction &Func) {
while (SlotRemap.count(Target)) {		DEBUG(dbgs() << "******** Stack Coloring ********\n"
Target = SlotRemap[Target];		<< "********** Function: "
SlotRemap[i] = Target;		<< ((const Value*)Func.getFunction())->getName() << '\n');
		MF = &Func;
		thanm: A couple of observations: First, relative to your first patch I think I can make more sense out of what's going on. I can also see where you're getting the improvement, since doing the conflict check based on starts is inherently more precise. Second, there is a concern that in a large function with many packets overlapped together you could get into quadratic compile time behavior (since in the worst case the interval has O(N) items and the LiveStarts vector has O(N) items). If you haven't seen any issues in practice, though, I suppose it's probably not worth worrying too much about.
		MFI = &MF->getFrameInfo();
		Indexes = &getAnalysis<SlotIndexes>();
		SP = &getAnalysis<StackProtector>();
		BlockLiveness.clear();
		BasicBlocks.clear();
		BasicBlockNumbering.clear();
		Markers.clear();
		Intervals.clear();
		IntervalStarts.clear();
		VNInfoAllocator.Reset();

		unsigned NumSlots = MFI->getObjectIndexEnd();

		// If there are no stack slots then there are no markers to remove.
		if (!NumSlots)
		return false;

		SmallVector<int, 8> SortedSlots;
		SortedSlots.reserve(NumSlots);
		Intervals.reserve(NumSlots);

		unsigned NumMarkers = collectMarkers(NumSlots);

		unsigned TotalSize = 0;
		DEBUG(dbgs()<<"Found "<<NumMarkers<<" markers and "<<NumSlots<<" slots\n");
		DEBUG(dbgs()<<"Slot structure:\n");

		for (int i=0; i < MFI->getObjectIndexEnd(); ++i) {
		DEBUG(dbgs()<<"Slot #"<<i<<" - "<<MFI->getObjectSize(i)<<" bytes.\n");
		TotalSize += MFI->getObjectSize(i);
		}

		DEBUG(dbgs()<<"Total Stack size: "<<TotalSize<<" bytes\n\n");

		// Don't continue because there are not enough lifetime markers, or the
		// stack is too small, or we are told not to optimize the slots.
		if (NumMarkers < 2 || TotalSize < 16 || DisableColoring ||
		skipFunction(*Func.getFunction())) {
		DEBUG(dbgs()<<"Will not try to merge slots.\n");
		return removeAllMarkers();
		}

		for (unsigned i=0; i < NumSlots; ++i) {
		std::unique_ptr<LiveInterval> LI(new LiveInterval(i, 0));
		LI->getNextValue(Indexes->getZeroIndex(), VNInfoAllocator);
		Intervals.push_back(std::move(LI));

		// Just cargo culting. Please help me DTRT.
		std::unique_ptr<LiveInterval> SI(new LiveInterval(i, 0));
		SI->getNextValue(Indexes->getZeroIndex(), VNInfoAllocator);
		IntervalStarts.push_back(std::move(SI));

		SortedSlots.push_back(i);
		}
}
}		// Calculate the liveness of each block.
		calculateLocalLiveness();
}		DEBUG(dbgs() << "Dataflow iterations: " << NumIterations << "\n");
}		DEBUG(dump());

bool StackColoring::runOnMachineFunction(MachineFunction &Func) {		// Propagate the liveness information.
DEBUG(dbgs() << "******** Stack Coloring ********\n"		calculateLiveIntervals(NumSlots);
<< "********** Function: "		DEBUG(dumpIntervals());
<< ((const Value*)Func.getFunction())->getName() << '\n');
MF = &Func;		// Search for allocas which are used outside of the declared lifetime
MFI = &MF->getFrameInfo();		// markers.
Indexes = &getAnalysis<SlotIndexes>();		if (ProtectFromEscapedAllocas)
SP = &getAnalysis<StackProtector>();		removeInvalidSlotRanges();
BlockLiveness.clear();
BasicBlocks.clear();		// Maps old slots to new slots.
BasicBlockNumbering.clear();		DenseMap<int, int> SlotRemap;
Markers.clear();		unsigned RemovedSlots = 0;
Intervals.clear();		unsigned ReducedSize = 0;
VNInfoAllocator.Reset();
		// Do not bother looking at empty intervals.
unsigned NumSlots = MFI->getObjectIndexEnd();		for (unsigned I = 0; I < NumSlots; ++I) {
		if (Intervals[SortedSlots[I]]->empty())
// If there are no stack slots then there are no markers to remove.		SortedSlots[I] = -1;
if (!NumSlots)		}
return false;
		// This is a simple greedy algorithm for merging allocas. First, sort the
SmallVector<int, 8> SortedSlots;		// slots, placing the largest slots first. Next, perform an n^2 scan and look
SortedSlots.reserve(NumSlots);		// for disjoint slots. When you find disjoint slots, merge the smaller one
Intervals.reserve(NumSlots);		// into the bigger one and update the live interval. Remove the small alloca
		// and continue.
unsigned NumMarkers = collectMarkers(NumSlots);
		// Sort the slots according to their size. Place unused slots at the end.
unsigned TotalSize = 0;		// Use stable sort to guarantee deterministic code generation.
DEBUG(dbgs()<<"Found "<<NumMarkers<<" markers and "<<NumSlots<<" slots\n");		std::stable_sort(SortedSlots.begin(), SortedSlots.end(),
DEBUG(dbgs()<<"Slot structure:\n");		[this](int LHS, int RHS) {
		// We use -1 to denote an uninteresting slot. Place these slots at the end.
for (int i=0; i < MFI->getObjectIndexEnd(); ++i) {		if (LHS == -1) return false;
DEBUG(dbgs()<<"Slot #"<<i<<" - "<<MFI->getObjectSize(i)<<" bytes.\n");		if (RHS == -1) return true;
TotalSize += MFI->getObjectSize(i);		// Sort according to size.
}		return MFI->getObjectSize(LHS) > MFI->getObjectSize(RHS);
		});
DEBUG(dbgs()<<"Total Stack size: "<<TotalSize<<" bytes\n\n");
		bool Changed = true;
// Don't continue because there are not enough lifetime markers, or the		while (Changed) {
// stack is too small, or we are told not to optimize the slots.		Changed = false;
if (NumMarkers < 2 || TotalSize < 16 || DisableColoring ||		for (unsigned I = 0; I < NumSlots; ++I) {
skipFunction(*Func.getFunction())) {		if (SortedSlots[I] == -1)
DEBUG(dbgs()<<"Will not try to merge slots.\n");		continue;
return removeAllMarkers();
}		for (unsigned J=I+1; J < NumSlots; ++J) {
		if (SortedSlots[J] == -1)
for (unsigned i=0; i < NumSlots; ++i) {		continue;
std::unique_ptr<LiveInterval> LI(new LiveInterval(i, 0));
LI->getNextValue(Indexes->getZeroIndex(), VNInfoAllocator);		int FirstSlot = SortedSlots[I];
Intervals.push_back(std::move(LI));		int SecondSlot = SortedSlots[J];
SortedSlots.push_back(i);		LiveInterval *First = &*Intervals[FirstSlot];
}		LiveInterval *FirstS = &*IntervalStarts[FirstSlot];
		LiveInterval *Second = &*Intervals[SecondSlot];
// Calculate the liveness of each block.		LiveInterval *SecondS = &*IntervalStarts[SecondSlot];
calculateLocalLiveness();		assert (!First->empty() && !Second->empty() && "Found an empty range");
DEBUG(dbgs() << "Dataflow iterations: " << NumIterations << "\n");
DEBUG(dump());		// Merge disjoint slots. Now, the condition for this is a little bit
		// tricky.
// Propagate the liveness information.		//
calculateLiveIntervals(NumSlots);		// The fundamental condition we want to preserve is that each stack
DEBUG(dumpIntervals());		// slot has the correct contents at each point it is live.
		//
// Search for allocas which are used outside of the declared lifetime		// We could compute liveness using the standard backward dataflow
// markers.		// algorithm. Unfortunately, that does not give very good results in the
if (ProtectFromEscapedAllocas)		// presence of aliasing, so we have frontends emit `lifetime.start` and
		dberlin: Or you could just augment the standard dataflow problem with more info? You also just kind of assert it doesn't give very good results, but it's pretty widely used with good results :)
		arielb1: What I said is that what the old code was doing (propagating the N `S active` facts across the CFG and then saying that 2 slots interfere if they are both potentially active at a point) is overconservative at merge points. Merging this PR caused 16x stack usage reductions in the real-world stack usage of the Rust compiler. My algorithm also propagates the N `S active` facts, but uses them together with non-propagated `S active starts` facts to compute `S active & T active` more precisely. Liveness could make it smarter, but aliasing makes it annoying to compute (and impossible if there are opaque function calls in the merge BB).
removeInvalidSlotRanges();		// `lifetime.end` intrinsics that make undesirable accesses UB.
		//
// Maps old slots to new slots.		// The effect of these intrinsics is as follows:
DenseMap<int, int> SlotRemap;		// 1) at start, each stack-slot is marked as out-of-scope, unless no
unsigned RemovedSlots = 0;		// lifetime intrinsic refers to that stack slot, in which case
unsigned ReducedSize = 0;		// it is marked as in-scope.
		// 2) on a `lifetime.start`, a stack slot is marked as in-scope and
// Do not bother looking at empty intervals.		// the stack slot is overwritten with `undef`.
for (unsigned I = 0; I < NumSlots; ++I) {		// 3) on a `lifetime.end`, a stack slot is marked as out-of-scope.
if (Intervals[SortedSlots[I]]->empty())		// 4) on function exit, all stack slots are marked as out-of-scope.
SortedSlots[I] = -1;		// 5) the effects of calling `lifetime.start` on an in-scope stack-slot,
}		// or `lifetime.end` on an out-of-scope stack-slot, are left unspecified.
		// 6) memory accesses to out-of-scope stack slots are UB.
// This is a simple greedy algorithm for merging allocas. First, sort the		// 7) when a stack-slot is marked as out-of-scope, all pointers to it
// slots, placing the largest slots first. Next, perform an n^2 scan and look		// are invalidated unless it looks like they might be used (?). This
// for disjoint slots. When you find disjoint slots, merge the samller one		// is used to justify not marking slots as live until the pointer
// into the bigger one and update the live interval. Remove the small alloca		// to them is used, but I think this should be clarified better.
// and continue.		//
		// If we define a slot as active at a program point if it either can
// Sort the slots according to their size. Place unused slots at the end.		// be written to, or if it has a live and non-undef content, then it
// Use stable sort to guarantee deterministic code generation.		// is obvious that slots that are never active together can be merged.
std::stable_sort(SortedSlots.begin(), SortedSlots.end(),		//
[this](int LHS, int RHS) {		// From our rules, we see that out-of-scope slots are never active,
// We use -1 to denote an uninteresting slot. Place these slots at the end.		// and from (7) we see that "non-conservative" slots remain non-active
if (LHS == -1) return false;		// until their address is taken. Therefore, we can approximate slot activity
if (RHS == -1) return true;		// using dataflow.
// Sort according to size.		//
return MFI->getObjectSize(LHS) > MFI->getObjectSize(RHS);		// Now, naively, we might think that we could construct our interference
});		// graph by propagating `S active` through the CFG for every stack-slot `S`,
		// and having `S` and `T` interfere if there is a point in which they are
bool Changed = true;		// both active. That is sound, but overly conservative in some important
while (Changed) {		// cases: it is possible that `S` is active on one predecessor edge and
Changed = false;		// `T` is active on another. See PR32488.
for (unsigned I = 0; I < NumSlots; ++I) {		//
if (SortedSlots[I] == -1)		// If we want to construct the interference graph precisely, we could
continue;		// propagate `S active` and `S&T active` predicates through the CFG. That
		// would be precise, but requires propagating `O(n^2)` dataflow facts.
for (unsigned J=I+1; J < NumSlots; ++J) {		//
		dberlin: Errr, what? It is very possible to do this in N time with N dataflow facts.
if (SortedSlots[J] == -1)		// Instead, we rely on a little trick: for an `S&T active` predicate to
continue;		// start holding, there has to be either
		// A) a point in the gen-set of `S active` where `T` is active
int FirstSlot = SortedSlots[I];		// B) a point in the gen-set of `T active` where `S` is active
int SecondSlot = SortedSlots[J];		// C) a point in the gen-set of both `S active` and `T active`.
LiveInterval *First = &*Intervals[FirstSlot];		//
LiveInterval *Second = &*Intervals[SecondSlot];		// Of course, the `S&T active` predicate can be propagated further, but
assert (!First->empty() && !Second->empty() && "Found an empty range");		// it holding at 1 point is enough for us to mark an edge on the interference
		// graph. So that's what we do.
// Merge disjoint slots.		if (!First->overlaps(*SecondS) && !FirstS->overlaps(*Second)) {
if (!First->overlaps(*Second)) {		Changed = true;
Changed = true;		First->MergeSegmentsInAsValue(*Second, First->getValNumInfo(0));
First->MergeSegmentsInAsValue(*Second, First->getValNumInfo(0));		FirstS->MergeSegmentsInAsValue(*SecondS, FirstS->getValNumInfo(0));
SlotRemap[SecondSlot] = FirstSlot;		SlotRemap[SecondSlot] = FirstSlot;
SortedSlots[J] = -1;		SortedSlots[J] = -1;
DEBUG(dbgs()<<"Merging #"<<FirstSlot<<" and slots #"<<		DEBUG(dbgs()<<"Merging #"<<FirstSlot<<" and slots #"<<
SecondSlot<<" together.\n");		SecondSlot<<" together.\n");
unsigned MaxAlignment = std::max(MFI->getObjectAlignment(FirstSlot),		unsigned MaxAlignment = std::max(MFI->getObjectAlignment(FirstSlot),
MFI->getObjectAlignment(SecondSlot));		MFI->getObjectAlignment(SecondSlot));

assert(MFI->getObjectSize(FirstSlot) >=		assert(MFI->getObjectSize(FirstSlot) >=
(25 lines not shown)

test/CodeGen/X86/StackColoring.ll

(515 lines not shown)	if.else: ; preds = %entry
call void @inita(i32* %arraydecay1) #3		call void @inita(i32* %arraydecay1) #3
%arraydecay3 = getelementptr inbounds [128 x i32], [128 x i32]* %b3, i64 0, i64 0		%arraydecay3 = getelementptr inbounds [128 x i32], [128 x i32]* %b3, i64 0, i64 0
call void @inita(i32* %arraydecay3) #3		call void @inita(i32* %arraydecay3) #3
%tobool25 = icmp eq i32 %x, 0		%tobool25 = icmp eq i32 %x, 0
br i1 %tobool25, label %if.end, label %while.body.lr.ph		br i1 %tobool25, label %if.end, label %while.body.lr.ph

while.body.lr.ph: ; preds = %if.else		while.body.lr.ph: ; preds = %if.else
%tmp2 = bitcast [128 x i32]* %b3 to i8*		%tmp2 = bitcast [128 x i32]* %b3 to i8*
		br label %while.body

		while.body: ; preds = %while.body.lr.ph, %while.body
		%x.addr.06 = phi i32 [ %x, %while.body.lr.ph ], [ %dec, %while.body ]
		%dec = add nsw i32 %x.addr.06, -1
		call void @llvm.lifetime.start(i64 512, i8* %tmp2) #3
		call void @inita(i32* %arraydecay3) #3
		call void @llvm.lifetime.end(i64 512, i8* %tmp2) #3
		%tobool2 = icmp eq i32 %dec, 0
		br i1 %tobool2, label %if.end.loopexit, label %while.body

		if.end.loopexit: ; preds = %while.body
		br label %if.end

		if.end: ; preds = %if.end.loopexit, %if.else, %if.then
		call void @llvm.lifetime.end(i64 512, i8* %tmp1) #3
		dotdash: I think you want to use `%bar` here.
		call void @llvm.lifetime.end(i64 512, i8* %tmp) #3
		ret i32 0
		}

		; Test case motivated by PR27903. Same routine inlined multiple times
		; into a caller results in a multi-segment lifetime, but the second
		; lifetime has no explicit references to the stack slot. Such slots
		; have to be treated conservatively.

		;CHECK-LABEL: twobod_b27903:
		;YESCOLOR: subq $96, %rsp
		;NOFIRSTUSE: subq $96, %rsp
		;NOCOLOR: subq $96, %rsp

		define i32 @twobod_b27903(i32 %y, i32 %x) {
		entry:
		%buffer.i = alloca [12 x i32], align 16
		%abc = alloca [12 x i32], align 16
		%tmp = bitcast [12 x i32]* %buffer.i to i8*
		call void @llvm.lifetime.start(i64 48, i8* %tmp)
		%idxprom.i = sext i32 %y to i64
		%arrayidx.i = getelementptr inbounds [12 x i32], [12 x i32]* %buffer.i, i64 0, i64 %idxprom.i
		call void @inita(i32* %arrayidx.i)
		%add.i = add nsw i32 %x, %y
br label %while.body		call void @llvm.lifetime.end(i64 48, i8* %tmp)
		%tobool = icmp eq i32 %y, 0
while.body: ; preds = %while.body.lr.ph, %while.body		br i1 %tobool, label %if.end, label %if.then

		if.then: ; preds = %entry
%x.addr.06 = phi i32 [ %x, %while.body.lr.ph ], [ %dec, %while.body ]		%tmp1 = bitcast [12 x i32]* %abc to i8*
Before the patch:

  %dec = add nsw i32 %x.addr.06, -1
  call void @llvm.lifetime.start(i64 512, i8* %tmp2) #3
  call void @inita(i32* %arraydecay3) #3
  call void @llvm.lifetime.end(i64 512, i8* %tmp2) #3
  %tobool2 = icmp eq i32 %dec, 0
  br i1 %tobool2, label %if.end.loopexit, label %while.body

if.end.loopexit: ; preds = %while.body
  br label %if.end

if.end: ; preds = %if.end.loopexit, %if.else, %if.then
  call void @llvm.lifetime.end(i64 512, i8* %tmp1) #3
  call void @llvm.lifetime.end(i64 512, i8* %tmp) #3
  ret i32 0
}

; Test case motivated by PR27903. Same routine inlined multiple times
; into a caller results in a multi-segment lifetime, but the second
; lifetime has no explicit references to the stack slot. Such slots
; have to be treated conservatively.

;CHECK-LABEL: twobod_b27903:
;YESCOLOR: subq $96, %rsp
;NOFIRSTUSE: subq $96, %rsp
;NOCOLOR: subq $96, %rsp

define i32 @twobod_b27903(i32 %y, i32 %x) {
entry:
  %buffer.i = alloca [12 x i32], align 16
  %abc = alloca [12 x i32], align 16
  %tmp = bitcast [12 x i32]* %buffer.i to i8*
  call void @llvm.lifetime.start(i64 48, i8* %tmp)
  %idxprom.i = sext i32 %y to i64
  %arrayidx.i = getelementptr inbounds [12 x i32], [12 x i32]* %buffer.i, i64 0, i64 %idxprom.i
  call void @inita(i32* %arrayidx.i)
  %add.i = add nsw i32 %x, %y
  call void @llvm.lifetime.end(i64 48, i8* %tmp)
  %tobool = icmp eq i32 %y, 0
  br i1 %tobool, label %if.end, label %if.then

if.then: ; preds = %entry
  %tmp1 = bitcast [12 x i32]* %abc to i8*
  call void @llvm.lifetime.start(i64 48, i8* %tmp1)
  %arrayidx = getelementptr inbounds [12 x i32], [12 x i32]* %abc, i64 0, i64 %idxprom.i
  call void @inita(i32* %arrayidx)
  call void @llvm.lifetime.start(i64 48, i8* %tmp)
  call void @inita(i32* %arrayidx.i)
  %add.i9 = add nsw i32 %add.i, %y
  call void @llvm.lifetime.end(i64 48, i8* %tmp)
  call void @llvm.lifetime.end(i64 48, i8* %tmp1)
  br label %if.end

if.end: ; preds = %if.then, %entry
  %x.addr.0 = phi i32 [ %add.i9, %if.then ], [ %add.i, %entry ]
  ret i32 %x.addr.0
}

declare void @inita(i32*)

declare void @initb(i32,i32,i32*)

declare void @bar([100 x i32]* , [100 x i32]*) nounwind

declare void @llvm.lifetime.start(i64, i8* nocapture) nounwind

declare void @llvm.lifetime.end(i64, i8* nocapture) nounwind

declare i32 @foo(i32, i8*)

After the patch:

  call void @llvm.lifetime.start(i64 48, i8* %tmp1)
  %arrayidx = getelementptr inbounds [12 x i32], [12 x i32]* %abc, i64 0, i64 %idxprom.i
  call void @inita(i32* %arrayidx)
  call void @llvm.lifetime.start(i64 48, i8* %tmp)
  call void @inita(i32* %arrayidx.i)
  %add.i9 = add nsw i32 %add.i, %y
  call void @llvm.lifetime.end(i64 48, i8* %tmp)
  call void @llvm.lifetime.end(i64 48, i8* %tmp1)
  br label %if.end

if.end: ; preds = %if.then, %entry
  %x.addr.0 = phi i32 [ %add.i9, %if.then ], [ %add.i, %entry ]
  ret i32 %x.addr.0
}

;CHECK-LABEL: pr32488:
;YESCOLOR: subq $256, %rsp
;NOFIRSTUSE: subq $256, %rsp
;NOCOLOR: subq $512, %rsp
define i1 @pr32488(i1, i1)
{
entry-block:
  %foo = alloca [32 x i64]
  %bar = alloca [32 x i64]
  %foo_i8 = bitcast [32 x i64]* %foo to i8*
  %bar_i8 = bitcast [32 x i64]* %bar to i8*
  br i1 %0, label %if_false, label %if_true
if_false:
  call void @llvm.lifetime.start(i64 256, i8* %bar_i8)
  call void @baz([32 x i64]* %foo, i32 0)
  br i1 %1, label %if_false.1, label %onerr
if_false.1:
  call void @llvm.lifetime.end(i64 256, i8* %bar_i8)
  br label %merge
if_true:
  call void @llvm.lifetime.start(i64 256, i8* %foo_i8)
  call void @baz([32 x i64]* %foo, i32 1)
  br i1 %1, label %if_true.1, label %onerr
if_true.1:
  call void @llvm.lifetime.end(i64 256, i8* %foo_i8)
  br label %merge
merge:
  ret i1 false
onerr:
  call void @llvm.lifetime.end(i64 256, i8* %foo_i8)
  call void @llvm.lifetime.end(i64 256, i8* %bar_i8)
  call void @destructor()
  ret i1 true
}

%Data = type { [32 x i64] }

declare void @destructor()

declare void @inita(i32*)

declare void @initb(i32,i32,i32*)

declare void @bar([100 x i32]* , [100 x i32]*) nounwind

declare void @baz([32 x i64]*, i32)

declare void @llvm.lifetime.start(i64, i8* nocapture) nounwind

declare void @llvm.lifetime.end(i64, i8* nocapture) nounwind

declare i32 @foo(i32, i8*)
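The pr32488 test has two 256-byte slots, %foo and %bar, whose lifetimes start on opposite branches but both reach the shared onerr block. The C++ sketch below is a toy model, not code from StackColoring.cpp. Program points are hypothetical integers (one per block, in an assumed linear order), liveness is taken only from the lifetime markers, and the rule "conflict only if one slot is live where the other one starts" is an assumed way of expressing "possibly active together". Under the old rule the two slots overlap at onerr. Under the new rule they do not, which is consistent with YESCOLOR expecting subq $256 rather than the $512 that NOCOLOR expects.

```cpp
// Toy model of the slot-overlap check, applied to the pr32488 CFG.
// Assumptions (not taken from LLVM): program points are plain ints in a
// hypothetical linear block order, and live ranges are sets of those ints.
#include <cassert>
#include <set>
#include <vector>

// Hypothetical stack slot: the points where it may be live, plus the
// points where a lifetime.start begins one of its segments.
struct SlotModel {
  std::set<int> Live;
  std::vector<int> Starts;
};

// Old, conservative rule: conflict if any point may be live for both slots,
// even when the two slots reach that point along different paths.
bool overlapsOld(const SlotModel &A, const SlotModel &B) {
  for (int P : A.Live)
    if (B.Live.count(P))
      return true;
  return false;
}

// True if slot S may be live at any of the given points.
bool liveAtAny(const SlotModel &S, const std::vector<int> &Points) {
  for (int P : Points)
    if (S.Live.count(P))
      return true;
  return false;
}

// Stricter rule (assumed formulation): conflict only if one slot may be
// live at a point where the other slot's lifetime starts.
bool overlapsNew(const SlotModel &A, const SlotModel &B) {
  return liveAtAny(A, B.Starts) || liveAtAny(B, A.Starts);
}

// pr32488 in a hypothetical linear order, one point per block:
// 0 entry-block, 1 if_false, 2 if_false.1, 3 if_true, 4 if_true.1,
// 5 merge, 6 onerr.
// %foo starts in if_true and may still be live on entry to onerr.
SlotModel fooSlot() { return {{3, 4, 6}, {3}}; }
// %bar starts in if_false and may still be live on entry to onerr.
SlotModel barSlot() { return {{1, 2, 6}, {1}}; }
```

Here overlapsOld(fooSlot(), barSlot()) is true because both slots may be live at point 6 (onerr). overlapsNew(fooSlot(), barSlot()) is false because neither slot is live where the other starts. A slot compared with itself still conflicts under either rule, because it is live at its own start point.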