This is an archive of the discontinued LLVM Phabricator instance.

Rework/enhance stack coloring data flow analysis.
ClosedPublic

Authored by thanm on Apr 6 2016, 8:18 AM.


Details

Reviewers

qcolombet
gbiv
wmi

Commits

rG879ad8fa9998: Rework/enhance stack coloring data flow analysis.

Summary

Replace bidirectional flow analysis to compute
liveness with forward analysis pass. Treat lifetimes
as starting when there is a first reference to the
stack slot, as opposed to starting at the point of
the lifetime.start intrinsic, so as to increase the
number of stack variables we can overlap.

Bug: 25776


Event Timeline

thanm updated this revision to Diff 52804. Apr 6 2016, 8:18 AM

thanm retitled this revision to "Rework/enhance stack coloring data flow analysis."

thanm updated this object.

thanm added reviewers: gbiv, davidxl.

thanm added a subscriber: llvm-commits. Apr 6 2016, 12:18 PM

Fix nit: debugging code was improperly guarded.

davidxl edited reviewers, added: wmi, qcolombet; removed: davidxl. Apr 7 2016, 9:55 AM

davidxl added a subscriber: davidxl.

jevinskie added a subscriber: jevinskie. Apr 7 2016, 11:09 AM

I would welcome any advice/suggestions on the best way to qualify these changes. I've run "ninja check" of course, and I've also run "lnt runtest nt" as described in http://llvm.org/docs/lnt/quickstart.html.

Hi Than,

It is a nice improvement and the code looks clean.

I can see that the patch should work well for a caller function whose local variables were introduced from a callee during inlining. My only concern is with independent local variables defined in the same function. Because an interval runs from the first use to @llvm.lifetime.end, two local variables with separate live ranges may still be regarded as overlapping, simply because their live intervals are extended from the last use to @llvm.lifetime.end. I understand that it is safe to use @llvm.lifetime.end as the end of the live interval, especially when there are indirect accesses to those local variables, but it can also limit the improvement we could get, particularly for local variables defined in the same function. I tried the motivating test case stack-ifthen.cc and found that it has the problem described here too: it still allocates space for both itar1 and itar2 (the @llvm.lifetime.end markers of itar1 and itar2 are both sunk to the function exit rather than placed at the end of the then and else branches). I applied the patch and used this command: ~/workarea/llvm-r265225/dbuild/bin/clang++ -O2 -S stack-ifthen.cc

To recognize more precise live intervals for local variables, we would need alias analysis, so that every aliased access counts as a use of a local variable in addition to its direct uses. Alternatively, if we know which local variables never have their address saved or passed, we can compute their live intervals from their direct uses alone.

Since inlining is an important source of increased stack usage, I think it may be OK to leave the more precise interval computation described here as a future improvement. The reason stack-ifthen.cc was not improved may be worth looking into.

Other inline comments are nits.

Thanks,
Wei.

lib/CodeGen/StackColoring.cpp
653	Since we now only iterate the dataflow in the forward direction, if there is no change in LiveIn, there shouldn't be a change in LiveOut. So can we move BlockInfo.LiveOut |= LocalLiveOut into the if above?
1021–1023	There may be an unused-variable warning when NDEBUG is defined.
test/CodeGen/X86/StackColoring.ll
445	%0, %1, %2 are prone to changing. We can run "opt -instnamer" to rename %0, %1, %2, ..., which will make the test more stable.

Hi Wei,

Thanks for the review. I responded to your comments in line.

You are quite right: even after this patch, clang/llvm will still do worse than GCC here. The reason has to do with the way that the GCC code computes interferences between variables.

The GCC implementation uses an interference graph (node == var, edge == coloring conflict between vars), and it adds edges in the interference graph only at DEFs: statements where there could conceivably be some sort of memory write. In contrast, the LLVM algorithm works by building live intervals and checking to see whether there is any overlap between the intervals (there is no attention paid to DEF vs non-DEF). This second strategy is simpler but is definitely less precise in some cases.

Let me expand on this a bit. Going back to the if/then example:

void bar(int (*arr)[128]);

int foo(int x) {
  int ar1[128]; int ar2[128];
  if (x & 0x99) {
    bar(&ar1); return x-1;
  } else {
    bar(&ar2); return x+1;
  }
}

Here is what the IR looks like at the point where the analysis is done (it's pretty much the same for both GCC and LLVM at an abstract level):

       BB#1:
I0:    call bar(&ar1)
I1:    %vreg1 = %vregx - 1
I2:    jmp BB3

       BB#2:
I3:    call bar(&ar1)
I4:    %vreg2 = %vregx + 1
       <fall through>
   
       BB#3:
I5:    %vregr = phi(%vreg1, %vreg2)
I6:    LIFETIME_END <ar1>
I7:    LIFETIME_END <ar2>
I8:    <return %vregr>

As mentioned previously, the GCC algorithm uses an interference graph: for each variable X it keeps a bit vector of the other variables that X interferes with. At each BB, it computes the live-in set for the BB by unioning together the live-out sets of the predecessor BBs. It then walks the BB and looks for any instruction that could write memory (essentially anything other than a phi or debug instruction). At each such instruction, it adds conflicts between all of the variables in the work set.

At the point where GCC processes BB#3 above, both "ar1" and "ar2" are in the work set. However as it walks through the BB it never sees any instruction that could write memory, so it never adds conflicts between the vars.
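The conflict-recording scheme just described can be sketched in a few lines of C++. This is a hypothetical model that follows the description above, not GCC's actual source: each instruction carries an assumed `mayWriteMem` flag plus the list of variables it references, and blocks are assumed to be listed so that every predecessor comes before its successors (an acyclic CFG), so a single pass suffices. Variables 0 and 1 stand for ar1 and ar2.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Hypothetical instruction model: whether it may write memory (phis, debug
// instructions, and, per the BB3 discussion, lifetime-end markers are false)
// and which stack variables it references.
struct Inst {
  bool mayWriteMem;
  std::vector<int> refs;
};

struct Block {
  std::vector<Inst> insts;
  std::vector<int> preds; // indices of predecessor blocks
};

// Returns the set of conflicting variable pairs (a, b) with a < b.
// Assumes predecessors appear before successors (no loops).
std::set<std::pair<int, int>> buildConflicts(const std::vector<Block> &blocks) {
  std::vector<std::set<int>> liveOut(blocks.size());
  std::set<std::pair<int, int>> conflicts;
  for (std::size_t b = 0; b < blocks.size(); ++b) {
    // The work set starts as the union of the predecessors' live-out sets.
    std::set<int> work;
    for (int p : blocks[b].preds)
      work.insert(liveOut[p].begin(), liveOut[p].end());
    for (const Inst &inst : blocks[b].insts) {
      // A variable joins the work set at its first reference;
      // lifetime-start markers play no role.
      work.insert(inst.refs.begin(), inst.refs.end());
      if (!inst.mayWriteMem)
        continue;
      // Conflicts are only recorded at instructions that may write memory.
      for (int x : work)
        for (int y : work)
          if (x < y)
            conflicts.insert({x, y});
    }
    liveOut[b] = work;
  }
  return conflicts;
}
```

Encoding the if/then example (entry block, BB#1 calling bar(&ar1), BB#2 calling bar(&ar2), BB#3 with only a phi and the lifetime ends) yields no ar1/ar2 conflict; appending a single memory-writing instruction to BB#3 is enough to create one.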

The LLVM implementation doesn't use an interference graph; instead it builds live intervals for each variable and then checks whether the intervals overlap. This approach is less precise. For the graph above, we wind up with the following live intervals:

ar1: [1-2], [5-6]
ar2: [3-7]

The existing stack-coloring code examines these two intervals, sees that they overlap at inst 5, and therefore considers the two variables to be interfering and gives them separate stack allocations.

[This situation is somewhat related to the so-called "X graph problem" in register allocation; you may have heard of it.]

The patch I wrote sticks with the current method of building the (less precise) interval graph, hence it doesn't catch as many legal variable overlaps. The main reason I took this route was that there is an option in the existing implementation ("protect-from-escaped-allocas") that uses the interval information to identify "stray" uses of a stack location outside of the variable's LIFETIME_START/LIFETIME_END interval; moving to an interference graph would have made it difficult to retain this functionality. Second, since I am an LLVM newbie, I thought it would be best to start with a relatively modest patch rather than completely throwing out the old implementation and putting in my own.

Given that the new implementation is a bit better but not ideal, I think we can revisit it at some point and decide whether we want to throw out intervals and move to interference-graph.

Thanks, Than

lib/CodeGen/StackColoring.cpp
653	I think there would be a problem with that on the first dataflow iteration for blocks containing a GEN (or first use) of some variable "x". In that case we wind up adding "x" to the live-out set, so we need to update the changed flag. On all subsequent iterations, what you say is true.
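To make the changed-flag point concrete, here is a minimal sketch of a forward liveness iteration over per-block GEN (first use) and KILL (LIFETIME_END) sets. It assumes std::bitset with a fixed, hypothetical bound kMaxSlots; StackColoring.cpp works on BitVector and its own block-info structures, which are not reproduced here. The check compares LiveOut as well as LiveIn, because on the first iteration a block with a non-empty GEN set changes its LiveOut even though its LiveIn is still empty.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kMaxSlots = 64; // hypothetical bound for the sketch
using SlotSet = std::bitset<kMaxSlots>;

struct BlockInfo {
  std::vector<int> preds;
  SlotSet gen;  // slots whose lifetime begins (first use) in this block
  SlotSet kill; // slots whose lifetime ends (LIFETIME_END) in this block
  SlotSet liveIn, liveOut;
};

// Forward dataflow:
//   LiveIn(B)  = union of LiveOut(P) over predecessors P
//   LiveOut(B) = (LiveIn(B) & ~Kill(B)) | Gen(B)
// Iterates to a fixed point and returns the number of passes taken.
unsigned solveForward(std::vector<BlockInfo> &blocks) {
  unsigned iterations = 0;
  bool changed = true;
  while (changed) {
    changed = false;
    ++iterations;
    for (BlockInfo &bi : blocks) {
      SlotSet in;
      for (int p : bi.preds)
        in |= blocks[p].liveOut;
      SlotSet out = (in & ~bi.kill) | bi.gen;
      // LiveOut must be checked on its own: on the first pass, a block
      // with a GEN bit changes LiveOut even though LiveIn did not change.
      if (in != bi.liveIn || out != bi.liveOut)
        changed = true;
      bi.liveIn = in;
      bi.liveOut = out;
    }
  }
  return iterations;
}
```

On the CFG from the implementation notes (b1 first used in block 1, b2 in block 2, both ended in block 3), this converges after two passes, with both slots live into block 3 and nothing live out of it.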
1021–1023	Hmm. Good point. I will rework the code to handle this.
test/CodeGen/X86/StackColoring.ll
445	Thanks-- I hadn't realized that we had such a tool. I'll update the test.

Than,

Thank you for the detailed explanation of the implementation difference
between GCC and LLVM. I have a question about GCC's implementation inlined.

Wei wrote: But couldn't bar() potentially change ar1 and ar2 indirectly, since ar1 and ar2 are local arrays that escape the current function?

GCC's implementation completely ignores "lifetime start" and instead adds a var X to the work set the first time it sees a reference to it. So on entry to BB1 it has not yet seen any references to "ar2", and when it processes "call bar(&ar1)" no edge is added. Ditto for BB2: the live-in set for the block is empty (since there are no explicit references in the entry block), so when it sees "call bar(&ar2)" no edge is added. Both ar1 and ar2 are part of the work set on entry to BB3, but BB3 has no instructions that could access memory.

BTW in the abstract IR there is a typo, call in BB2 should be "call bar(&ar2)" not "call bar(&ar1)".

Incorporate code review feedback from Wei; rebase.

Incorporate code review feedback from Wei (part 2).

Hi Than,

Have you run clang-format?
Some of the formatting looks strange to me.

I haven’t looked into the algo yet.

Cheers,
-Quentin

lib/CodeGen/StackColoring.cpp
334	Can’t we theoretically have an interesting lifetime start and end on the same instruction? The API does not allow us to model that. What happens in that case?
337	Having looked at the implementation, this method does not support nullptr as an input argument, so please use references.
419	Add a message in assert.
421	All the changes to the dump methods are refactoring, right? If so, please commit them separately.
432	Get rid of the else. http://llvm.org/docs/CodingStandards.html#don-t-use-else-after-a-return
575	SmallVector?
676–677	Period.

thanm added inline comments. Apr 11 2016, 12:48 PM

lib/CodeGen/StackColoring.cpp
334	As far as I know (take this with a grain of salt, since I am an LLVM newbie) the only way to end a lifetime is with a LIFETIME_END marker instruction. If you can think of a scenario where you might have both, then yes, this should definitely be changed. Let me know if you have a real example?
337	OK, I will convert to use references.
419	Will fix.
421	Yes, all changes to dump methods are refactoring. I will pull them into a separate patch.
432	Will fix.
575	Will fix.
676–678	Will fix.

qcolombet added inline comments. Apr 11 2016, 12:57 PM

lib/CodeGen/StackColoring.cpp
334	No, you’re right, for lifetime markers this cannot happen at the same time. I was thinking in terms of live ranges. Maybe add to the comment that we only consider lifetime ends coming from LIFETIME_END instructions, and thus we cannot have both conditions (start and end) at the same time.

Hi Quentin,

Haven't run clang-format. Would it make sense to run it afterwards, so as not to mix pure formatting changes with functional changes?

I am ok either way, just not sure what the best practice is.

Thanks, Than

Incorporate code review feedback from qcolombet.

Improve unit test coverage.

Some nits. Other than that, LGTM. Quentin knows this code better than I do, so I will let him give the final approval.

Thanks,
Wei.

lib/CodeGen/StackColoring.cpp
580	Missing an assert message.
1027	We want dumpIntervals() after the intervals have been computed.

Hi all,
I did some additional testing of this patch over the weekend, and I've discovered (unfortunately) a couple of cases where my new algorithm is not doing as well as the old algorithm (in spite of the fact that the new scheme does fix the case that motivated me to file the bug in the first place). I am still working to understand what the issue is, but it looks as though the degradations are in cases where uses of a stack slot appear outside of regions strictly dominated/post-dominated by markers.
I will post another comment when I have more info.
Thanks, Than

Some more detail on my previous comment.

I should clarify that what I am seeing are not correctness problems (e.g. incorrectly overlapping two slots that should not be overlapped) but performance problems (cases where the old implementation overlaps more slots than the new implementation). I had assumed that my new scheme would always produce better results than the old scheme, but it turns out that this is not the case.

The first problem looks like it is due to a bug in my implementation: since with the new scheme there can now be multiple starts/begins for a lifetime, the code that computes begin/end has to treat a begin and an end in the same block as "end" (as opposed to a no-op) during the liveness propagation. Example:

BB#2:
   <use fi#1>
   ...
   fall through to BB#3

BB#3:
   <use fi#1>
   LIFETIME_END <fi#1>

The old implementation assumed matched pairs of begin/end -- in the case above where you have multiple begins for fi#1, there would not be a 'kill' of the fi#1 lifetime in BB#3, so its lifetime would be extended. I have put in a fix for this.
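The fix described above amounts to letting the last lifetime event in a block decide that block's local summary. Here is a hypothetical sketch in which each block is reduced to an ordered list of slot uses and LIFETIME_END markers; the names SlotEvent, LocalSets and summarizeBlock are illustrative and do not come from the patch.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kMaxSlots = 64; // hypothetical bound for the sketch
using SlotSet = std::bitset<kMaxSlots>;

// A reference to a stack slot, or a LIFETIME_END marker for it.
enum class Event { Use, End };

struct SlotEvent {
  Event kind;
  int slot;
};

struct LocalSets {
  SlotSet begin; // slot is live on exit because of a use in this block
  SlotSet end;   // slot's lifetime ends here and is not restarted
};

// Scans a block in order; the last event seen for a slot wins. A use
// followed by LIFETIME_END therefore counts as an "end", not a no-op.
LocalSets summarizeBlock(const std::vector<SlotEvent> &events) {
  LocalSets s;
  for (const SlotEvent &e : events) {
    if (e.kind == Event::Use) {
      s.begin.set(e.slot);
      s.end.reset(e.slot);
    } else {
      s.end.set(e.slot);
      s.begin.reset(e.slot);
    }
  }
  return s;
}
```

For BB#3 above (a use of fi#1 followed by LIFETIME_END <fi#1>), fi#1 lands in the end set, so its lifetime is killed in that block instead of being extended past it.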

The second and more serious problem is that I am seeing cases where stack slot references are hoisted out of loops into preheader blocks, where the preheader block is not strictly dominated by the original lifetime start (and, more importantly, is not post-dominated by the lifetime end op). It looks as though this is happening due to code motion of some sort (LICM or GVN, perhaps).

In such cases the lifetimes computed by the old approach are smaller/shorter than those computed by the new approach. This also means that if the "-protect-from-escaped-allocas" option were used, the IR would be flagged as invalid, for whatever that is worth.

I will need to think about what to do with this information. I also need to run some more experiments to see just exactly how widespread the problem is.

Revise stack coloring patch to handle degenerate slots.

Update to the enhancement to handle cases where references
to a given stack slot appear outside of the lifetime
start/end markers (this can happen due to optimizations
such as LICM).

Hello reviewers:

My apologies for the long delay in updating this revision.

I've worked out a method for handling the specific cases I was seeing where my new technique was less effective than the old one; you can find additional detail in the comment section I added to StackColoring.cpp marked "Implementation Notes".

I think I've addressed all of Wei's and Quentin's specific comments, so things should be ready for another look now.

Thanks, Than

Hi folks,

I spent some time working with SPEC yesterday to characterize the amount of improvement (number of additional stack slots eliminated) that this patch provides.

Here are the results. Note that these are not execution times, this is simply looking at the stats to collect the stack space savings reported by the optimization. I did a SPEC2006 build (int and fp, but only the C/C++ programs, no Fortran), vanilla "-O2", x86_64 target.

Base:         3011 slots removed, 87483 bytes saved
Enhanced:     3151 slots removed, 94137 bytes saved

So a modest improvement in the number of bytes saved (~ 7%), but an improvement nonetheless.
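For reference, the relative improvements implied by the two rows above work out as follows:

```latex
\frac{94137 - 87483}{87483} = \frac{6654}{87483} \approx 0.076,
\qquad
\frac{3151 - 3011}{3011} = \frac{140}{3011} \approx 0.046
```

That is, roughly 7.6% more bytes saved and roughly 4.6% more slots removed with the enhanced scheme.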

Thanks, Than

Thanks for the detailed implementation notes and SPEC2006 data. The SPEC2006 data is an overall number, which looks good. I guess there is no longer any regression for individual files, right?

But I haven't quite understood how the degenerate-slot case is handled. My understanding is that for a degenerate slot, the live range from the start marker to the end marker is smaller than the actual one, so stack coloring could generate incorrect code in such a case (although I understand the result is never worse than without the patch). The question is: since degenerate slots existed before this patch too, why didn't we see errors caused by them?

According to http://lists.llvm.org/pipermail/llvm-dev/2012-September/053458.html, ProtectFromEscapedAllocas is left off so that problematic cases get detected and fixed. Is the degenerate-slot case caused by LICM such a problematic case?

Thanks,
Wei.

In D18827#429001, @wmi wrote:

Thanks for the detailed implementation notes and SPEC2006 data. The SPEC2006 data is an overall number, which looks good. I guess there is no longer any regression for individual files, right?

Correct.

But I haven't quite understood how the degenerate-slot case is handled. My understanding is that for a degenerate slot, the live range from the start marker to the end marker is smaller than the actual one, so stack coloring could generate incorrect code in such a case (although I understand the result is never worse than without the patch). The question is: since degenerate slots existed before this patch too, why didn't we see errors caused by them?

Sorry, looking back at the example I put in the comment, I think I left things a bit vague. In the cases I have seen so far where uses of the stack slot are hoisted outside of the lifetime start/end interval, only the address computation is outside the interval; the actual memory load/store takes place inside. So at least for all of the cases I have seen so far, the program is still correct.

I will update the comments to make this clearer.

Thanks, Than

Thanks for the explanation. It makes sense to me now.

lib/CodeGen/StackColoring.cpp
545	Should we add an assertion "!MI.mayLoad() && !MI.mayStore()", so that if such an escaped-reference case does exist, it will be caught at compile time?

Wei, you suggested adding an assert that triggers if an instruction that accesses memory is found outside the start/end lifetime range.

I experimented a bit with this; it turns out there is a roadblock here relating to how degenerate ranges are identified in my implementation.

At the moment I am not doing flow analysis to determine the ranges of instructions that fall between the lifetime start/end for a given slot; the code only does a single pass over the CFG. This makes it fast, but there is some imprecision: e.g. if there is unstructured control flow, a slot may appear degenerate when it is not.

Here is an example, abstracted from the 400.perlbench function store_scalar():

void store_scalar(...) {
   ...
   if (...) goto string;
   ...
   if (cond) {
      int wlen;
      ...
      string:
        wlen = ...;
   }
}

Note the 'goto'. The front end will place the start and end lifetime markers within the body of the "if (cond)" true block, but because of the goto there is a path from the entry block to the 'wlen' assignment that doesn't pass through the lifetime start marker. If the block ordering in "depth_first(MF)" happens to choose that path, then we'll see what looks like a reference to "wlen" before we see its lifetime start marker.

I'm not sure it's worth it to slow down the pass by doing real dataflow analysis to collect lifetime ranges-- that seems like overkill to me.
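The order dependence can be illustrated with a hypothetical single-pass model of degenerate-slot detection: blocks are visited once in a caller-supplied order, a per-slot bit records whether the walk is currently between a LIFETIME_START and a LIFETIME_END, and a use seen outside that window marks the slot degenerate. This is a sketch under those assumptions, not the code in collectMarkers(). The block numbering is invented for the store_scalar example: 0 is the entry block with the goto, 1 is the `if (cond)` test, 2 holds the wlen start marker, 3 is the `string:` block that assigns wlen, and 4 holds the end marker, with assumed edges 0→1, 0→3, 1→2, 2→3 and 3→4.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kMaxSlots = 64; // hypothetical bound for the sketch
using SlotSet = std::bitset<kMaxSlots>;

enum class Op { Start, Use, End };

struct SlotOp {
  Op op;
  int slot;
};

// One forward pass over the blocks in the given visit order, with no
// dataflow. A use seen while the slot is not between START and END (as
// tracked along this particular walk) flags the slot as degenerate, so
// the answer depends on the order chosen.
SlotSet findDegenerate(const std::vector<std::vector<SlotOp>> &blocks,
                       const std::vector<int> &order) {
  SlotSet between;    // slots currently inside START..END along the walk
  SlotSet degenerate;
  for (int b : order) {
    for (const SlotOp &o : blocks[b]) {
      switch (o.op) {
      case Op::Start:
        between.set(o.slot);
        break;
      case Op::End:
        between.reset(o.slot);
        break;
      case Op::Use:
        if (!between.test(o.slot))
          degenerate.set(o.slot);
        break;
      }
    }
  }
  return degenerate;
}
```

Visiting the blocks as 0, 1, 2, 3, 4 reports nothing, while the equally valid depth-first order 0, 3, 4, 1, 2 (following the goto edge first) flags wlen as degenerate even though the program is unchanged.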

lib/CodeGen/StackColoring.cpp
545	This is difficult to accomplish with my current implementation (see more detail on comment).

Improvements to comments.

wmi added inline comments. May 16 2016, 10:48 AM

lib/CodeGen/StackColoring.cpp
164	repeated "can sometimes"
581	assert message missing.

Will post a revised patch shortly.

lib/CodeGen/StackColoring.cpp
164	Fixed.
581	Fixed

Incorporate fixes suggested by Wei.

Friendly ping...

Another friendly ping... I'd like to get this submitted at some point.

wmi accepted this revision. May 23 2016, 12:54 PM

wmi edited edge metadata.

This revision is now accepted and ready to land. May 23 2016, 12:54 PM

Submitted in r270559.

Thanks Wei, George, and Quentin for your reviews.

Than

Revision Contents

Path                                      Size
lib/CodeGen/StackColoring.cpp             521 lines
test/CodeGen/X86/StackColoring.ll         118 lines
test/CodeGen/X86/misched-aa-colored.ll    1 line

Diff 57378

lib/CodeGen/StackColoring.cpp

[... 58 unchanged lines not shown ...]
 static cl::opt<bool>
 DisableColoring("no-stack-coloring",
                 cl::init(false), cl::Hidden,
                 cl::desc("Disable stack coloring"));

 /// The user may write code that uses allocas outside of the declared lifetime
 /// zone. This can happen when the user returns a reference to a local
 /// data-structure. We can detect these cases and decide not to optimize the
-/// code. If this flag is enabled, we try to save the user.
+/// code. If this flag is enabled, we try to save the user. This option
+/// is treated as overriding LifetimeStartOnFirstUse below.
 static cl::opt<bool>
 ProtectFromEscapedAllocas("protect-from-escaped-allocas",
                           cl::init(false), cl::Hidden,
                           cl::desc("Do not optimize lifetime zones that "
                                    "are broken"));

+/// Enable enhanced dataflow scheme for lifetime analysis (treat first
+/// use of stack slot as start of slot lifetime, as opposed to looking
+/// for LIFETIME_START marker). See "Implementation notes" below for
+/// more info.
+static cl::opt<bool>
+LifetimeStartOnFirstUse("stackcoloring-lifetime-start-on-first-use",
+        cl::init(true), cl::Hidden,
+        cl::desc("Treat stack lifetimes as starting on first use, not on START marker."));


 STATISTIC(NumMarkerSeen, "Number of lifetime markers found.");
 STATISTIC(StackSpaceSaved, "Number of bytes saved due to merging slots.");
 STATISTIC(StackSlotMerged, "Number of stack slot merged.");
 STATISTIC(EscapedAllocas, "Number of allocas that escaped the lifetime region");

+//
+// Implementation Notes:
+// ---------------------
+//
+// Consider the following motivating example:
+//
+//     int foo() {
+//       char b1[1024], b2[1024];
+//       if (...) {
+//         char b3[1024];
+//         <uses of b1, b3>;
+//         return x;
+//       } else {
+//         char b4[1024], b5[1024];
+//         <uses of b2, b4, b5>;
+//         return y;
+//       }
+//     }
+//
+// In the code above, "b3" and "b4" are declared in distinct lexical
+// scopes, meaning that it is easy to prove that they can share the
+// same stack slot. Variables "b1" and "b2" are declared in the same
+// scope, meaning that from a lexical point of view, their lifetimes
+// overlap. From a control flow point of view, however, the two
+// variables are accessed in disjoint regions of the CFG, thus it
+// should be possible for them to share the same stack slot. An ideal
+// stack allocation for the function above would look like:
+//
+//     slot 0: b1, b2
+//     slot 1: b3, b4
+//     slot 2: b5
+//
+// Achieving this allocation is tricky, however, due to the way
+// lifetime markers are inserted. Here is a simplified view of the
+// control flow graph for the code above:
+//
+//                +------  block 0 -------+
+//               0| LIFETIME_START b1, b2 |
+//               1| <test 'if' condition> |
+//                +-----------------------+
+//                   ./              \.
+//   +------  block 1 -------+   +------  block 2 -------+
+//  2| LIFETIME_START b3     |  5| LIFETIME_START b4, b5 |
+//  3| <uses of b1, b3>      |  6| <uses of b2, b4, b5>  |
+//  4| LIFETIME_END b3       |  7| LIFETIME_END b4, b5   |
+//   +-----------------------+   +-----------------------+
+//                   \.              /.
+//                +------  block 3 -------+
+//               8| <cleanupcode>         |
+//               9| LIFETIME_END b1, b2   |
+//              10| return                |
+//                +-----------------------+
+//
+// If we create live intervals for the variables above strictly based
+// on the lifetime markers, we'll get the set of intervals on the
+// left. If we ignore the lifetime start markers and instead treat a
+// variable's lifetime as beginning with the first reference to the
+// var, then we get the intervals on the right.
+//
+//            LIFETIME_START      First Use
+//     b1:    [0,9]               [3,4] [8,9]
+//     b2:    [0,9]               [6,9]
+//     b3:    [2,4]               [3,4]
+//     b4:    [5,7]               [6,7]
+//     b5:    [5,7]               [6,7]
+//
+// For the intervals on the left, the best we can do is overlap two
+// variables (b3 and b4, for example); this gives us a stack size of
+// 4*1024 bytes, not ideal. When treating first-use as the start of a
+// lifetime, we can additionally overlap b1 and b5, giving us a 3*1024
+// byte stack (better).
+//
+// Relying entirely on first-use of stack slots is problematic,
+// however, due to the fact that optimizations can sometimes migrate
+// uses of a variable outside of its lifetime start/end region. Here
+// is an example:
+//
+//     int bar() {
+//       char b1[1024], b2[1024];
+//       if (...) {
+//         <uses of b2>
+//         return y;
+//       } else {
+//         <uses of b1>
+//         while (...) {
+//           char b3[1024];
+//           <uses of b3>
+//         }
+//       }
+//     }
+//
+// Before optimization, the control flow graph for the code above
+// might look like the following:
+//
+//                +------  block 0 -------+
+//               0| LIFETIME_START b1, b2 |
+//               1| <test 'if' condition> |
+//                +-----------------------+
+//                   ./              \.
+//   +------  block 1 -------+    +------- block 2 -------+
+//  2| <uses of b2>          |   3| <uses of b1>          |
+//   +-----------------------+    +-----------------------+
+//              |                            |
+//              |                 +------- block 3 -------+ <-\.
+//              |                4| <while condition>     |   |
+//              |                 +-----------------------+   |
+//              |               /             |               |
+//              |              /  +------- block 4 -------+   |
+//              \             /  5| LIFETIME_START b3     |   |
+//               \           /   6| <uses of b3>          |   |
+//                \         /    7| LIFETIME_END b3       |   |
+//                 \        |     +-----------------------+   |
+//                  \       |                 \               /
+//                +------  block 5 -----+      \---------------
+//               8| <cleanupcode>       |
+//               9| LIFETIME_END b1, b2 |
+//              10| return              |
+//                +---------------------+
+//
+// During optimization, however, it can happen that an instruction
+// computing an address in "b3" (for example, a loop-invariant GEP) is
+// hoisted up out of the loop from block 4 to block 2. [Note that
+// this is not an actual load from the stack, only an instruction that
+// computes the address to be loaded]. If this happens, there is now a
+// path leading from the first use of b3 to the return instruction
+// that does not encounter the b3 LIFETIME_END, hence b3's lifetime is
+// now larger than if we were computing live intervals strictly based
+// on lifetime markers. In the example above, this lengthened lifetime
+// would mean that it would appear illegal to overlap b3 with b2.
+//
+// To deal with such cases, the code in ::collectMarkers() below
+// tries to identify "degenerate" slots -- those slots where on a single
+// forward pass through the CFG we encounter a first reference to slot
+// K before we hit the slot K lifetime start marker. For such slots,
+// we fall back on using the lifetime start marker as the beginning of
+// the variable's lifetime. NB: with this implementation, slots can
+// appear degenerate in cases where there is unstructured control flow:
+//
+//    if (q) goto mid;
+//    if (x > 9) {
+//         int b[100];
+//         memcpy(&b[0], ...);
+//    mid: b[k] = ...;
+//         abc(&b);
+//    }
+//
+// If in the RPO ordering chosen to walk the CFG we happen to visit the
+// b[k] before visiting the memcpy block (which will contain the lifetime
+// start for "b"), then it will appear that 'b' has a degenerate lifetime.
+//

 //===----------------------------------------------------------------------===//
 // StackColoring Pass
 //===----------------------------------------------------------------------===//

 namespace {
 /// StackColoring - A machine pass for merging disjoint stack allocations,
 /// marked by the LIFETIME_START and LIFETIME_END pseudo instructions.
 class StackColoring : public MachineFunctionPass {
[... 31 unchanged lines not shown ...]
   SlotIndexes *Indexes;
   /// The stack protector object.
   StackProtector *SP;

   /// The list of lifetime markers found. These markers are to be removed
   /// once the coloring is done.
   SmallVector<MachineInstr*, 8> Markers;

+  /// Record the FI slots for which we have seen some sort of
+  /// lifetime marker (either start or end).
+  BitVector InterestingSlots;

+  /// Degenerate slots -- first use appears outside of start/end
+  /// lifetime markers.
+  BitVector DegenerateSlots;

+  /// Number of iterations taken during data flow analysis.
+  unsigned NumIterations;

 public:
   static char ID;
   StackColoring() : MachineFunctionPass(ID) {
     initializeStackColoringPass(*PassRegistry::getPassRegistry());
   }
   void getAnalysisUsage(AnalysisUsage &AU) const override;
   bool runOnMachineFunction(MachineFunction &MF) override;
[... 14 unchanged lines not shown ...]
   unsigned collectMarkers(unsigned NumSlot);

   /// Perform the dataflow calculation and calculate the lifetime for each of
   /// the slots, based on the BEGIN/END vectors. Set the LifetimeLIVE_IN and
   /// LifetimeLIVE_OUT maps that represent which stack slots are live coming
   /// in and out blocks.
   void calculateLocalLiveness();

+  /// Returns TRUE if we're using the first-use-begins-lifetime method for
+  /// this slot (if FALSE, then the start marker is treated as start of lifetime).
+  bool applyFirstUse(int Slot) {
+    if (!LifetimeStartOnFirstUse || ProtectFromEscapedAllocas)
+      return false;
+    if (DegenerateSlots.test(Slot))
+      return false;
+    return true;
+  }

		/// Examines the specified instruction and returns TRUE if the instruction
		/// represents the start or end of an interesting lifetime. The slot or slots
		/// starting or ending are added to the vector "slots" and "isStart" is set
		/// accordingly.
		/// \returns True if inst contains a lifetime start or end
		bool isLifetimeStartOrEnd(const MachineInstr &MI,
		SmallVector<int, 4> &slots,
		bool &isStart);

/// Construct the LiveIntervals for the slots.		/// Construct the LiveIntervals for the slots.
void calculateLiveIntervals(unsigned NumSlots);		void calculateLiveIntervals(unsigned NumSlots);

/// Go over the machine function and change instructions which use stack		/// Go over the machine function and change instructions which use stack
/// slots to use the joint slots.		/// slots to use the joint slots.
void remapInstructions(DenseMap<int, int> &SlotRemap);		void remapInstructions(DenseMap<int, int> &SlotRemap);

/// The input program may contain instructions which are not inside lifetime		/// The input program may contain instructions which are not inside lifetime
/// markers. This can happen due to a bug in the compiler or due to a bug in		/// markers. This can happen due to a bug in the compiler or due to a bug in
/// user code (for example, returning a reference to a local variable).		/// user code (for example, returning a reference to a local variable).
/// This procedure checks all of the instructions in the function and		/// This procedure checks all of the instructions in the function and
/// invalidates lifetime ranges which do not contain all of the instructions		/// invalidates lifetime ranges which do not contain all of the instructions
/// which access that frame slot.		/// which access that frame slot.
void removeInvalidSlotRanges();		void removeInvalidSlotRanges();

/// Map entries which point to other entries to their destination.		/// Map entries which point to other entries to their destination.
/// A->B->C becomes A->C.		/// A->B->C becomes A->C.
void expungeSlotMap(DenseMap<int, int> &SlotRemap, unsigned NumSlots);		void expungeSlotMap(DenseMap<int, int> &SlotRemap, unsigned NumSlots);

		/// Used in collectMarkers
		typedef DenseMap<const MachineBasicBlock*, BitVector> BlockBitVecMap;
};		};
} // end anonymous namespace		} // end anonymous namespace

char StackColoring::ID = 0;		char StackColoring::ID = 0;
char &llvm::StackColoringID = StackColoring::ID;		char &llvm::StackColoringID = StackColoring::ID;

INITIALIZE_PASS_BEGIN(StackColoring,		INITIALIZE_PASS_BEGIN(StackColoring,
"stack-coloring", "Merge disjoint stack slots", false, false)		"stack-coloring", "Merge disjoint stack slots", false, false)
[...]	for (MachineBasicBlock *MBB : depth_first(MF)) {
DEBUG(dbgs() << "Inspecting block #" << MBB->getNumber() << " ["		DEBUG(dbgs() << "Inspecting block #" << MBB->getNumber() << " ["
<< MBB->getName() << "]\n");		<< MBB->getName() << "]\n");
DEBUG(dumpBB(MBB));		DEBUG(dumpBB(MBB));
}		}
}		}

LLVM_DUMP_METHOD void StackColoring::dumpIntervals() const {		LLVM_DUMP_METHOD void StackColoring::dumpIntervals() const {
for (unsigned I = 0, E = Intervals.size(); I != E; ++I) {		for (unsigned I = 0, E = Intervals.size(); I != E; ++I) {
DEBUG(dbgs() << "Interval[" << I << "]:\n");		DEBUG(dbgs() << "Interval[" << I << "]:\n");
		qcolombet: Add a message to the assert.
		thanm (author): Will fix.
DEBUG(Intervals[I]->dump());		DEBUG(Intervals[I]->dump());
}		}
		qcolombet: All the dump methods are refactoring, right? If so, please commit them separately.
		thanm (author): Yes, all changes to dump methods are refactoring. I will pull them into a separate patch.
}		}

#endif // not NDEBUG		#endif // not NDEBUG

unsigned StackColoring::collectMarkers(unsigned NumSlot) {		static inline int getStartOrEndSlot(const MachineInstr &MI)
		{
		assert((MI.getOpcode() == TargetOpcode::LIFETIME_START ||
		MI.getOpcode() == TargetOpcode::LIFETIME_END) &&
		"Expected LIFETIME_START or LIFETIME_END op");
		const MachineOperand &MO = MI.getOperand(0);
		int Slot = MO.getIndex();
		qcolombet: Get rid of the else. http://llvm.org/docs/CodingStandards.html#don-t-use-else-after-a-return
		thanm (author): Will fix.
		if (Slot >= 0)
		return Slot;
		return -1;
		}

		//
		// At the moment the only way to end a variable lifetime is with
		// a LIFETIME_END op (which can't contain a start). If things
		// change and the IR allows for a single inst that both begins
		// and ends lifetime(s), this interface will need to be reworked.
		//
		bool StackColoring::isLifetimeStartOrEnd(const MachineInstr &MI,
		SmallVector<int, 4> &slots,
		bool &isStart)
		{
		if (MI.getOpcode() == TargetOpcode::LIFETIME_START ||
		MI.getOpcode() == TargetOpcode::LIFETIME_END) {
		int Slot = getStartOrEndSlot(MI);
		if (Slot < 0)
		return false;
		if (!InterestingSlots.test(Slot))
		return false;
		slots.push_back(Slot);
		if (MI.getOpcode() == TargetOpcode::LIFETIME_END) {
		isStart = false;
		return true;
		}
		if (! applyFirstUse(Slot)) {
		isStart = true;
		return true;
		}
		} else if (LifetimeStartOnFirstUse && !ProtectFromEscapedAllocas) {
		if (! MI.isDebugValue()) {
		bool found = false;
		for (const MachineOperand &MO : MI.operands()) {
		if (!MO.isFI())
		continue;
		int Slot = MO.getIndex();
		if (Slot<0)
		continue;
		if (InterestingSlots.test(Slot) && applyFirstUse(Slot)) {
		slots.push_back(Slot);
		found = true;
		}
		}
		if (found) {
		isStart = true;
		return true;
		}
		}
		}
		return false;
		}
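To make the classification rule in applyFirstUse() and isLifetimeStartOrEnd() concrete, here is a minimal standalone model. It is a sketch under stated assumptions, not LLVM code: `Op`, `Instr`, `Policy` and `classify` are hypothetical stand-ins for MachineInstr, the LIFETIME_START/LIFETIME_END opcodes and the pass options, and debug-value instructions are not modeled.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical stand-ins for MachineInstr opcodes; not LLVM types.
enum class Op { LifetimeStart, LifetimeEnd, Other };

struct Instr {
  Op Opcode;
  std::vector<int> FrameIndices; // slot operands (exactly one for markers)
};

// Hypothetical mirror of the state consulted by applyFirstUse().
struct Policy {
  bool StartOnFirstUse = true; // cf. LifetimeStartOnFirstUse
  bool ProtectEscaped = false; // cf. ProtectFromEscapedAllocas
  std::set<int> Interesting;   // slots that have lifetime markers
  std::set<int> Degenerate;    // slots referenced outside START/END

  bool applyFirstUse(int Slot) const {
    if (!StartOnFirstUse || ProtectEscaped)
      return false;
    return Degenerate.count(Slot) == 0;
  }
};

// Returns true if I starts or ends an interesting lifetime. Slots receives
// the affected slots and IsStart tells which kind of event it is.
bool classify(const Policy &P, const Instr &I, std::vector<int> &Slots,
              bool &IsStart) {
  if (I.Opcode != Op::Other) {
    int Slot = I.FrameIndices.at(0);
    if (Slot < 0 || !P.Interesting.count(Slot))
      return false;
    if (I.Opcode == Op::LifetimeEnd) { // END markers always end a lifetime
      Slots.push_back(Slot);
      IsStart = false;
      return true;
    }
    // A START marker begins the lifetime only when first-use is not applied.
    if (!P.applyFirstUse(Slot)) {
      Slots.push_back(Slot);
      IsStart = true;
      return true;
    }
    return false;
  }
  if (!P.StartOnFirstUse || P.ProtectEscaped)
    return false;
  // A plain reference starts the lifetime of every first-use slot it touches.
  bool Found = false;
  for (int Slot : I.FrameIndices)
    if (Slot >= 0 && P.Interesting.count(Slot) && P.applyFirstUse(Slot)) {
      Slots.push_back(Slot);
      Found = true;
    }
  if (Found)
    IsStart = true;
  return Found;
}
```

In this model a START marker for a first-use slot is simply ignored; that slot's lifetime begins at its first plain reference, while END markers always end it.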

		unsigned StackColoring::collectMarkers(unsigned NumSlot)
		{
unsigned MarkersFound = 0;		unsigned MarkersFound = 0;
// Scan the function to find all lifetime markers.		BlockBitVecMap SeenStartMap;
		InterestingSlots.clear();
		InterestingSlots.resize(NumSlot);
		DegenerateSlots.clear();
		DegenerateSlots.resize(NumSlot);

		// Step 1: collect markers and populate the "InterestingSlots"
		// and "DegenerateSlots" sets.
		for (MachineBasicBlock *MBB : depth_first(MF)) {

		// Compute the set of slots for which we've seen a START marker but have
		// not yet seen an END marker at this point in the walk (e.g. on entry
		// to this bb).
		BitVector BetweenStartEnd;
		BetweenStartEnd.resize(NumSlot);
		for (MachineBasicBlock::const_pred_iterator PI = MBB->pred_begin(),
		PE = MBB->pred_end(); PI != PE; ++PI) {
		BlockBitVecMap::const_iterator I = SeenStartMap.find(*PI);
		if (I != SeenStartMap.end()) {
		BetweenStartEnd |= I->second;
		}
		}

		// Walk the instructions in the block to look for start/end ops.
		for (MachineInstr &MI : *MBB) {
		if (MI.getOpcode() == TargetOpcode::LIFETIME_START ||
		MI.getOpcode() == TargetOpcode::LIFETIME_END) {
		int Slot = getStartOrEndSlot(MI);
		if (Slot < 0)
		continue;
		InterestingSlots.set(Slot);
		if (MI.getOpcode() == TargetOpcode::LIFETIME_START)
		BetweenStartEnd.set(Slot);
		else
		BetweenStartEnd.reset(Slot);
		const AllocaInst *Allocation = MFI->getObjectAllocation(Slot);
		if (Allocation) {
		DEBUG(dbgs() << "Found a lifetime ");
		DEBUG(dbgs() << (MI.getOpcode() == TargetOpcode::LIFETIME_START
		? "start"
		: "end"));
		DEBUG(dbgs() << " marker for slot #" << Slot);
		DEBUG(dbgs() << " with allocation: " << Allocation->getName()
		<< "\n");
		}
		Markers.push_back(&MI);
		MarkersFound += 1;
		} else {
		for (const MachineOperand &MO : MI.operands()) {
		if (!MO.isFI())
		continue;
		int Slot = MO.getIndex();
		if (Slot < 0)
		continue;
		if (! BetweenStartEnd.test(Slot)) {
		DegenerateSlots.set(Slot);
		wmi: Should we add an assertion "!MI.mayLoad() && !MI.mayStore()" so that, if the weird escaped-reference case does exist, it will be caught at compile time?
		thanm (author): This is difficult to accomplish with my current implementation (see the comment for more detail).
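The DegenerateSlots computation in Step 1 can be sketched on a toy CFG. Assumptions: `Kind`, `Event`, `Block` and `findDegenerate` are hypothetical, slots are bits of a 32-bit mask, and `Order` is taken to be a depth-first visitation order from the entry. As in the patch, only predecessors that were already visited contribute to the set of slots known to be between a START and an END.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical simplified IR: each event is a START/END marker or a plain
// reference to a stack slot. Slots are bits in a 32-bit mask (assumption).
enum class Kind { Start, End, Use };
struct Event { Kind K; int Slot; };
struct Block { std::vector<int> Preds; std::vector<Event> Events; };

// Visits blocks in Order and returns the mask of "degenerate" slots: slots
// referenced at a point where no START marker is known to be active.
uint32_t findDegenerate(const std::vector<Block> &Blocks,
                        const std::vector<int> &Order) {
  std::map<int, uint32_t> SeenStart; // per visited block, like SeenStartMap
  uint32_t Degenerate = 0;
  for (int B : Order) {
    uint32_t Between = 0;
    for (int P : Blocks[B].Preds) {
      auto It = SeenStart.find(P);
      if (It != SeenStart.end())
        Between |= It->second; // unvisited preds (back edges) are skipped
    }
    for (const Event &E : Blocks[B].Events) {
      uint32_t Bit = 1u << E.Slot;
      if (E.K == Kind::Start)
        Between |= Bit;
      else if (E.K == Kind::End)
        Between &= ~Bit;
      else if (!(Between & Bit))
        Degenerate |= Bit; // reference outside any START/END pair
    }
    SeenStart[B] |= Between;
  }
  return Degenerate;
}
```

The test graph loosely follows the while_loop test later in this diff: slot 2 (like %b3) is referenced in the else branch before its START marker inside the loop, so it is flagged as degenerate, and for such slots applyFirstUse() falls back to marker-based lifetimes.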
		}
		}
		}
		}
		BitVector &SeenStart = SeenStartMap[MBB];
		SeenStart |= BetweenStartEnd;
		}
		if (!MarkersFound) {
		return 0;
		}
		DEBUG(dumpBV("Degenerate slots", DegenerateSlots));

		// Step 2: compute begin/end sets for each block

// NOTE: We use a reverse-post-order iteration to ensure that we obtain a		// NOTE: We use a reverse-post-order iteration to ensure that we obtain a
// deterministic numbering, and because we'll need a post-order iteration		// deterministic numbering, and because we'll need a post-order iteration
// later for solving the liveness dataflow problem.		// later for solving the liveness dataflow problem.
for (MachineBasicBlock *MBB : depth_first(MF)) {		for (MachineBasicBlock *MBB : depth_first(MF)) {

// Assign a serial number to this basic block.		// Assign a serial number to this basic block.
BasicBlocks[MBB] = BasicBlockNumbering.size();		BasicBlocks[MBB] = BasicBlockNumbering.size();
BasicBlockNumbering.push_back(MBB);		BasicBlockNumbering.push_back(MBB);

// Keep a reference to avoid repeated lookups.		// Keep a reference to avoid repeated lookups.
BlockLifetimeInfo &BlockInfo = BlockLiveness[MBB];		BlockLifetimeInfo &BlockInfo = BlockLiveness[MBB];

BlockInfo.Begin.resize(NumSlot);		BlockInfo.Begin.resize(NumSlot);
BlockInfo.End.resize(NumSlot);		BlockInfo.End.resize(NumSlot);

		SmallVector<int, 4> slots;
		qcolombet: SmallVector?
		thanm (author): Will fix.
for (MachineInstr &MI : *MBB) {		for (MachineInstr &MI : *MBB) {
if (MI.getOpcode() != TargetOpcode::LIFETIME_START &&		bool isStart = false;
MI.getOpcode() != TargetOpcode::LIFETIME_END)		slots.clear();
continue;		if (isLifetimeStartOrEnd(MI, slots, isStart)) {
		if (!isStart) {
		wmi: Missing an assert message.
bool IsStart = MI.getOpcode() == TargetOpcode::LIFETIME_START;		assert(slots.size() == 1 && "unexpected: MI ends multiple slots");
		wmi: Assert message missing.
		thanm (author): Fixed.
const MachineOperand &MO = MI.getOperand(0);		int Slot = slots[0];
int Slot = MO.getIndex();		if (BlockInfo.Begin.test(Slot)) {
if (Slot < 0)		BlockInfo.Begin.reset(Slot);
continue;		}
		BlockInfo.End.set(Slot);
Markers.push_back(&MI);		} else {
		for (auto Slot : slots) {
MarkersFound++;		DEBUG(dbgs() << "Found a use of slot #" << Slot);
		DEBUG(dbgs() << " at BB#" << MBB->getNumber() << " index ");
		DEBUG(Indexes->getInstructionIndex(MI).print(dbgs()));
const AllocaInst *Allocation = MFI->getObjectAllocation(Slot);		const AllocaInst *Allocation = MFI->getObjectAllocation(Slot);
if (Allocation) {		if (Allocation) {
DEBUG(dbgs()<<"Found a lifetime marker for slot #"<<Slot<<		DEBUG(dbgs() << " with allocation: "<< Allocation->getName());
" with allocation: "<< Allocation->getName()<<"\n");		}
		DEBUG(dbgs() << "\n");
		if (BlockInfo.End.test(Slot)) {
		BlockInfo.End.reset(Slot);
}		}

if (IsStart) {
BlockInfo.Begin.set(Slot);		BlockInfo.Begin.set(Slot);
} else {		}
if (BlockInfo.Begin.test(Slot)) {
// Allocas that start and end within a single block are handled
// specially when computing the LiveIntervals to avoid pessimizing
// the liveness propagation.
BlockInfo.Begin.reset(Slot);
} else {
BlockInfo.End.set(Slot);
}		}
}		}
}		}
}		}

// Update statistics.		// Update statistics.
NumMarkerSeen += MarkersFound;		NumMarkerSeen += MarkersFound;
return MarkersFound;		return MarkersFound;
}		}

void StackColoring::calculateLocalLiveness() {		void StackColoring::calculateLocalLiveness()
// Perform a standard reverse dataflow computation to solve for		{
// global liveness. The BEGIN set here is equivalent to KILL in the standard		unsigned NumIters = 0;
// formulation, and END is equivalent to GEN. The result of this computation
// is a map from blocks to bitvectors where the bitvectors represent which
// allocas are live in/out of that block.
SmallPtrSet<const MachineBasicBlock*, 8> BBSet(BasicBlockNumbering.begin(),
BasicBlockNumbering.end());
unsigned NumSSMIters = 0;
bool changed = true;		bool changed = true;
while (changed) {		while (changed) {
changed = false;		changed = false;
++NumSSMIters;		++NumIters;

SmallPtrSet<const MachineBasicBlock*, 8> NextBBSet;

for (const MachineBasicBlock *BB : BasicBlockNumbering) {		for (const MachineBasicBlock *BB : BasicBlockNumbering) {
if (!BBSet.count(BB)) continue;

// Use an iterator to avoid repeated lookups.		// Use an iterator to avoid repeated lookups.
LivenessMap::iterator BI = BlockLiveness.find(BB);		LivenessMap::iterator BI = BlockLiveness.find(BB);
assert(BI != BlockLiveness.end() && "Block not found");		assert(BI != BlockLiveness.end() && "Block not found");
BlockLifetimeInfo &BlockInfo = BI->second;		BlockLifetimeInfo &BlockInfo = BI->second;

		// Compute LiveIn by unioning together the LiveOut sets of all preds.
BitVector LocalLiveIn;		BitVector LocalLiveIn;
BitVector LocalLiveOut;

// Forward propagation from begins to ends.
for (MachineBasicBlock::const_pred_iterator PI = BB->pred_begin(),		for (MachineBasicBlock::const_pred_iterator PI = BB->pred_begin(),
PE = BB->pred_end(); PI != PE; ++PI) {		PE = BB->pred_end(); PI != PE; ++PI) {
LivenessMap::const_iterator I = BlockLiveness.find(*PI);		LivenessMap::const_iterator I = BlockLiveness.find(*PI);
assert(I != BlockLiveness.end() && "Predecessor not found");		assert(I != BlockLiveness.end() && "Predecessor not found");
LocalLiveIn |= I->second.LiveOut;		LocalLiveIn |= I->second.LiveOut;
}		}
LocalLiveIn |= BlockInfo.End;
LocalLiveIn.reset(BlockInfo.Begin);

// Reverse propagation from ends to begins.
for (MachineBasicBlock::const_succ_iterator SI = BB->succ_begin(),
SE = BB->succ_end(); SI != SE; ++SI) {
LivenessMap::const_iterator I = BlockLiveness.find(*SI);
assert(I != BlockLiveness.end() && "Successor not found");
LocalLiveOut |= I->second.LiveIn;
}
LocalLiveOut |= BlockInfo.Begin;
LocalLiveOut.reset(BlockInfo.End);

LocalLiveIn |= LocalLiveOut;
LocalLiveOut |= LocalLiveIn;

// After adopting the live bits, we need to turn-off the bits which		// Compute LiveOut by subtracting out lifetimes that end in this
// are de-activated in this block.		// block, then adding in lifetimes that begin in this block. If
		// we have both BEGIN and END markers in the same basic block
		// then we know that the BEGIN marker comes after the END,
		// because we already handle the case where the BEGIN comes
		// before the END when collecting the markers (and building the
		// BEGIN/END vectors).
		BitVector LocalLiveOut = LocalLiveIn;
LocalLiveOut.reset(BlockInfo.End);		LocalLiveOut.reset(BlockInfo.End);
LocalLiveIn.reset(BlockInfo.Begin);		LocalLiveOut |= BlockInfo.Begin;

// If we have both BEGIN and END markers in the same basic block then
// we know that the BEGIN marker comes after the END, because we already
// handle the case where the BEGIN comes before the END when collecting
// the markers (and building the BEGIN/END vectore).
// Want to enable the LIVE_IN and LIVE_OUT of slots that have both
// BEGIN and END because it means that the value lives before and after
// this basic block.
BitVector LocalEndBegin = BlockInfo.End;
LocalEndBegin &= BlockInfo.Begin;
LocalLiveIn |= LocalEndBegin;
LocalLiveOut |= LocalEndBegin;

		// Update block LiveIn set, noting whether it has changed.
if (LocalLiveIn.test(BlockInfo.LiveIn)) {		if (LocalLiveIn.test(BlockInfo.LiveIn)) {
changed = true;		changed = true;
BlockInfo.LiveIn |= LocalLiveIn;		BlockInfo.LiveIn |= LocalLiveIn;

NextBBSet.insert(BB->pred_begin(), BB->pred_end());
}		}

		// Update block LiveOut set, noting whether it has changed.
		wmi: Since we only do dataflow iteration in the forward direction now, if there is no change in LiveIn, there shouldn't be a change in LiveOut. So can we move BlockInfo.LiveOut |= LocalLiveOut into the above if?
		thanm (author): I think there would be a problem with that on the first dataflow iteration for blocks containing a GEN (or first use) of some variable "x". In that case we wind up adding "x" to the live-out set, so we need to update the changed flag. On all subsequent iterations what you say is true.
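The forward problem solved by calculateLocalLiveness() amounts to LiveIn(B) = union of LiveOut(P) over all predecessors P, and LiveOut(B) = (LiveIn(B) minus End(B)) union Begin(B). The sketch below iterates these equations to a fixed point, using 32-bit masks in place of BitVector; `BlockInfo` and `solveForward` are hypothetical names, and blocks are simply swept in vector order.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-block summary, loosely mirroring BlockLifetimeInfo:
// Begin/End are the slots whose lifetime begins/ends in the block.
struct BlockInfo {
  std::vector<int> Preds;
  uint32_t Begin = 0, End = 0;
  uint32_t LiveIn = 0, LiveOut = 0;
};

// Iterates the forward equations until nothing changes and returns the
// number of sweeps (cf. NumIterations in the patch).
unsigned solveForward(std::vector<BlockInfo> &Blocks) {
  unsigned Iters = 0;
  bool Changed = true;
  while (Changed) {
    Changed = false;
    ++Iters;
    for (BlockInfo &B : Blocks) {
      uint32_t In = 0;
      for (int P : B.Preds)
        In |= Blocks[P].LiveOut;
      uint32_t Out = (In & ~B.End) | B.Begin;
      // Sets only grow; "X & ~Old" plays the role of BitVector::test(Old).
      if (In & ~B.LiveIn) {
        B.LiveIn |= In;
        Changed = true;
      }
      // Checked separately: a block whose Begin adds bits changes LiveOut
      // even when its LiveIn stays the same.
      if (Out & ~B.LiveOut) {
        B.LiveOut |= Out;
        Changed = true;
      }
    }
  }
  return Iters;
}
```

In a three-block chain where block 0 begins slot 0 and block 2 ends it, block 0's LiveIn never changes, yet its LiveOut grows on the first sweep, which is the case thanm describes for keeping the separate LiveOut check.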
if (LocalLiveOut.test(BlockInfo.LiveOut)) {		if (LocalLiveOut.test(BlockInfo.LiveOut)) {
changed = true;		changed = true;
BlockInfo.LiveOut |= LocalLiveOut;		BlockInfo.LiveOut |= LocalLiveOut;

NextBBSet.insert(BB->succ_begin(), BB->succ_end());
}		}
}		}

BBSet = std::move(NextBBSet);
}// while changed.		}// while changed.

		NumIterations = NumIters;
}		}

void StackColoring::calculateLiveIntervals(unsigned NumSlots) {		void StackColoring::calculateLiveIntervals(unsigned NumSlots) {
SmallVector<SlotIndex, 16> Starts;		SmallVector<SlotIndex, 16> Starts;
SmallVector<SlotIndex, 16> Finishes;		SmallVector<SlotIndex, 16> Finishes;

// For each block, find which slots are active within this block		// For each block, find which slots are active within this block
// and update the live intervals.		// and update the live intervals.
for (const MachineBasicBlock &MBB : *MF) {		for (const MachineBasicBlock &MBB : *MF) {
Starts.clear();		Starts.clear();
Starts.resize(NumSlots);		Starts.resize(NumSlots);
Finishes.clear();		Finishes.clear();
Finishes.resize(NumSlots);		Finishes.resize(NumSlots);

// Create the interval for the basic blocks with lifetime markers in them.		// Create the interval for the basic blocks containing lifetime begin/end.
for (const MachineInstr *MI : Markers) {		for (const MachineInstr &MI : MBB) {
		qcolombet: Period.
if (MI->getParent() != &MBB)
continue;

		thanm (author): Will fix.
assert((MI->getOpcode() == TargetOpcode::LIFETIME_START ||		SmallVector<int, 4> slots;
MI->getOpcode() == TargetOpcode::LIFETIME_END) &&		bool IsStart = false;
"Invalid Lifetime marker");		if (!isLifetimeStartOrEnd(MI, slots, IsStart))

bool IsStart = MI->getOpcode() == TargetOpcode::LIFETIME_START;
const MachineOperand &Mo = MI->getOperand(0);
int Slot = Mo.getIndex();
if (Slot < 0)
continue;		continue;
		SlotIndex ThisIndex = Indexes->getInstructionIndex(MI);
SlotIndex ThisIndex = Indexes->getInstructionIndex(*MI);		for (auto Slot : slots) {

if (IsStart) {		if (IsStart) {
if (!Starts[Slot].isValid() || Starts[Slot] > ThisIndex)		if (!Starts[Slot].isValid() || Starts[Slot] > ThisIndex)
Starts[Slot] = ThisIndex;		Starts[Slot] = ThisIndex;
} else {		} else {
if (!Finishes[Slot].isValid() || Finishes[Slot] < ThisIndex)		if (!Finishes[Slot].isValid() || Finishes[Slot] < ThisIndex)
Finishes[Slot] = ThisIndex;		Finishes[Slot] = ThisIndex;
}		}
}		}
		}

// Create the interval of the blocks that we previously found to be 'alive'.		// Create the interval of the blocks that we previously found to be 'alive'.
BlockLifetimeInfo &MBBLiveness = BlockLiveness[&MBB];		BlockLifetimeInfo &MBBLiveness = BlockLiveness[&MBB];
for (int pos = MBBLiveness.LiveIn.find_first(); pos != -1;		for (int pos = MBBLiveness.LiveIn.find_first(); pos != -1;
pos = MBBLiveness.LiveIn.find_next(pos)) {		pos = MBBLiveness.LiveIn.find_next(pos)) {
Starts[pos] = Indexes->getMBBStartIdx(&MBB);		Starts[pos] = Indexes->getMBBStartIdx(&MBB);
}		}
for (int pos = MBBLiveness.LiveOut.find_first(); pos != -1;		for (int pos = MBBLiveness.LiveOut.find_first(); pos != -1;
pos = MBBLiveness.LiveOut.find_next(pos)) {		pos = MBBLiveness.LiveOut.find_next(pos)) {
Finishes[pos] = Indexes->getMBBEndIdx(&MBB);		Finishes[pos] = Indexes->getMBBEndIdx(&MBB);
}		}

for (unsigned i = 0; i < NumSlots; ++i) {		for (unsigned i = 0; i < NumSlots; ++i) {
assert(Starts[i].isValid() == Finishes[i].isValid() && "Unmatched range");		//
		// When LifetimeStartOnFirstUse is turned on, data flow analysis
		// is forward (from starts to ends), not bidirectional. A
		// consequence of this is that we can wind up in situations
		// where Starts[i] is invalid but Finishes[i] is valid and vice
		// versa. Example:
		//
		// LIFETIME_START x
		// if (...) {
		// <use of x>
		// throw ...;
		// }
		// LIFETIME_END x
		// return 2;
		//
		//
		// Here the slot for "x" will not be live into the block
		// containing the "return 2" (since lifetimes start with first
		// use, not at the dominating LIFETIME_START marker).
		//
		if (Starts[i].isValid() && !Finishes[i].isValid()) {
		Finishes[i] = Indexes->getMBBEndIdx(&MBB);
		}
if (!Starts[i].isValid())		if (!Starts[i].isValid())
continue;		continue;

assert(Starts[i] && Finishes[i] && "Invalid interval");		assert(Starts[i] && Finishes[i] && "Invalid interval");
VNInfo *ValNum = Intervals[i]->getValNumInfo(0);		VNInfo *ValNum = Intervals[i]->getValNumInfo(0);
SlotIndex S = Starts[i];		SlotIndex S = Starts[i];
SlotIndex F = Finishes[i];		SlotIndex F = Finishes[i];
if (S < F) {		if (S < F) {
[...]	bool StackColoring::runOnMachineFunction(MachineFunction &Func) {

unsigned NumSlots = MFI->getObjectIndexEnd();		unsigned NumSlots = MFI->getObjectIndexEnd();

// If there are no stack slots then there are no markers to remove.		// If there are no stack slots then there are no markers to remove.
if (!NumSlots)		if (!NumSlots)
return false;		return false;

SmallVector<int, 8> SortedSlots;		SmallVector<int, 8> SortedSlots;

SortedSlots.reserve(NumSlots);		SortedSlots.reserve(NumSlots);
Intervals.reserve(NumSlots);		Intervals.reserve(NumSlots);

unsigned NumMarkers = collectMarkers(NumSlots);		unsigned NumMarkers = collectMarkers(NumSlots);

unsigned TotalSize = 0;		unsigned TotalSize = 0;
DEBUG(dbgs()<<"Found "<<NumMarkers<<" markers and "<<NumSlots<<" slots\n");		DEBUG(dbgs()<<"Found "<<NumMarkers<<" markers and "<<NumSlots<<" slots\n");
DEBUG(dbgs()<<"Slot structure:\n");		DEBUG(dbgs()<<"Slot structure:\n");
[...]	bool StackColoring::runOnMachineFunction(MachineFunction &Func) {
for (unsigned i=0; i < NumSlots; ++i) {		for (unsigned i=0; i < NumSlots; ++i) {
std::unique_ptr<LiveInterval> LI(new LiveInterval(i, 0));		std::unique_ptr<LiveInterval> LI(new LiveInterval(i, 0));
LI->getNextValue(Indexes->getZeroIndex(), VNInfoAllocator);		LI->getNextValue(Indexes->getZeroIndex(), VNInfoAllocator);
Intervals.push_back(std::move(LI));		Intervals.push_back(std::move(LI));
SortedSlots.push_back(i);		SortedSlots.push_back(i);
}		}

// Calculate the liveness of each block.		// Calculate the liveness of each block.
calculateLocalLiveness();		calculateLocalLiveness();
		DEBUG(dbgs() << "Dataflow iterations: " << NumIterations << "\n");
		DEBUG(dump());
		wmi: There may be an unused warning when NDEBUG is enabled.
		thanm (author): Hmm. Good point. I will rework the code to handle this.

// Propagate the liveness information.		// Propagate the liveness information.
calculateLiveIntervals(NumSlots);		calculateLiveIntervals(NumSlots);
		DEBUG(dumpIntervals());
		wmi: We want dumpIntervals() after the intervals have been computed.
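The per-block step of calculateLiveIntervals() combines start/end events with the LiveIn/LiveOut sets and, with forward analysis, has to tolerate a start without a matching finish in the same block. A minimal sketch, assuming plain integer indexes in place of SlotIndex, -1 as the invalid index, and hypothetical names `Ev`, `BlockSummary` and `blockRanges`; the segment construction after `if (S < F)` is collapsed in this view and is not modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical per-block input: instruction indexes are plain ints, and
// Events are (index, slot, isStart) results of a classification step.
struct Ev { int Index; int Slot; bool IsStart; };
struct BlockSummary {
  int StartIdx, EndIdx;
  std::vector<Ev> Events;
  uint32_t LiveIn, LiveOut;
};

// Returns, per slot, the [Start, Finish] pair contributed by this block, or
// {-1, -1} when the slot has no valid start here (mirrors Starts/Finishes).
std::vector<std::pair<int, int>> blockRanges(const BlockSummary &B,
                                             unsigned NumSlots) {
  std::vector<int> Starts(NumSlots, -1), Finishes(NumSlots, -1);
  for (const Ev &E : B.Events) {
    if (E.IsStart) {
      if (Starts[E.Slot] < 0 || Starts[E.Slot] > E.Index)
        Starts[E.Slot] = E.Index; // earliest start wins
    } else if (Finishes[E.Slot] < E.Index) {
      Finishes[E.Slot] = E.Index; // latest end wins
    }
  }
  for (unsigned S = 0; S < NumSlots; ++S) {
    if (B.LiveIn & (1u << S))
      Starts[S] = B.StartIdx;
    if (B.LiveOut & (1u << S))
      Finishes[S] = B.EndIdx;
  }
  std::vector<std::pair<int, int>> Out(NumSlots, {-1, -1});
  for (unsigned S = 0; S < NumSlots; ++S) {
    // A start with no finish (e.g. a first use on a path that leaves the
    // function via a throw) is extended to the end of the block.
    if (Starts[S] >= 0 && Finishes[S] < 0)
      Finishes[S] = B.EndIdx;
    if (Starts[S] < 0)
      continue; // an end with no start in this block contributes nothing
    Out[S] = {Starts[S], Finishes[S]};
  }
  return Out;
}
```

For a block spanning indexes 10 to 20, a slot first used at 12 with no END and no live-out bit gets the range [12, 20], while a slot that only ends in the block contributes nothing.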

// Search for allocas which are used outside of the declared lifetime		// Search for allocas which are used outside of the declared lifetime
// markers.		// markers.
if (ProtectFromEscapedAllocas)		if (ProtectFromEscapedAllocas)
removeInvalidSlotRanges();		removeInvalidSlotRanges();

// Maps old slots to new slots.		// Maps old slots to new slots.
DenseMap<int, int> SlotRemap;		DenseMap<int, int> SlotRemap;
[...]

test/CodeGen/X86/StackColoring.ll

[...]	bb2:
%t6 = add i32 %t3, %t4		%t6 = add i32 %t3, %t4
%t7 = add i32 %t5, %t6		%t7 = add i32 %t5, %t6
ret i32 %t7		ret i32 %t7
bb3:		bb3:
ret i32 0		ret i32 0
}		}

;CHECK-LABEL: myCall_w4:		;CHECK-LABEL: myCall_w4:
;YESCOLOR: subq $200, %rsp		;YESCOLOR: subq $120, %rsp
;NOCOLOR: subq $408, %rsp		;NOCOLOR: subq $408, %rsp

define i32 @myCall_w4(i32 %in) {		define i32 @myCall_w4(i32 %in) {
entry:		entry:
%a1 = alloca [14 x i8*], align 8		%a1 = alloca [14 x i8*], align 8
%a2 = alloca [13 x i8*], align 8		%a2 = alloca [13 x i8*], align 8
%a3 = alloca [12 x i8*], align 8		%a3 = alloca [12 x i8*], align 8
%a4 = alloca [11 x i8*], align 8		%a4 = alloca [11 x i8*], align 8
[...]	bb2:
%t7 = add i32 %t5, %t6		%t7 = add i32 %t5, %t6
ret i32 %t7		ret i32 %t7
bb3:		bb3:
ret i32 0		ret i32 0
}		}


;CHECK-LABEL: myCall2_nostart:		;CHECK-LABEL: myCall2_nostart:
;YESCOLOR: subq $144, %rsp		;YESCOLOR: subq $272, %rsp
;NOCOLOR: subq $272, %rsp		;NOCOLOR: subq $272, %rsp
define i32 @myCall2_nostart(i32 %in, i1 %d) {		define i32 @myCall2_nostart(i32 %in, i1 %d) {
entry:		entry:
%a = alloca [17 x i8*], align 8		%a = alloca [17 x i8*], align 8
%a2 = alloca [16 x i8*], align 8		%a2 = alloca [16 x i8*], align 8
%b = bitcast [17 x i8*]* %a to i8*		%b = bitcast [17 x i8*]* %a to i8*
%b2 = bitcast [16 x i8*]* %a2 to i8*		%b2 = bitcast [16 x i8*]* %a2 to i8*
%t1 = call i32 @foo(i32 %in, i8* %b)		%t1 = call i32 @foo(i32 %in, i8* %b)
[...]	define i32 @shady_range(i32 %argc, i8** nocapture %argv) uwtable {
%z3 = load i32, i32* %z2, align 16		%z3 = load i32, i32* %z2, align 16
%r = call i32 @foo(i32 %z3, i8* %a8)		%r = call i32 @foo(i32 %z3, i8* %a8)
%r2 = call i32 @foo(i32 %z3, i8* %b8)		%r2 = call i32 @foo(i32 %z3, i8* %b8)
call void @llvm.lifetime.end(i64 -1, i8* %a8)		call void @llvm.lifetime.end(i64 -1, i8* %a8)
call void @llvm.lifetime.end(i64 -1, i8* %b8)		call void @llvm.lifetime.end(i64 -1, i8* %b8)
ret i32 9		ret i32 9
}		}

		; In this case 'itar1' and 'itar2' can't be overlapped if we treat
		; lifetime.start as the beginning of the lifetime, but we can
		; overlap if we consider first use of the slot as lifetime
		; start. See llvm bug 25776.

		;CHECK-LABEL: ifthen_twoslots:
		;YESCOLOR: subq $536, %rsp
		;NOCOLOR: subq $1048, %rsp
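The CHECK lines quantify the gain: 1048 - 512 = 536, and 512 bytes is exactly one [128 x i32] array (128 x 4 bytes), which is consistent with itar1 and itar2 sharing a slot. The core effect can be sketched on a straight-line trace; this is a simplified, hypothetical analogue of the CFG in ifthen_twoslots, and `K`, `E`, `lifetime` and `overlap` are illustrative names. With marker-based starts the two lifetimes overlap; with first-use starts they do not.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical straight-line trace: each entry is a START/END marker or a
// use of a slot; an entry's position is its index in the vector.
enum class K { Start, End, Use };
struct E { K Kind; int Slot; };

// Lifetime of Slot as [first, last]: it begins at the START marker or, when
// FirstUse is true, at the first use; it ends at the (last) END marker.
std::pair<int, int> lifetime(const std::vector<E> &Trace, int Slot,
                             bool FirstUse) {
  const K Want = FirstUse ? K::Use : K::Start;
  int Begin = -1, End = -1;
  for (int I = 0; I < static_cast<int>(Trace.size()); ++I) {
    if (Trace[I].Slot != Slot)
      continue;
    if (Trace[I].Kind == Want && Begin < 0)
      Begin = I;
    if (Trace[I].Kind == K::End)
      End = I;
  }
  return {Begin, End};
}

// Two closed intervals overlap iff each one starts before the other ends.
bool overlap(std::pair<int, int> A, std::pair<int, int> B) {
  return A.first <= B.second && B.first <= A.second;
}
```

For the trace START a, START b, use a, END a, use b, END b, marker-based lifetimes are [0, 3] and [1, 5] (overlapping), while first-use lifetimes are [2, 3] and [4, 5] (disjoint), so the two slots could share storage.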

		define i32 @ifthen_twoslots(i32 %x) #0 {
		entry:
		%retval = alloca i32, align 4
		%x.addr = alloca i32, align 4
		%itar1 = alloca [128 x i32], align 16
		%itar2 = alloca [128 x i32], align 16
		%cleanup.dest.slot = alloca i32
		store i32 %x, i32* %x.addr, align 4
		%itar1_start_8 = bitcast [128 x i32]* %itar1 to i8*
		wmi: %0, %1, %2 are prone to changing. We can run "opt -instnamer" to rename %0, %1, %2, ..., so the test will be more stable.
		thanm (author): Thanks, I hadn't realized that we had such a tool. I'll update the test.
		call void @llvm.lifetime.start(i64 512, i8* %itar1_start_8) #3
		%itar2_start_8 = bitcast [128 x i32]* %itar2 to i8*
		call void @llvm.lifetime.start(i64 512, i8* %itar2_start_8) #3
		%xval = load i32, i32* %x.addr, align 4
		%and = and i32 %xval, 1
		%tobool = icmp ne i32 %and, 0
		br i1 %tobool, label %if.then, label %if.else

		if.then: ; preds = %entry
		%arraydecay = getelementptr inbounds [128 x i32], [128 x i32]* %itar1, i32 0, i32 0
		call void @inita(i32* %arraydecay)
		store i32 1, i32* %retval, align 4
		store i32 1, i32* %cleanup.dest.slot, align 4
		%itar2_end_8 = bitcast [128 x i32]* %itar2 to i8*
		call void @llvm.lifetime.end(i64 512, i8* %itar2_end_8) #3
		%itar1_end_8 = bitcast [128 x i32]* %itar1 to i8*
		call void @llvm.lifetime.end(i64 512, i8* %itar1_end_8) #3
		br label %cleanup

		if.else: ; preds = %entry
		%arraydecay1 = getelementptr inbounds [128 x i32], [128 x i32]* %itar2, i32 0, i32 0
		call void @inita(i32* %arraydecay1)
		store i32 0, i32* %retval, align 4
		store i32 1, i32* %cleanup.dest.slot, align 4
		%itar2_end2_8 = bitcast [128 x i32]* %itar2 to i8*
		call void @llvm.lifetime.end(i64 512, i8* %itar2_end2_8) #3
		%itar1_end2_8 = bitcast [128 x i32]* %itar1 to i8*
		call void @llvm.lifetime.end(i64 512, i8* %itar1_end2_8) #3
		br label %cleanup

		cleanup: ; preds = %if.else, %if.then
		%final_retval = load i32, i32* %retval, align 4
		ret i32 %final_retval
		}

		; This function is intended to test the case where you
		; have a reference to a stack slot that lies outside of
		; the START/END lifetime markers; the flow analysis
		; should catch this and build the lifetime based on the
		; markers only.

		;CHECK-LABEL: while_loop:
		;YESCOLOR: subq $1032, %rsp
		;NOCOLOR: subq $1544, %rsp

		define i32 @while_loop(i32 %x) #0 {
		entry:
		%b1 = alloca [128 x i32], align 16
		%b2 = alloca [128 x i32], align 16
		%b3 = alloca [128 x i32], align 16
		%tmp = bitcast [128 x i32]* %b1 to i8*
		call void @llvm.lifetime.start(i64 512, i8* %tmp) #3
		%tmp1 = bitcast [128 x i32]* %b2 to i8*
		call void @llvm.lifetime.start(i64 512, i8* %tmp1) #3
		%and = and i32 %x, 1
		%tobool = icmp eq i32 %and, 0
		br i1 %tobool, label %if.else, label %if.then

		if.then: ; preds = %entry
		%arraydecay = getelementptr inbounds [128 x i32], [128 x i32]* %b2, i64 0, i64 0
		call void @inita(i32* %arraydecay) #3
		br label %if.end

		if.else: ; preds = %entry
		%arraydecay1 = getelementptr inbounds [128 x i32], [128 x i32]* %b1, i64 0, i64 0
		call void @inita(i32* %arraydecay1) #3
		%arraydecay3 = getelementptr inbounds [128 x i32], [128 x i32]* %b3, i64 0, i64 0
		call void @inita(i32* %arraydecay3) #3
		%tobool25 = icmp eq i32 %x, 0
		br i1 %tobool25, label %if.end, label %while.body.lr.ph

		while.body.lr.ph: ; preds = %if.else
		%tmp2 = bitcast [128 x i32]* %b3 to i8*
		br label %while.body

		while.body: ; preds = %while.body.lr.ph, %while.body
		%x.addr.06 = phi i32 [ %x, %while.body.lr.ph ], [ %dec, %while.body ]
		%dec = add nsw i32 %x.addr.06, -1
		call void @llvm.lifetime.start(i64 512, i8* %tmp2) #3
		call void @inita(i32* %arraydecay3) #3
		call void @llvm.lifetime.end(i64 512, i8* %tmp2) #3
		%tobool2 = icmp eq i32 %dec, 0
		br i1 %tobool2, label %if.end.loopexit, label %while.body

		if.end.loopexit: ; preds = %while.body
		br label %if.end

		if.end: ; preds = %if.end.loopexit, %if.else, %if.then
		call void @llvm.lifetime.end(i64 512, i8* %tmp1) #3
		call void @llvm.lifetime.end(i64 512, i8* %tmp) #3
		ret i32 0
		}

		declare void @inita(i32*) #2

declare void @bar([100 x i32]* , [100 x i32]*) nounwind

declare void @llvm.lifetime.start(i64, i8* nocapture) nounwind

declare void @llvm.lifetime.end(i64, i8* nocapture) nounwind

declare i32 @foo(i32, i8*)

test/CodeGen/X86/misched-aa-colored.ll


; Function Attrs: nounwind uwtable
define hidden { %"class.llvm::SDNode.10.610.970.1930.2050.2290.4090", i32 } @_ZN4llvm16DAGTypeLegalizer18WidenVecRes_BinaryEPNS_6SDNodeE(%"class.llvm::DAGTypeLegalizer.117.717.1077.2037.2157.2397.4197" %this, %"class.llvm::SDNode.10.610.970.1930.2050.2290.4090"* %N) #2 align 2 {
entry:
  %Op.i43 = alloca %"class.llvm::SDValue.3.603.963.1923.2043.2283.4083", align 8
  %ref.tmp.i = alloca %"struct.std::pair.112.119.719.1079.2039.2159.2399.4199", align 8
  %Op.i = alloca %"class.llvm::SDValue.3.603.963.1923.2043.2283.4083", align 8
  %0 = bitcast %"struct.std::pair.112.119.719.1079.2039.2159.2399.4199"* %ref.tmp.i to i8*
  call void @llvm.lifetime.start(i64 24, i8* %0) #1
  %retval.sroa.0.0.idx.i36 = getelementptr inbounds %"struct.std::pair.112.119.719.1079.2039.2159.2399.4199", %"struct.std::pair.112.119.719.1079.2039.2159.2399.4199"* %ref.tmp.i, i64 0, i32 1, i32 0, i32 0
  %retval.sroa.0.0.copyload.i37 = load i32, i32* %retval.sroa.0.0.idx.i36, align 8
  call void @llvm.lifetime.end(i64 24, i8* %0) #1
  %agg.tmp8.sroa.2.0.copyload = load i32, i32* undef, align 8
  %1 = bitcast %"class.llvm::SDValue.3.603.963.1923.2043.2283.4083"* %Op.i to i8*
  call void @llvm.lifetime.start(i64 16, i8* %1) #1
  %2 = getelementptr %"class.llvm::SDValue.3.603.963.1923.2043.2283.4083", %"class.llvm::SDValue.3.603.963.1923.2043.2283.4083"* %Op.i, i64 0, i32 1
  store i32 %agg.tmp8.sroa.2.0.copyload, i32* %2, align 8


Revision Contents

Diff 57378

lib/CodeGen/StackColoring.cpp

test/CodeGen/X86/StackColoring.ll

test/CodeGen/X86/misched-aa-colored.ll
