This is an archive of the discontinued LLVM Phabricator instance.

[Greedy regalloc] Replace analyzeSiblingValues with something new
ClosedPublic

Authored by wmi on Dec 7 2015, 1:51 PM.

Details

Reviewers

qcolombet
• tstellarAMD

Commits

rG9a16d655c718: Recommit r265547, and r265610,r265639,r265657 on top of it, plus two fixes with…
rG18293bef4e4e: Recommit r265309 after fixed an invalid memory reference bug happened when…
rGffbc9c7f3bd9: Replace analyzeSiblingValues with new algorithm to fix its compile time issue.
rL265547: Recommit r265309 after fixed an invalid memory reference bug happened
rL265309: Replace analyzeSiblingValues with new algorithm to fix its compile

Summary

This change addresses PR17409 (https://llvm.org/bugs/show_bug.cgi?id=17409) and its several duplicates. It is divided into three parts for easier review.

The major issue with analyzeSiblingValues arises when a virtreg is split into N siblings with the same original VNI and some PHIDefs are generated during the splitting: the VNInfo of every sibling is added to the Dependents of all other siblings, which creates an NxN network. traceSiblingValue propagates SibValue info through this NxN network, so it has NxN time complexity. In addition, selectOrSplit is called for all N siblings sequentially. When register pressure is high, a large percentage of the siblings will be spilled (suppose N/2 of them), and traceSiblingValue will be called N/2 times indirectly from selectOrSplit, giving N^3 time complexity in total.

analyzeSiblingValues has two major uses. One is to figure out the SibValueInfo::SpillVNI of the virtReg to be spilled, so that the spill can be hoisted to the place after SpillVNI->def and redundant spills are eliminated at the same time. The other is to trace the sibling copies back to the original value so that the computation of the original value can be used for rematerialization. We replace analyzeSiblingValues by reimplementing these functionalities in Part1 and Part2.

Part1:
Instead of figuring out where to hoist the spill for each virtReg as it is spilled, we do that all at once when allocatePhysRegs is done. With all spills in place, we group spills with the same value (having the same OrigVNI). For each group of spills with equal values, we first remove redundant spills dominated by another spill in the group, then traverse the dominator tree in post-order and hoist the spills to less frequently executed dominator tree nodes. Since a spill can be hoisted to a cold dominator tree node that contains no sibling's VNI->def, the result can be better than the original implementation.
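
As a rough illustration of the Part1 idea (not the patch's actual code), the sketch below works on a toy dominator tree for a single group of equal-valued spills. ToyDomTree, placeSpills and the frequencies are hypothetical, and the sketch assumes the value is available in every block of the tree, which the real implementation has to check. A spill in a dominating block makes the spills below it redundant, and a bottom-up walk hoists spills to a node only when one spill there is cheaper than the spills in its subtree.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy dominator tree: node 0 is the root, Parent[N] is N's immediate dominator.
struct ToyDomTree {
  std::vector<int> Parent;   // Parent[0] is -1
  std::vector<double> Freq;  // estimated execution frequency of each block
};

// Returns the blocks that should keep a spill for one group of equal values.
// Assumption: the value is available in every block of the tree.
std::set<int> placeSpills(const ToyDomTree &DT, const std::set<int> &Spills) {
  assert(!DT.Parent.empty() && DT.Parent.size() == DT.Freq.size());
  const std::size_t N = DT.Parent.size();
  std::vector<std::vector<int>> Children(N);
  for (std::size_t I = 1; I < N; ++I)
    Children[DT.Parent[I]].push_back(static_cast<int>(I));

  // Top-down order; walking it backwards visits children before parents.
  std::vector<int> Order{0};
  for (std::size_t I = 0; I < Order.size(); ++I)
    for (int C : Children[Order[I]])
      Order.push_back(C);

  std::vector<std::set<int>> Kept(N);  // spills kept in each subtree
  std::vector<double> Cost(N, 0.0);    // summed frequency of those spills
  for (auto It = Order.rbegin(); It != Order.rend(); ++It) {
    int Node = *It;
    for (int C : Children[Node]) {
      Kept[Node].insert(Kept[C].begin(), Kept[C].end());
      Cost[Node] += Cost[C];
    }
    bool HasSpill = Spills.count(Node) != 0;
    // A spill here makes every spill below redundant; otherwise hoist only
    // when a single spill here is cheaper than all the spills below.
    if (HasSpill || (!Kept[Node].empty() && Cost[Node] > DT.Freq[Node])) {
      Kept[Node] = {Node};
      Cost[Node] = DT.Freq[Node];
    }
  }
  return Kept[0];
}
```

With a cold block 1 dominating two hotter blocks 3 and 4 that both contain spills, the sketch keeps a single spill in block 1; a lone spill that is already in the cheapest place stays where it is.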

I didn't follow Jakob's proposal in pr17409 to change hoistCopiesForSize because the redundant backcopies visible in one round of splitting are limited. Suppose Vreg1 is split into Vreg2, Vreg3 and Vreg4 in the first round of splitting, and Vreg4 is further split into Vreg5 and Vreg6 in the second round; the redundant backcopies between Vreg2=Vreg3 and Vreg5=Vreg6 cannot be found. (I caught more than 100 such cases in the llvm testsuite, which left redundant spills in the final asm code.)

Part2:
To find the computation of the original value for rematerialization, we always query the inst at OrigVNI->def. To handle the case where the inst at OrigVNI->def has been removed during rematerialization, we change rematerialization to not delete the inst at OrigVNI->def even if it is already dead. Instead, we change the dest vreg to a new vreg (the new vreg will not be register-allocated, so it will not affect the allocation of other vregs), save the inst in a set named DeadRemats, and shrink the original dest vreg in the same way as before. The insts in DeadRemats are removed after allocatePhysRegs is done.
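
The DeadRemats bookkeeping described in Part2 can be pictured with a small toy model. This is only a sketch under simplifying assumptions: ToyInst, ToyRematState and their members are hypothetical stand-ins, instructions are identified by integers, and dummy registers are simply numbered from 1000 upwards.

```cpp
#include <map>
#include <set>
#include <string>

// Toy instruction: a destination vreg and the expression it computes.
struct ToyInst {
  unsigned DestReg;
  std::string Expr;
};

struct ToyRematState {
  std::map<int, ToyInst> Insts;  // instructions still present in the function
  std::set<int> DeadRemats;      // dead defs kept until allocation is done
  unsigned NextDummyReg = 1000;  // assumption: dummy vregs are never allocated

  // The def became dead after rematerialization: keep it, but retarget it
  // to a fresh dummy register so the original vreg can be shrunk.
  void markDeadRemat(int Id) {
    Insts.at(Id).DestReg = NextDummyReg++;
    DeadRemats.insert(Id);
  }

  // Later splitting rounds can still query the defining expression.
  const std::string *queryDefExpr(int Id) const {
    auto It = Insts.find(Id);
    return It == Insts.end() ? nullptr : &It->second.Expr;
  }

  // Run once, after all registers have been assigned.
  void eraseDeadRemats() {
    for (int Id : DeadRemats)
      Insts.erase(Id);
    DeadRemats.clear();
  }
};
```

The point of the design is that queryDefExpr keeps working between markDeadRemat and eraseDeadRemats, which is the window in which later rounds of selectOrSplit may still need the original def.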

Part3:
Part3 of the change cleans up the code related to analyzeSiblingValues.

Test with all three parts combined on x86_64-linux-gnu:

The compile time for 1.c in pr24618 dropped from 0.34s to 0.25s. The compile time for interpreter_goto_table_impl.ii in pr24618 dropped from 176.80s to 66.86s. I cannot verify the patch using the tests in pr17409 because most bugs related to asan/ubsan have been worked around on the sanitizer side.

llvm testsuite. Perf: mostly neutral except for one perf regression I haven't addressed: SingleSource/Benchmarks/Misc/mandel. The reason is understood: we don't do the local spill hoist that the original implementation did. The use of the local hoist is described in the comment in propagateSiblingValue (starting with // This is an alternative def earlier in the same MBB.). The local hoist cannot be done after allocatePhysRegs, so I will address it in a separate patch. CC time: MultiSource/Applications/sqlite3/sqlite3 shows a steady 1.5% improvement.

I haven't cleaned up the llvm unit tests because I expect there will be many changes to the patches during the review. I will clean them up later.

Diff Detail

Repository: rL LLVM

Event Timeline

wmi updated this revision to Diff 42080. Dec 7 2015, 1:51 PM

Herald added subscribers: qcolombet, MatzeB. · Dec 7 2015, 1:51 PM

Change std::set to SmallPtrSet and DenseSet.
Improve comments.

Hi Wei,

Thanks for working on this.

I haven’t looked into it yet, but I wanted to let you know this is on my todolist.

Thanks for your patience,
-Quentin

Hi Wei,

I had a quick look at the patch and although I believe it gets the job done, I do not think this is the way to fix the problem.
My understanding is that you are trading an expensive value tracking mechanism for a fancy, but apparently less expensive, spill placement mechanism.

I think it is not the right approach because this creates a (bigger) gap between the cost model of the register allocator and the actual spill cost. Indeed, the cost model of the register allocator, w.r.t. spill cost, is basically reload before the uses, spill after the definitions.
In other words, it is better to keep the spiller simple but be smarter on the splitting of live-ranges so that the register allocator takes the right spilling/splitting decisions. That way, sharing spills/reloads will come naturally without having to do anything in the spiller, plus we may get better copy placement.

This was also, I believe, what Jakob had in mind when he described a solution in PR17409.

Concretely, what you should do:
0. Create a baseline for performance comparisons without any changes
1. Add an option to disable InlineSpiller::analyzeSiblingValues, say -disable-spill-analyze-sibvalue
2. Benchmark with (-mllvm) -disable-spill-analyze-sibvalue (-mllvm) -split-spill-mode=size (we may want to use "speed" and improve that splitting mode instead of "size")
3. Investigate the regressions and/or file PRs
4. Fix the regressions

At this point, the new (or size) split mode should be better or equivalent to the baseline and we can just kill analyzeSiblingValue and make that new mode the default.

Hope that helps.

Cheers,
-Quentin

This revision now requires changes to proceed. Jan 8 2016, 3:00 PM

Thank you for the review.

> I think it is not the right approach because this creates a (bigger) gap between the cost model of the register allocator and the actual spill cost. Indeed, the cost model of the register allocator, w.r.t. spill cost, is basically reload before the uses, spill after the definitions.
> In other words, it is better to keep the spiller simple but be smarter on the splitting of live-ranges so that the register allocator takes the right spilling/splitting decisions. That way, sharing spills/reloads will come naturally without having to do anything in the spiller, plus we may get better copy placement.
>
> This was also, I believe, what Jakob had in mind when he described a solution in PR17409.

I have some different ideas here because of the following three things:

1. I tried Jakob's solution when I had just started working on this problem, but I quickly realized it has a fundamental problem compared with the existing approach using InlineSpiller::analyzeSiblingValues. Jakob's solution can only see the spills generated from siblings split from the current VirtReg, i.e., in the current round of selectOrSplit. Say the current VirtReg R1 is split into siblings R2, R3, R4 (suppose R2 is for the remainder interval). Suppose R4 is further split in the next round of selectOrSplit and a new remainder interval R5 is generated. If a spill for R2 dominates a spill for R5, Jakob's solution cannot remove the spill for R5 because they are generated in different rounds of selectOrSplit. InlineSpiller::analyzeSiblingValues doesn't have this issue, and that is exactly why it has to pay so much to track the siblings with equal values.

To evaluate how serious the problem above is, I did some experiments with Jakob's solution. I wrote a sanity check to catch redundant spills left over in the final stage (to make sure no later phase will clean up the redundant spills) because of the issue above, and I caught such cases in about 100 files when building the llvm testsuite. When I had the solution in this patch ready, I used the same sanity check and ensured all those redundant spills had been cleaned up.

2. In Jakob's solution, the spill sharing work is done mostly in SplitEditor::hoistCopiesForSize. hoistCopiesForSize is called inside SplitEditor::finish (the last step of splitting), i.e., it hoists spills and removes redundant spills after the splitting decision has been made for the current VirtReg. So Jakob's solution has the same cost model issue as the one you described here.

3. In my solution, the major function is called after RegAllocBase::allocatePhysRegs. That is to say, it keeps the reloads/spills in their original places (reload before the uses, spill after the definitions) during RegAllocBase::allocatePhysRegs, and only tries to share/hoist spills after most of the regalloc work is done. So I think my solution is closer to your idea here. It is like cleanup work after regalloc, which has simpler logic and a clearer impact, and will be easier to tune for performance. In comparison, Jakob mentioned in PR17409 that enabling either split spill mode is going to affect the live range splitting algorithm a lot, and somebody has to track down the regressions and fix them.

> Concretely, what you should do:
> 0. Create a baseline for performance comparisons without any changes
> 1. Add an option to disable InlineSpiller::analyzeSiblingValues, say -disable-spill-analyze-sibvalue
> 2. Benchmark with (-mllvm) -disable-spill-analyze-sibvalue (-mllvm) -split-spill-mode=size (we may want to use "speed" and improve that splitting mode instead of "size")
> 3. Investigate the regressions and/or file PRs
> 4. Fix the regressions
>
> At this point, the new (or size) split mode should be better or equivalent to the baseline and we can just kill analyzeSiblingValue and make that new mode the default.

Actually, while working on the patch, I followed these steps to improve it gradually, comparing with the existing implementation and fixing regressions.

Another thing: no matter which solution is adopted in the end, Part2 is needed because there is no way to get DefMI for rematerialization after removing InlineSpiller::analyzeSiblingValues.

Thanks,
Wei.

Hi Wei,

Couple of comments:

1. Out of curiosity, do you have numbers on how many redundant spills we have with the current solution?
I am not saying the live-range splitting mechanism is perfect, but it fits nicely into the existing framework, in particular w.r.t. the way we model the spill cost. Which leads me to #2.

2. hoistCopiesForSize is for copies AFAIR, i.e., split points, not spills. That means that, IIRC, we have the save cost model, since we do not care about split insertion in the cost.

3. I agree the spill hoisting thing is more like a post reg alloc phase. Therefore, I would rather have it as a post regalloc pass instead of embedded in the spiller. That being said, I understand it is easier to work directly with the virtual registers.

To summarize, it is fine to have the spill hoisting optimization where you put it. I believe, though, that making the splitting smarter would be the first logical step to mitigate the need for such an optimization, and I am still concerned that it may be bad for compile time.

If you’d like to pursue that direction anyway, it is okay; just a couple of inlined comments.

As for patch part 2, it does not do anything at the moment, does it? (I.e., we clear the set before walking through it.)

Thanks,
-Quentin

include/llvm/CodeGen/VirtRegMap.h
66 ↗	(On Diff #44027)	Is there a way this could be computed from the split map?
lib/CodeGen/InlineSpiller.cpp
172	Can set private, right?
173	Ditto.
175	Ditto.
179	Ditto.
lib/CodeGen/Spiller.h
34	Call this method postOptimization and make it a non-abstract method. We do not want out-of-tree spillers to have to add an implementation when they do not need to do anything.
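
The last suggestion above (a non-abstract hook with a do-nothing default) could look roughly like the following. This is a simplified sketch, not the real Spiller interface: ToySpiller, OutOfTreeSpiller and HoistingSpiller are hypothetical classes, and spill takes a plain register number for brevity.

```cpp
#include <string>
#include <vector>

class ToySpiller {
public:
  virtual ~ToySpiller() = default;
  virtual void spill(unsigned VReg) = 0;
  // Non-abstract hook: spillers with nothing to do inherit the no-op.
  virtual void postOptimization() {}
};

// A spiller written before the hook existed keeps compiling unchanged.
class OutOfTreeSpiller : public ToySpiller {
public:
  std::vector<unsigned> Spilled;
  void spill(unsigned VReg) override { Spilled.push_back(VReg); }
};

// A spiller that wants the post-allocation step overrides the hook.
class HoistingSpiller : public ToySpiller {
public:
  std::vector<std::string> Log;
  void spill(unsigned VReg) override {
    Log.push_back("spill " + std::to_string(VReg));
  }
  void postOptimization() override { Log.push_back("hoist spills"); }
};
```

The allocator can then call postOptimization() on any spiller through the base class without knowing whether it does anything.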

#1

> Out of curiosity, do you have numbers on how many redundant spills we have with the current solution?
>
> I am not saying the live-range splitting mechanism is perfect, but it fits nicely into the existing framework, in particular w.r.t. the way we model the spill cost. Which leads me to #2.

I did that experiment, trying to catch redundant spills in the current analyzeSiblingValues solution. I did catch some in the llvm testsuite (~20, I don't remember exactly), but most of them are left there because of the HoistCondition check in propagateSiblingValue, i.e., because of HoistCondition, the current solution leaves some not very important but fully redundant spills in the code.

#2

> hoistCopiesForSize is for copies AFAIR, i.e., split points, not spills. That means that, IIRC, we have the save cost model, since we do not care about split insertion in the cost.

Could you elaborate on what the save cost model means here?

#3

> I agree the spill hoisting thing is more like a post reg alloc phase. Therefore, I would rather have it as a post regalloc pass instead of embedded in the spiller. That being said, I understand it is easier to work directly with the virtual registers.
>
> To summarize, it is fine to have the spill hoisting optimization where you put it. I believe, though, that making the splitting smarter would be the first logical step to mitigate the need for such an optimization, and I am still concerned that it may be bad for compile time.

I compared the compile time between Jakob's solution (-disable-spill-analyze-sibvalue + -split-spill-mode=size) and my solution for some motivating testcases. They are the same. I will do more careful tests on this, e.g. using spec.

> If you’d like to pursue that direction anyway, it is okay; just a couple of inlined comments.
>
> As for patch part 2, it does not do anything at the moment, does it? (I.e., we clear the set before walking through it.)

The set DeadRemats is used to record def instructions which would otherwise have been removed when they are found to be dead after rematerialization. However, such a def may still be useful for rematerialization of other siblings. (Note that without the DefMI setting in analyzeSiblingValues, for all the siblings with equal values, original_register_VNI->def is the best place to query the value expression. If the original def is deleted, we have no place to query the value expression for rematerialization of siblings in the following rounds of selectOrSplit.) So we keep the dead instructions in their original places during the whole lifetime of allocatePhysRegs and use the DeadRemats set to hold them (changing the dest reg to a new dummy reg which is never added to NewVRegs, so the live range can be updated properly in the same way as before). The dead defs in DeadRemats are deleted after allocatePhysRegs is done.

I noticed a weakness of my hoistSpill patch: when a redundant spill is deleted, the RHS register may become dead and its live range can be shrunk. However, hoistSpill is done after register assignment, so it cannot benefit from the live range shrinking caused by deleting redundant spills. I also caught some testcases producing non-optimal code because of it.

To solve it, the best way I can think of now is to combine -split-spill-mode=size (Jakob's solution) and the hoistSpill patch here. Common cases of redundant spills can then be deleted by -split-spill-mode=size during register allocation, and redundant spills generated in different splitting rounds will be cleaned up by the hoistSpill patch. This combination generated the best code in my analysis of the llvm testsuite.

About compile time, I used spec2006 C/C++ benchmarks to do the evaluation.
hoistSpill + -split-spill-mode=size compared with base: compile time decreases by 0.70% on average.
hoistSpill + -split-spill-mode=size compared with -split-spill-mode=size only: compile time increases by 0.18% on average.

I reevaluated the performance of hoistSpill + -split-spill-mode=size on the llvm testsuite and google benchmarks. They are generally neutral compared with trunk, except for SingleSource/Benchmarks/Misc/mandel, whose degradation is caused by problem 1 below.

Other changes:
Addressed Quentin's comments. Reorganized the code: added a HoistSpiller class. Fixed some bugs when -regalloc=pbqp and -regalloc=basic are used. Fixed unit tests and added new unit tests.

Problems unaddressed:

1. Spill hoist inside BB. propagateSiblingValue has a good description of its benefit in the comment, and I found testcases generating non-optimal code without it. I plan to address it separately.

2. I deleted CodeGen/AArch64/aarch64-deferred-spilling.ll but I haven't got a good replacement for it. With the hoistSpill + -split-spill-mode=size patch, the pattern checked by the test doesn't appear anymore, and the test is relatively large, so it is not easy to check whether it has just been transformed into another form. I did see many cases where deferred spills can get physregs in the end, and I got a few small testcases on x86. However, those are still somewhat fragile: when I changed regalloc a little bit, the pattern disappeared.

Thanks,
Wei.

Herald added subscribers: jyknight, srhines, danalbert, tberghammer. · Jan 26 2016, 10:20 AM

wmi marked 6 inline comments as done. Jan 26 2016, 10:26 AM

wmi added inline comments.

include/llvm/CodeGen/VirtRegMap.h
66 ↗	(On Diff #44027)	Yes, I removed Virt2SiblingsMap and computed it from split map.
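
To make the exchange above concrete, here is a minimal sketch of deriving sibling groups from a split map instead of storing a separate sibling map. It is an illustration only: SplitMap, getOriginal and groupSiblings are toy definitions written for this example, and the sketch assumes the map records the register each vreg was split from and contains no cycles.

```cpp
#include <map>
#include <vector>

// Toy split map: vreg -> the vreg it was split from.
using SplitMap = std::map<unsigned, unsigned>;

// Follow the chain of splits back to the register that was never split from
// anything. Assumption: the map is acyclic.
unsigned getOriginal(const SplitMap &M, unsigned VReg) {
  auto It = M.find(VReg);
  while (It != M.end()) {
    VReg = It->second;
    It = M.find(VReg);
  }
  return VReg;
}

// Group every split product under its original register.
std::map<unsigned, std::vector<unsigned>> groupSiblings(const SplitMap &M) {
  std::map<unsigned, std::vector<unsigned>> Groups;
  for (const auto &KV : M)
    Groups[getOriginal(M, KV.first)].push_back(KV.first);
  return Groups;
}
```

Using the example from earlier in the thread (R1 split into R2, R3, R4, and R4 later split to produce R5), all of R2..R5 end up in the group of R1 even though they come from different rounds of splitting.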

Added an SM_Speed split mode and made it the default. SM_Size will sometimes hoist spills from a cold region in an inner loop to a hot region in an outer loop, which is bad for performance. SM_Speed only tries to hoist spills from a hot region to a cold region. If it fails to hoist all the spills to a cold place, it steps back and removes the spills dominated by others.

Comparing "hoistSpill + split-spill-mode=speed" with "hoistSpill + split-spill-mode=size", an internal benchmark gets a 1.5% improvement. The llvm testsuite has no perf change.

Hi Mikhail,

Could you check how this patch impacts our performance?

Thanks,
-Quentin

> Hi Mikhail,
> Could you check how this patch impacts our performance?

Thanks. This is the patch "hoistspill + split-mode-speed [Part1] + redo rematerialize [Part2] + remove analyzeSiblingValues [Part3]" merged together, which I used for the performance testing.
I divided the patches for easier review; merging the separate patches together requires resolving some conflicts.

Ping.

Hi Wei,

Could you rebase your patches? They do not apply cleanly for me.

Also, while I am here, a couple of inlined comments :).

Cheers,
-Quentin

include/llvm/CodeGen/LiveRangeEdit.h
148	It seems strange that the API allows dropping some of the new registers. At the very least, we should document (i.e., put explanatory comments) why this is useful and why it is okay to drop such references. In general, it should not.
184	Instead of having an additional field which may be the same as ParentVNI in a lot of cases, this one could be computed. Then, if this has some performance problem, we can think of a better caching mechanism.
235–239	Replace 'it' by this live interval or something. The context is now high in the source file and repeating it wouldn't hurt IMO.
lib/CodeGen/InlineSpiller.cpp
56	Since this class does not inherit from Spiller, what about naming it HoistSpillHelper or something.

I am rebasing the patch now.

Wei.

Rebase the merged patch to r262808. Patch verified using llvm testsuite on x86-64 and qemu-aarch64.

Hi Wei,

The benchmarks are still running, but so far, the numbers look good.

Anyhow, I finally made time for the review.

Generally speaking this looks almost good to me. The quadratic behavior of the first loop in runHoistAllSpills scares me and we need to look for a better alternative.
Moreover we need better comments for the APIs.
Finally, the test changes with more moves are worrisome. Could you explain why this happens and how we will fix that?
It seems to me we are choosing an insertion point for the store that happens too late.

I have highlighted all my concerns with the inline comments.

Cheers,
-Quentin

include/llvm/CodeGen/LiveRangeEdit.h
123	Put that as at the end of the list with nullptr as default parameter.
221	Maybe just say that DeadRemats is an optional field. Mentioning Greedy here does not bring any value IMO.
lib/CodeGen/InlineSpiller.cpp
74	mergeable
79	Do not repeat the name of the field in the comment.
81	[…] as the source (instead of RHS) of the new ..
82	How big are the sets? I would expect very few siblings on average and was wondering if a SmallSetVector or SmallSet would be more appropriate.
85	Please use reference for values that cannot be nullptr. I.e., OrigVNI and BB.
90	SpillsToKeep
213	Hide the instantiation of the hoist spiller helper in the inline spiller. The positive side effect is that we won’t leak the memory!
223	Hide the call to the hoist spiller helper in the inline spiller.
998	.i.e => i.e.
998	Use doxygen style comment.
1010	DenseSet does not guarantee that the iteration order is stable from one run to another, does it? Although we should not have several siblings live at the same time, this is theoretically possible. In other words, we should use a container that has a deterministic iteration order for reproducibility. See the earlier suggestion I made.
1024	This method would benefit from a more verbose comment. Maybe something along the lines of: Starting from \p Root find a top-down traversal order of the dominator tree to visit all basic blocks containing the elements of \p Spills. Redundant spills will be found and put into \p SpillsToRm at the same time. \p SpillBBToSpill will be populated as part of the process and maps a basic block to the first store occurring in the basic block. \post SpillToRm.union(Spills@post) == Spills@pre. What is the usage of SpillsToKept? In particular the unsigned part? Should we consider moving some of the arguments to fields of the current hoistspill instance?
1024	This method does a bunch of things! Although I understand we want to share the logic that does the traversal and such, I found that it makes the code harder to read. I’d say as it is now and with more comments like I suggested, this is fine, but in general I would rather have a better separation of concerns and then try to optimize if it turns out to be problematic. I am guessing you already went through that process, we are just lacking the history :).
1043	Any chance this could be updated when we insert the spill? Like I said, it just feels like getVisitOrders does too many things.
1051	Please document what WorkSet is supposed to contain.
1053	I think we should describe what is the expected root, because it seems strange to me that we don’t just take the node for the Root.
1065	More comments, e.g., Node dominates Block and already stores the value. This store is redundant.
1078	Ok, found the meaning of the unsigned elsewhere… A comment here as well would be great.
1098	Assert Orders.size == WorkSet.size?
1120	We do not insert the original store, it is already there, right?
1135	I believe we usually use bottom-up instead of bottom-top.
1137	have
1174	If the subtrees get big, we will end up recomputing this cost a bunch of times. Could it be something we keep alongside the subtree?
1181	We could add a mode for the hoist spiller, where code size is the priority. I.e., always hoist when SpillsInSubTree.size > 1 A follow-up patch is fine.
1208	typo.
1210	Variable must start with a capital letter. Also why use ent for the name?
1218	Explain the general algorithm here.
1234	This loop scares me. Any chance this information can be built as we insert spills?
1242	empty
lib/CodeGen/RegAllocGreedy.cpp
401	I would have put this into the base class.
2571	spiller().postOptimization()
2597	Should be created within the inline spiller. See my comment on createInlineSpiller.
lib/CodeGen/RegAllocPBQP.cpp
130	This should be a separate patch.
156	Ditto.
731	We shouldn’t have to touch this.
lib/CodeGen/RegisterCoalescer.cpp
463	We shouldn’t have to touch this.
lib/CodeGen/Spiller.h
38	Add a bool here that defaults to false for using a post optimization.
46	Get rid of those.
lib/CodeGen/SplitKit.cpp
727	Please don’t repeat the comment from the declaration.
742	i.e.
749	We should start iterating with the next iterator instead of starting over. The next call to count should early continue the loop but still!
1134	Add a comment saying that hoistCopies will behave differently between size and speed, otherwise it feels like those modes are the same.
lib/CodeGen/SplitKit.h
333	Don’t repeat the name of the method.
335	Given how this is used, the actual name of this method should be computeRedundentBackCopies.
338	Ditto.
test/CodeGen/X86/vselect-minmax.ll
4898	Why is this happening?
7620	Ditto.

This revision now requires changes to proceed. Mar 15 2016, 5:37 PM

Quentin, I addressed most of your comments.

Major changes, or changes that need attention:

Added a patch that hoists a spill inside a BB to an earlier place when the source of the spill is killed. It is done in InlineSpiller::hoistSpillInsideBB. With this change, some of the test changes are gone.

I am not sure I made exactly the change you expected about where to put postOptimization.

I found there was a comment in my previous patch saying DeadRemat is non-null only when regalloc is Greedy. It was wrong. All register allocators share the same InlineSpiller logic, so DeadRemat and the original eliminateDeadRemats (now put into RegAllocBase::postOptimization and RegAllocPBQP::postOptimization) are also necessary for PBQP and Basic.

I also noticed hoistCopies can be improved further. I plan to address it in a following patch.
With HoistSpillHelper, we still need hoistCopies when split-spill-mode=Speed because after removing some redundant spills, the sources of those spills may be shrunk. But when I addressed the review comments, I also found cases where removing redundant spills not only fails to shrink the source of the redundant spills, but also lengthens the live range of the dst of those spills. Since we don't depend on hoistCopies to remove redundant spills (HoistSpillHelper can do that work better), we can change hoistCopies to remove spills only when doing so shrinks the source of the redundant spills, or at least does not lengthen the live range of the dst of the spills. I can possibly do that by removing hoistCopies and extending hoistSpillInsideBB to handle cases across BBs, i.e., to hoist a spill only when its source is killed.

Herald added a reviewer: • tstellarAMD. · Mar 21 2016, 9:45 AM

wmi added inline comments. Mar 21 2016, 9:46 AM

lib/CodeGen/SplitKit.cpp
742	Fixed.

wmi added inline comments. Mar 21 2016, 9:46 AM

include/llvm/CodeGen/LiveRangeEdit.h
123	Fixed.
148	I added comments to explain it. In short, we don't want to allocate a physical register for the dummy register used as the temporary dst register of instructions in the DeadRemats set.
184	You are right. I don't have to save OrigVNI in struct Remat. Instead, add a parameter VNInfo *OrigVNI for LiveRangeEdit::canRematerializeAt.
221	Fixed.
235–239	Fixed.
lib/CodeGen/InlineSpiller.cpp
79	Fixed
81	Fixed
82	In most cases its size is less than 16, I guess, so I use SmallSetVector instead. I cannot use SmallSet because the set needs to be iterated.
85	Fixed.
90	Fixed.
213	Make HoistSpillHelper a field in inline spiller.
223	I added postOptimization as a pure virtual func in class Spiller, and put the code of the hoist spiller helper inside InlineSpiller::postOptimization.
998	Fixed.
998	Fixed.
1010	Yes, when split-mode-size is turned on, after hoistCopies it is possible that several siblings are live at the same time. It is not just theoretically possible but realistic. I use SmallSetVector instead here.
1024	Thanks for your example comment. It is good; I copied most of it into the code. Regarding "Should we consider moving some of the arguments to fields of the current hoistspill instance?": HoistSpillHelper now has the same lifetime as the InlineSpiller instance, so the lifetimes of those arguments are much shorter than that. That is why I chose to keep them as function-local objects.
1024	After separating part of the work into HoistSpillHelper::rmRedundentSpills and adding more comments in the function body, the code may be easier to read now.
1043	I separated the first part of the work into another function, HoistSpillHelper::rmRedundentSpills, so HoistSpiller::getVisitOrders is more focused on what its name describes.
1051	Done.
1053	Done.
1065	Done.
1078	Done.
1098	Done.
1120	Yes, that is right.
1135	Fixed.
1137	Fixed.
1174	I keep it along side the subtree in SpillsInSubTreeMap.
1181	Ok, I will do it in a follow-up patch.
1208	Fixed.
1210	Fixed.
1218	Done.
1234	I simply removed the inner loop. The SlotToOrigReg map will become somewhat bigger, but not by a lot.
1242	Fixed.
lib/CodeGen/RegAllocGreedy.cpp
401	Done.
2571	Done.
2597	Done.
lib/CodeGen/RegAllocPBQP.cpp
130	InlineSpiller is shared by all the register allocators, so the DeadRemats logic is also needed by PBQP. If I separate that part out, I need to fix the related unit test.
731	My original comment that DeadRemats is non-empty only when the regalloc is Greedy is wrong. Actually, InlineSpiller and the related Remat logic are shared by all register allocators. And RegAllocPBQP is not a subclass of RegAllocBase, so the code is needed.
lib/CodeGen/RegisterCoalescer.cpp
463	Fixed.
lib/CodeGen/Spiller.h
38	I don't get the intention of adding a bool here. Is it used to guard the post optimization? Why is it needed?
46	Done.
lib/CodeGen/SplitKit.cpp
727	Comment removed.
749	Fixed.
1134	Added.
lib/CodeGen/SplitKit.h
333	Fixed.
335	Fixed.
338	Fixed.
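
The container question discussed for line 1010 (hash-set iteration order versus a deterministic order) can be illustrated with a standard-library stand-in. This sketch is not llvm::SmallSetVector; InsertionOrderedSet is a hypothetical minimal class that only shows the idea of pairing a set for membership with a vector for iteration order.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Minimal set that iterates in insertion order, so results do not depend on
// hashing or pointer values.
template <typename T> class InsertionOrderedSet {
  std::vector<T> Order;        // defines the iteration order
  std::unordered_set<T> Seen;  // gives fast membership tests

public:
  // Returns true if V was newly inserted.
  bool insert(const T &V) {
    if (!Seen.insert(V).second)
      return false;
    Order.push_back(V);
    return true;
  }
  bool contains(const T &V) const { return Seen.count(V) != 0; }
  std::size_t size() const { return Order.size(); }
  typename std::vector<T>::const_iterator begin() const { return Order.begin(); }
  typename std::vector<T>::const_iterator end() const { return Order.end(); }
};
```

Iterating such a set always visits the elements in the order they were first inserted, which is the property needed for reproducible spill placement from one run to the next.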

• tstellarAMD added inline comments. Mar 21 2016, 10:28 AM

test/CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot.ll
24 ↗	(On Diff #51175)	This change is confusing to me, because if we only use 254 VGPRs then there shouldn't be any spills, but there are still spill instructions being emitted. It seems like this is probably a bug, but I will need to look at it more closely to see if it is an AMDGPU bug or a generic regalloc bug.

I noticed that even without my change, although the compiler outputs "GCN: NumVgprs is 256", the trace of -debug-only=regalloc shows that some VGPRs are unused.

Here is what I did:
~/workarea/llvm-r262808/dbuild/./bin/llc -march=amdgcn -mcpu=tahiti \
  -mattr=+vgpr-spilling -verify-machineinstrs \
  < ~/workarea/llvm-r262808/src/test/CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot.ll \
  -debug-only=regalloc >/dev/null 2>out1

Delete the part of the trace in out1 before the "REGISTER MAP" section, then execute the command below:

for ((i=0; i<256; i++)); do
  # Print every VGPR index that never appears in the trace.
  if ! grep -q "VGPR${i}[^0-9]" out1; then
    echo "VGPR$i"
  fi
done

The output is:
VGPR40
VGPR189
VGPR190

So even if the compiler says GCN: NumVgprs is 256, there are three
VGPRs never used.
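
The same scan can also be written without the shell. The following is only a minimal C++ sketch, assuming the trace has already been read into a vector of lines; mentionsVgpr and unusedVgprs are hypothetical helper names and not part of LLVM. Unlike the grep pattern above, it also counts a register name that sits at the very end of a line.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: true if Line mentions "VGPR<Index>" and the match is
// not followed by another digit, so VGPR4 is not found inside VGPR40.
bool mentionsVgpr(const std::string &Line, unsigned Index) {
  const std::string Name = "VGPR" + std::to_string(Index);
  for (std::size_t Pos = Line.find(Name); Pos != std::string::npos;
       Pos = Line.find(Name, Pos + 1)) {
    const std::size_t End = Pos + Name.size();
    if (End == Line.size() ||
        !std::isdigit(static_cast<unsigned char>(Line[End])))
      return true;
  }
  return false;
}

// Hypothetical helper: indices in [0, NumRegs) that no trace line mentions.
std::vector<unsigned> unusedVgprs(const std::vector<std::string> &Trace,
                                  unsigned NumRegs) {
  std::vector<unsigned> Unused;
  for (unsigned I = 0; I < NumRegs; ++I) {
    bool Seen = false;
    for (const std::string &Line : Trace) {
      if (mentionsVgpr(Line, I)) {
        Seen = true;
        break;
      }
    }
    if (!Seen)
      Unused.push_back(I);
  }
  return Unused;
}
```

Calling unusedVgprs(Trace, 256) on the "REGISTER MAP" part of the trace corresponds to the loop above.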

Thanks,
Wei.

Hi Wei,

I believe we are almost done. Thanks for your work and patience on this.

There are mainly three items to address:

Typos are widespread in the file: mergable -> mergeable, redundent -> redundant.
Do not repeat method names in comments.
Fix the test cases. See the inline comments.

As for the benchmarking, almost all the diffs came back as improvements of up to 7%! This is impressive.
The regressions seem to be side effects, i.e., we generate fewer load pairs in a few cases because the related spill slots are no longer next to each other. That was luck previously.

Anyhow, looking forward to the final fix-ups.

Cheers,
-Quentin

include/llvm/CodeGen/LiveRangeEdit.h
78	Switch to SmallPtrSetImpl; the size of the type is not relevant here.
150	Don’t repeat the method name in the comment.
lib/CodeGen/InlineSpiller.cpp
76	Mergeable. Do a search, the typo is widely spread :).
332–333	Don’t repeat the method name.
974	Don’t repeat the name of the method.
985	More mergeable typos...
lib/CodeGen/LiveRangeEdit.cpp
383	Some update problem I believe.
lib/CodeGen/RegAllocBase.cpp
159	Capital letter for the first letter of the variable name.
lib/CodeGen/RegAllocBase.h
88	Add virtual keyword. Subclasses may want to do additional things.
lib/CodeGen/RegAllocPBQP.cpp
728	Variables start with a capital letter.
lib/CodeGen/Spiller.h
31–32	Other out-of-tree spillers may exist, and there is little interest in forcing them to implement a post-optimization method if they do not need it. In other words, instead of a pure virtual method, provide a default implementation that does nothing.
38–46	I was thinking in case we want to test without the post-optimization. But I am fine if it is always enabled.
lib/CodeGen/SplitKit.h
335	Typo: redundant
test/CodeGen/X86/hoist-spill.ll
2	Make this a FileCheck test.
116	Get rid of the attributes if they are not actually needed.
test/CodeGen/X86/new-remat.ll
13	Use opt -instnamer to get rid of the %[0-9]+ variables.

In D15302#379497, @wmi wrote:
I noticed that even without my change, although compiler output "GCN:
NumVgprs is 256", when I looked at the trace of -debug-only=regalloc,
I found there were some VGPR unused.

Here is what I did:
~/workarea/llvm-r262808/dbuild/./bin/llc -march=amdgcn -mcpu=tahiti
-mattr=+vgpr-spilling -verify-machineinstrs <
~/workarea/llvm-r262808/src/test/CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot.ll
-debug-only=regalloc >/dev/null 2>out1

Delete the trace from out1 before the section of "REGISTER MAP", then
execute the command below:
for ((i=0; i<256; i++)); do
grep "VGPR$i[^0-9]" out1 &>/dev/null
if [[ "$?" != "0" ]]; then
  echo VGPR$i
fi
done

The output is:
VGPR40
VGPR189
VGPR190

So even if the compiler says GCN: NumVgprs is 256, there are three
VGPRs never used.

NumVgprs is the number of VGPRs that need to be allocated for the program, so the fact that there are gaps doesn't matter (though this is strange). If you use only register v255, you still need to allocate all 256 registers.
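
To make the rule concrete, here is a minimal C++ sketch under the assumption that the count is simply the highest used register index plus one; numVgprsToAllocate and kMaxVgprs are hypothetical names, and the 256 limit is the register-file size quoted in this thread, not a value read from the backend.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Register-file size quoted in this thread (assumption for the sketch).
constexpr unsigned kMaxVgprs = 256;

// Hypothetical helper: the number of VGPRs to allocate is the highest used
// index plus one. Unused indices below the maximum do not lower the count.
unsigned numVgprsToAllocate(const std::vector<unsigned> &UsedIndices) {
  if (UsedIndices.empty())
    return 0;
  const unsigned Highest =
      *std::max_element(UsedIndices.begin(), UsedIndices.end());
  assert(Highest < kMaxVgprs && "index outside the assumed register file");
  return Highest + 1;
}
```

With this model, a program that touches only v255 still gets a count of 256, and leaving v40, v189 and v190 unused changes nothing as long as v255 is used.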

Fix all the comments Quentin suggested. Thanks for the careful review.

lib/CodeGen/InlineSpiller.cpp
76	Fixed.
332–333	Fixed.
974	All similar comments fixed.
985	All such typos Fixed.
lib/CodeGen/LiveRangeEdit.cpp
383	Fixed.
lib/CodeGen/RegAllocBase.cpp
159	Fixed.
lib/CodeGen/RegAllocBase.h
88	Fixed.
lib/CodeGen/RegAllocPBQP.cpp
728	Fixed.
lib/CodeGen/Spiller.h
31–32	Makes sense. Fixed.
38–46	Ok, I leave it there for now.
lib/CodeGen/SplitKit.h
335	Fixed here and many other places.
test/CodeGen/X86/hoist-spill.ll
2	I felt the FileCheck test was not as general as the test above, but FileCheck still works, so I switched to FileCheck here.
116	Fixed.
test/CodeGen/X86/new-remat.ll
13	Fixed.

Hi Wei,

I think we will need to wait for Tom to double check what happened for AMDGPU.

One question though, this revision ended up being the combination of the 3 parts, right?

Cheers,
-Quentin

test/CodeGen/X86/hoist-spill.ll
3	You could check where the spills actually are. But it already looks pretty good now :).

So even if the compiler says GCN: NumVgprs is 256, there are three
VGPRs never used.

NumVgprs is the number of VGPRs that need to be allocated for the program, so the fact that there are gaps doesn't matter (though this is strange). If you use only register v255, you still need to allocate all 256 registers.

Hi Tom,

I found that with my patch, the spill count for the testcase increases
from 68 to 152, and the reload count increases from 72 to 188. I haven't
thoroughly understood what is wrong here, but I can roughly describe
how the problem happens, and I suspect it is a problem with local
splitting rather than with my patch.

In the testcase, there are roughly 64 VReg_128 vars overlapping with
each other consuming all the 256 VGPRs and some other scattered VGPR
uses. Each VReg_128 var occupies 4 consecutive VGPRs, so VGPR
registers are allocated in this way: vreg1: VGPR0_VGPR1_VGPR2_VGPR3;
vreg2: VGPR4_VGPR5_VGPR6_VGPR7; ......

Because we have some other scattered VGPR uses, we cannot allocate all
the 64 VReg_128 vars in registers, so splitting is needed. Region
splitting does not cause trouble because it only tries to fill holes,
i.e., vregs after the splitting usually will not evict other vregs.
Local splitting can make a mess of the allocation here.
Suppose it tries to find a local gap inside a BB to split vreg3
(VReg_128 type). After the local split is done, vreg3 will be split
into vreg3-1 and vreg3-2. vreg3-1 and vreg3-2 have short live ranges,
so both of them have relatively large weights. vreg3-1 may find a hole
and be allocated to VGPR2_VGPR3_VGPR4_VGPR5; then vreg3-2 will get a
hint of VGPR2_VGPR3_VGPR4_VGPR5 and will evict vreg1
(VGPR0_VGPR1_VGPR2_VGPR3) and vreg2 (VGPR4_VGPR5_VGPR6_VGPR7) above.
To find consecutive VGPRs for vreg1 and vreg2, regalloc will do more
region splitting/local splitting and more evictions, making it harder
and harder for vregs to find consecutive VGPRs.
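
The eviction step in this scenario can be illustrated with a small C++ sketch. It is a simplification that only models overlap between physical register tuples and ignores live ranges, weights and hints; TupleAssignment, overlaps and evictionsFor are hypothetical names, not allocator APIs.

```cpp
#include <cassert>
#include <vector>

// A virtual register assigned to Width consecutive VGPRs starting at First.
struct TupleAssignment {
  unsigned VReg;
  unsigned First;
  unsigned Width;
};

// Two half-open ranges [First, First + Width) overlap when each one starts
// before the other one ends.
bool overlaps(const TupleAssignment &A, const TupleAssignment &B) {
  assert(A.Width > 0 && B.Width > 0 && "empty tuples are not modelled");
  return A.First < B.First + B.Width && B.First < A.First + A.Width;
}

// Hypothetical helper: the vregs whose tuples share at least one VGPR with
// Candidate and would therefore have to be evicted to place it.
std::vector<unsigned>
evictionsFor(const TupleAssignment &Candidate,
             const std::vector<TupleAssignment> &Assigned) {
  std::vector<unsigned> Evicted;
  for (const TupleAssignment &A : Assigned)
    if (overlaps(Candidate, A))
      Evicted.push_back(A.VReg);
  return Evicted;
}
```

Placing a 4-wide candidate at VGPR2 against tuples starting at VGPR0 and VGPR4 reports both of them, which matches the vreg1 and vreg2 evictions described above: one misaligned tuple displaces two aligned ones.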

With my patch, one more VReg_128 interval is added during splitting
because of hoisting (this is a separate problem that I described in a TODO
about improving hoistCopies in a previous reply). Allocating that
VReg_128 var triggers more region splitting and local splitting,
and causes more vars to be spilled.

To show the problem, I experimentally turned off local splitting for
trunk without my patch: the spill count for the testcase drops from 68
to 56, and the reload count drops from 72 to 36. With local
splitting turned off for trunk with my patch, the spill count
drops from 152 to 24, and the reload count drops from 188 to 24.

So this is probably a separate issue for architectures that use
consecutive combined registers for large data types.

Thanks,
Wei.

Hi Tom,

Do you think the issue is a blocker for this patch or a separate one?
I'd like your confirmation so I can decide how to move the work
forward.

As for using 254 VGPRs instead of 256 VGPRs, I think it just cannot
find 4 consecutive VGPRs for VReg_128 data. The holes at the end (v254,
v255) are no different from holes in the middle. Is that correct?
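
The point about holes can be sketched as a check for a run of consecutive free registers. This is only an illustration; hasConsecutiveRun is a hypothetical helper, and it assumes the free list is sorted in ascending order without duplicates.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: true if Free (sorted ascending, no duplicates)
// contains at least Width consecutive register indices.
bool hasConsecutiveRun(const std::vector<unsigned> &Free, unsigned Width) {
  assert(Width > 0 && "a tuple needs at least one register");
  unsigned Run = 0;
  for (std::size_t I = 0; I < Free.size(); ++I) {
    // Extend the current run if this index follows the previous one,
    // otherwise start a new run of length one.
    if (I > 0 && Free[I] == Free[I - 1] + 1)
      ++Run;
    else
      Run = 1;
    if (Run >= Width)
      return true;
  }
  return false;
}
```

For example, a free list of 40, 189 and 190 (the holes listed earlier in this thread) has a longest run of two, so a 4-wide VReg_128 value does not fit no matter where in the file the holes are.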

Thanks,
Wei.

I think your analysis is correct about why it doesn't use all 256 register. I actually hit this same thing in another patch I'm working on. I have to objections to this patch being pushed.

It turns out this was ready just in time: we just noticed that r263460 essentially undermines all of the work to avoid PR17409, and we now have widespread superlinear compile times with sanitizers (and possibly other code).

Just wanted to confirm with you Tom that this LGTM, and encourage Wei to go ahead and land it as soon as Tom acks. =D We have a *bunch* of stuff blocked on the compile time issues here.

One minor nit below.

lib/CodeGen/InlineSpiller.cpp
57	I get a warning saying this is unused when building with this patched in...

Thanks for the support of this patch. Looks like Tom's "to objection"
is a typo of "no objection". I will prepare to commit the patch.

Wei.

Fix my mistake introduced when I was addressing the review comments:

I accidentally removed the virtual keyword from postOptimization in lib/CodeGen/Spiller.h. It should not be a pure virtual function, but it should still be virtual.

This will fix the warning Chandler saw.
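
The difference between "pure virtual" and "virtual with a default no-op" can be shown with a minimal, self-contained C++ sketch. The classes below are simplified stand-ins with hypothetical names and members; they are not the declarations in Spiller.h.

```cpp
#include <cassert>

// Simplified stand-in for a spiller interface (not the LLVM declaration).
class SpillerSketch {
public:
  virtual ~SpillerSketch() = default;
  // Pure virtual: every spiller has to implement it.
  virtual void spill() = 0;
  // Virtual with an empty default body: spillers that have nothing to do
  // after allocation can ignore it, others can override it.
  virtual void postOptimization() {}
};

// A spiller that does not need the hook compiles without mentioning it.
class PlainSpiller : public SpillerSketch {
public:
  void spill() override { ++NumSpills; }
  unsigned NumSpills = 0;
};

// A spiller that defers work to the hook overrides it.
class HoistingSpiller : public SpillerSketch {
public:
  void spill() override { ++Pending; }
  void postOptimization() override {
    Hoisted = Pending;
    Pending = 0;
  }
  unsigned Pending = 0;
  unsigned Hoisted = 0;
};

// The caller only sees the base class, so the hook must be virtual for the
// override to be reached.
void runAllocator(SpillerSketch &S) {
  S.spill();
  S.spill();
  S.postOptimization();
}
```

If postOptimization were not virtual, the call inside runAllocator would always run the empty base version and the override in HoistingSpiller would never be reached.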

Closed by commit rL265309: Replace analyzeSiblingValues with new algorithm to fix its compile (authored by wmi). Apr 4 2016, 9:48 AM

This revision was automatically updated to reflect the committed changes.

wmi retitled this revision from [Greedy regalloc] Replace analyzeSiblingValues with something new [Part1] to [Greedy regalloc] Replace analyzeSiblingValues with something new. Apr 4 2016, 9:49 AM

chandlerc mentioned this in rL265331: Revert r263460: [SpillPlacement] Fix a quadratic behavior in spill placement. Apr 4 2016, 12:03 PM

Hi Quentin,

Recently, I committed some bug fixes for this patch without getting approval first because I think they are relatively trivial. Please give them a post-commit review: http://reviews.llvm.org/D18934

There are two other fixes that are more substantial, which I think need to be reviewed before commit.
http://reviews.llvm.org/D18935
http://reviews.llvm.org/D18936

Thanks,
Wei.

Revision Contents

Path	Size
include/
  llvm/
    CodeGen/
      LiveRangeEdit.h	33 lines
lib/
  CodeGen/
    	956 lines
    	58 lines
    	9 lines
    	8 lines
    	5 lines
    	38 lines
    	20 lines
    RegisterCoalescer.cpp	4 lines
    Spiller.h	9 lines
    SplitKit.h	11 lines
    SplitKit.cpp	92 lines
test/
  CodeGen/
    AArch64/
      aarch64-deferred-spilling.ll	514 lines
    ARM/
      subreg-remat.ll	6 lines
    SPARC/
      spill.ll	13 lines
    X86/
      avx512-bugfix-25270.ll	2 lines
      fold-push.ll	10 lines
      hoist-spill.ll	115 lines
      new-remat.ll	75 lines
      ragreedy-hoist-spill.ll	9 lines
      vselect-minmax.ll	28 lines

Diff 50643

include/llvm/CodeGen/LiveRangeEdit.h

(54 lines not shown)	public:
virtual void LRE_DidCloneVirtReg(unsigned New, unsigned Old) {}		virtual void LRE_DidCloneVirtReg(unsigned New, unsigned Old) {}

virtual ~Delegate() {}		virtual ~Delegate() {}
};		};

private:		private:
LiveInterval *Parent;		LiveInterval *Parent;
SmallVectorImpl<unsigned> &NewRegs;		SmallVectorImpl<unsigned> &NewRegs;
		SmallPtrSet<MachineInstr *, 32> *DeadRemats;
MachineRegisterInfo &MRI;		MachineRegisterInfo &MRI;
LiveIntervals &LIS;		LiveIntervals &LIS;
VirtRegMap *VRM;		VirtRegMap *VRM;
const TargetInstrInfo &TII;		const TargetInstrInfo &TII;
Delegate *const TheDelegate;		Delegate *const TheDelegate;

/// FirstNew - Index of the first register added to NewRegs.		/// FirstNew - Index of the first register added to NewRegs.
const unsigned FirstNew;		const unsigned FirstNew;

/// ScannedRemattable - true when remattable values have been identified.		/// ScannedRemattable - true when remattable values have been identified.
bool ScannedRemattable;		bool ScannedRemattable;

/// Remattable - Values defined by remattable instructions as identified by		/// Remattable - Values defined by remattable instructions as identified by
/// tii.isTriviallyReMaterializable().		/// tii.isTriviallyReMaterializable().
SmallPtrSet<const VNInfo*,4> Remattable;		SmallPtrSet<const VNInfo*,4> Remattable;
		qcolombet: Switch to SmallPtrSetImpl; the size of the type is not relevant here.

/// Rematted - Values that were actually rematted, and so need to have their		/// Rematted - Values that were actually rematted, and so need to have their
/// live range trimmed or entirely removed.		/// live range trimmed or entirely removed.
SmallPtrSet<const VNInfo*,4> Rematted;		SmallPtrSet<const VNInfo*,4> Rematted;

/// scanRemattable - Identify the Parent values that may rematerialize.		/// scanRemattable - Identify the Parent values that may rematerialize.
void scanRemattable(AliasAnalysis *aa);		void scanRemattable(AliasAnalysis *aa);

(20 lines not shown)	private:
/// main live range of \p LI or in one of the matching subregister ranges.		/// main live range of \p LI or in one of the matching subregister ranges.
bool useIsKill(const LiveInterval &LI, const MachineOperand &MO) const;		bool useIsKill(const LiveInterval &LI, const MachineOperand &MO) const;

public:		public:
/// Create a LiveRangeEdit for breaking down parent into smaller pieces.		/// Create a LiveRangeEdit for breaking down parent into smaller pieces.
/// @param parent The register being spilled or split.		/// @param parent The register being spilled or split.
/// @param newRegs List to receive any new registers created. This needn't be		/// @param newRegs List to receive any new registers created. This needn't be
/// empty initially, any existing registers are ignored.		/// empty initially, any existing registers are ignored.
		/// @param deadRemats The collection of all the instructions defining an
		/// original reg and are dead after remat.
/// @param MF The MachineFunction the live range edit is taking place in.		/// @param MF The MachineFunction the live range edit is taking place in.
/// @param lis The collection of all live intervals in this function.		/// @param lis The collection of all live intervals in this function.
/// @param vrm Map of virtual registers to physical registers for this		/// @param vrm Map of virtual registers to physical registers for this
/// function. If NULL, no virtual register map updates will		/// function. If NULL, no virtual register map updates will
/// be done. This could be the case if called before Regalloc.		/// be done. This could be the case if called before Regalloc.
LiveRangeEdit(LiveInterval *parent, SmallVectorImpl<unsigned> &newRegs,		LiveRangeEdit(LiveInterval *parent, SmallVectorImpl<unsigned> &newRegs,
		SmallPtrSet<MachineInstr *, 32> *deadRemats,
		qcolombet: Put that at the end of the list with nullptr as default parameter.
		wmi: Fixed.
MachineFunction &MF, LiveIntervals &lis, VirtRegMap *vrm,		MachineFunction &MF, LiveIntervals &lis, VirtRegMap *vrm,
Delegate *delegate = nullptr)		Delegate *delegate = nullptr)
: Parent(parent), NewRegs(newRegs), MRI(MF.getRegInfo()), LIS(lis),		: Parent(parent), NewRegs(newRegs), DeadRemats(deadRemats),
VRM(vrm), TII(*MF.getSubtarget().getInstrInfo()),		MRI(MF.getRegInfo()), LIS(lis), VRM(vrm),
TheDelegate(delegate), FirstNew(newRegs.size()),		TII(*MF.getSubtarget().getInstrInfo()), TheDelegate(delegate),
ScannedRemattable(false) {		FirstNew(newRegs.size()), ScannedRemattable(false) {
MRI.setDelegate(this);		MRI.setDelegate(this);
}		}

~LiveRangeEdit() override { MRI.resetDelegate(this); }		~LiveRangeEdit() override { MRI.resetDelegate(this); }

LiveInterval &getParent() const {		LiveInterval &getParent() const {
assert(Parent && "No parent LiveInterval");		assert(Parent && "No parent LiveInterval");
return *Parent;		return *Parent;
}		}
unsigned getReg() const { return getParent().reg; }		unsigned getReg() const { return getParent().reg; }

/// Iterator for accessing the new registers added by this edit.		/// Iterator for accessing the new registers added by this edit.
typedef SmallVectorImpl<unsigned>::const_iterator iterator;		typedef SmallVectorImpl<unsigned>::const_iterator iterator;
iterator begin() const { return NewRegs.begin()+FirstNew; }		iterator begin() const { return NewRegs.begin()+FirstNew; }
iterator end() const { return NewRegs.end(); }		iterator end() const { return NewRegs.end(); }
unsigned size() const { return NewRegs.size()-FirstNew; }		unsigned size() const { return NewRegs.size()-FirstNew; }
bool empty() const { return size() == 0; }		bool empty() const { return size() == 0; }
unsigned get(unsigned idx) const { return NewRegs[idx+FirstNew]; }		unsigned get(unsigned idx) const { return NewRegs[idx+FirstNew]; }
		void pop_back() { NewRegs.pop_back(); }
		qcolombet: It seems strange that the API allows dropping some of the new registers. At the very least, we should document (i.e., put explanatory comments) why this is useful and why it is okay to drop such references. In general, it should not.
		wmi: I added comments to explain it. In short, we don't want to allocate a physical register for the dummy register used as the temporary dst register of an instruction in the DeadRemats set.

ArrayRef<unsigned> regs() const {		ArrayRef<unsigned> regs() const {
		qcolombet: Don’t repeat the method name in the comment.
return makeArrayRef(NewRegs).slice(FirstNew);		return makeArrayRef(NewRegs).slice(FirstNew);
}		}

/// createEmptyIntervalFrom - Create a new empty interval based on OldReg.		/// createEmptyIntervalFrom - Create a new empty interval based on OldReg.
LiveInterval &createEmptyIntervalFrom(unsigned OldReg);		LiveInterval &createEmptyIntervalFrom(unsigned OldReg);

/// createFrom - Create a new virtual register based on OldReg.		/// createFrom - Create a new virtual register based on OldReg.
unsigned createFrom(unsigned OldReg);		unsigned createFrom(unsigned OldReg);
(16 lines not shown)	public:
/// checkRematerializable - Manually add VNI to the list of rematerializable		/// checkRematerializable - Manually add VNI to the list of rematerializable
/// values if DefMI may be rematerializable.		/// values if DefMI may be rematerializable.
bool checkRematerializable(VNInfo *VNI, const MachineInstr *DefMI,		bool checkRematerializable(VNInfo *VNI, const MachineInstr *DefMI,
AliasAnalysis*);		AliasAnalysis*);

/// Remat - Information needed to rematerialize at a specific location.		/// Remat - Information needed to rematerialize at a specific location.
struct Remat {		struct Remat {
VNInfo *ParentVNI; // parent_'s value at the remat location.		VNInfo *ParentVNI; // parent_'s value at the remat location.
MachineInstr *OrigMI; // Instruction defining ParentVNI.		VNInfo *OrigVNI; // ParentVNI.def may be a copy only. OrigVNI.def
explicit Remat(VNInfo *ParentVNI) : ParentVNI(ParentVNI), OrigMI(nullptr) {}		// contains the real expr for remat.
		qcolombet: Instead of having an additional field which may be the same as ParentVNI in a lot of cases, this one could be computed. Then, if this has some performance problem, we can think of a better caching mechanism.
		wmi: You are right. I don't have to save OrigVNI in struct Remat. Instead, add a parameter VNInfo *OrigVNI for LiveRangeEdit::canRematerializeAt.
		MachineInstr *OrigMI; // Instruction defining OrigVNI.
		explicit Remat(VNInfo *ParentVNI, VNInfo *OrigVNI)
		: ParentVNI(ParentVNI), OrigVNI(OrigVNI), OrigMI(nullptr) {}
};		};

/// canRematerializeAt - Determine if ParentVNI can be rematerialized at		/// canRematerializeAt - Determine if ParentVNI can be rematerialized at
/// UseIdx. It is assumed that parent_.getVNINfoAt(UseIdx) == ParentVNI.		/// UseIdx. It is assumed that parent_.getVNINfoAt(UseIdx) == ParentVNI.
/// When cheapAsAMove is set, only cheap remats are allowed.		/// When cheapAsAMove is set, only cheap remats are allowed.
bool canRematerializeAt(Remat &RM,		bool canRematerializeAt(Remat &RM,
SlotIndex UseIdx,		SlotIndex UseIdx,
bool cheapAsAMove);		bool cheapAsAMove);
(15 lines not shown)	void markRematerialized(const VNInfo *ParentVNI) {
Rematted.insert(ParentVNI);		Rematted.insert(ParentVNI);
}		}

/// didRematerialize - Return true if ParentVNI was rematerialized anywhere.		/// didRematerialize - Return true if ParentVNI was rematerialized anywhere.
bool didRematerialize(const VNInfo *ParentVNI) const {		bool didRematerialize(const VNInfo *ParentVNI) const {
return Rematted.count(ParentVNI);		return Rematted.count(ParentVNI);
}		}

		void markDeadRemat(MachineInstr *inst) {
		// For regallocs other than Greedy, DeadRemats is nullptr for now.
		if (DeadRemats)
		qcolombet: Maybe just say that DeadRemats is an optional field. Mentioning Greedy here does not bring any value IMO.
		wmi: Fixed.
		DeadRemats->insert(inst);
		}

/// eraseVirtReg - Notify the delegate that Reg is no longer in use, and try		/// eraseVirtReg - Notify the delegate that Reg is no longer in use, and try
/// to erase it from LIS.		/// to erase it from LIS.
void eraseVirtReg(unsigned Reg);		void eraseVirtReg(unsigned Reg);

/// eliminateDeadDefs - Try to delete machine instructions that are now dead		/// eliminateDeadDefs - Try to delete machine instructions that are now dead
/// (allDefsAreDead returns true). This may cause live intervals to be trimmed		/// (allDefsAreDead returns true). This may cause live intervals to be trimmed
/// and further dead efs to be eliminated.		/// and further dead efs to be eliminated.
/// RegsBeingSpilled lists registers currently being spilled by the register		/// RegsBeingSpilled lists registers currently being spilled by the register
/// allocator. These registers should not be split into new intervals		/// allocator. These registers should not be split into new intervals
/// as currently those new intervals are not guaranteed to spill.		/// as currently those new intervals are not guaranteed to spill.
		/// NoSplit indicates it is used after the iterations of selectOrSplit and
		/// registers should not be split into new intervals.
void eliminateDeadDefs(SmallVectorImpl<MachineInstr*> &Dead,		void eliminateDeadDefs(SmallVectorImpl<MachineInstr *> &Dead,
ArrayRef<unsigned> RegsBeingSpilled = None);		ArrayRef<unsigned> RegsBeingSpilled = None,
		bool NoSplit = false);
		qcolombet: Replace 'it' by this live interval or something. The context is now high in the source file and repeating it wouldn't hurt IMO.
		wmi: Fixed.

/// calculateRegClassAndHint - Recompute register class and hint for each new		/// calculateRegClassAndHint - Recompute register class and hint for each new
/// register.		/// register.
void calculateRegClassAndHint(MachineFunction&,		void calculateRegClassAndHint(MachineFunction&,
const MachineLoopInfo&,		const MachineLoopInfo&,
const MachineBlockFrequencyInfo&);		const MachineBlockFrequencyInfo&);
};		};

}		}

#endif		#endif

lib/CodeGen/InlineSpiller.cpp

(42 lines not shown)
STATISTIC(NumSnippets, "Number of spilled snippets");		STATISTIC(NumSnippets, "Number of spilled snippets");
STATISTIC(NumSpills, "Number of spills inserted");		STATISTIC(NumSpills, "Number of spills inserted");
STATISTIC(NumSpillsRemoved, "Number of spills removed");		STATISTIC(NumSpillsRemoved, "Number of spills removed");
STATISTIC(NumReloads, "Number of reloads inserted");		STATISTIC(NumReloads, "Number of reloads inserted");
STATISTIC(NumReloadsRemoved, "Number of reloads removed");		STATISTIC(NumReloadsRemoved, "Number of reloads removed");
STATISTIC(NumFolded, "Number of folded stack accesses");		STATISTIC(NumFolded, "Number of folded stack accesses");
STATISTIC(NumFoldedLoads, "Number of folded loads");		STATISTIC(NumFoldedLoads, "Number of folded loads");
STATISTIC(NumRemats, "Number of rematerialized defs for spilling");		STATISTIC(NumRemats, "Number of rematerialized defs for spilling");
STATISTIC(NumOmitReloadSpill, "Number of omitted spills of reloads");
STATISTIC(NumHoists, "Number of hoisted spills");

static cl::opt<bool> DisableHoisting("disable-spill-hoist", cl::Hidden,		static cl::opt<bool> DisableHoisting("disable-spill-hoist", cl::Hidden,
cl::desc("Disable inline spill hoisting"));		cl::desc("Disable inline spill hoisting"));

namespace {		namespace {
		class HoistSpiller {
		qcolombet: Since this class does not inherit from Spiller, what about naming it HoistSpillHelper or something.
		MachineFunction &MF;
		chandlerc: I get a warning saying this is unused when building with this patched in...
		LiveIntervals &LIS;
		LiveStacks &LSS;
		AliasAnalysis *AA;
		MachineDominatorTree &MDT;
		MachineLoopInfo &Loops;
		VirtRegMap &VRM;
		MachineFrameInfo &MFI;
		MachineRegisterInfo &MRI;
		const TargetInstrInfo &TII;
		const TargetRegisterInfo &TRI;
		const MachineBlockFrequencyInfo &MBFI;

		// Map from StackSlot to its original register.
		DenseMap<int, unsigned> StackSlotToReg;
		// Map from pair of (StackSlot and Original VNI) to a set of spills which
		// have the same stackslot and have equal values defined by Original VNI.
		// These spills are mergable and are hoist candiates.
		qcolombet: mergeable
		typedef DenseMap<std::pair<int, VNInfo *>, SmallPtrSet<MachineInstr *, 16>>
		MergableSpillsMap;
		qcolombet: Mergeable. Do a search, the typo is widely spread :).
		wmi: Fixed.
		MergableSpillsMap MergableSpills;

		/// Virt2SibingsMap - This is the map from original register to a set
		qcolombet: Do not repeat the name of the field in the comment.
		wmi: Fixed
		/// containing all its siblings. To hoist a spill to another BB, we need
		/// to find out a live sibling there and use it as the RHS of the new spill.
		qcolombet: […] as the source (instead of RHS) of the new ..
		wmi: Fixed
		DenseMap<unsigned, DenseSet<unsigned>> Virt2SiblingsMap;
		qcolombet: How big are the sets? I would expect very few siblings on average and was wondering if a SmallSetVector or SmallSet would be more appropriate.
		wmi: In most cases its size is less than 16, I guess, so I use SmallSetVector instead. I cannot use SmallSet because the set needs to be iterated.

		bool isSpillCandBB(unsigned OrigReg, VNInfo *OrigVNI, MachineBasicBlock *BB,
		unsigned &LiveReg);
		qcolombet: Please use reference for values that cannot be nullptr. I.e., OrigVNI and BB.
		wmi: Fixed.
		void getVisitOrders(
		MachineBasicBlock *Root, SmallPtrSet<MachineInstr *, 16> &Spills,
		SmallVectorImpl<MachineDomTreeNode *> &Orders,
		SmallVectorImpl<MachineInstr *> &SpillsToRm,
		DenseMap<MachineDomTreeNode *, unsigned> &SpillsToKept,
		qcolombet: SpillsToKeep
		wmi: Fixed.
		DenseMap<MachineDomTreeNode *, MachineInstr *> &SpillBBToSpill);
		void runHoistSpills(unsigned OrigReg, VNInfo *OrigVNI,
		SmallPtrSet<MachineInstr *, 16> &Spills,
		SmallVectorImpl<MachineInstr *> &SpillsToRm,
		DenseMap<MachineBasicBlock *, unsigned> &SpillsToIns);

		public:
		HoistSpiller(MachineFunctionPass &pass, MachineFunction &mf, VirtRegMap &vrm)
		: MF(mf), LIS(pass.getAnalysis<LiveIntervals>()),
		LSS(pass.getAnalysis<LiveStacks>()),
		AA(&pass.getAnalysis<AAResultsWrapperPass>().getAAResults()),
		MDT(pass.getAnalysis<MachineDominatorTree>()),
		Loops(pass.getAnalysis<MachineLoopInfo>()), VRM(vrm),
		MFI(*mf.getFrameInfo()), MRI(mf.getRegInfo()),
		TII(*mf.getSubtarget().getInstrInfo()),
		TRI(*mf.getSubtarget().getRegisterInfo()),
		MBFI(pass.getAnalysis<MachineBlockFrequencyInfo>()) {}

		void addToMergableSpills(MachineInstr *Spill, int StackSlot,
		unsigned Original);
		bool rmFromMergableSpills(MachineInstr *Spill, int StackSlot);
		void hoistAllSpills(LiveRangeEdit &Edit);
		};

class InlineSpiller : public Spiller {		class InlineSpiller : public Spiller {
MachineFunction &MF;		MachineFunction &MF;
LiveIntervals &LIS;		LiveIntervals &LIS;
LiveStacks &LSS;		LiveStacks &LSS;
AliasAnalysis *AA;		AliasAnalysis *AA;
MachineDominatorTree &MDT;		MachineDominatorTree &MDT;
MachineLoopInfo &Loops;		MachineLoopInfo &Loops;
VirtRegMap &VRM;		VirtRegMap &VRM;
(14 lines not shown)	class InlineSpiller : public Spiller {

// All COPY instructions to/from snippets.		// All COPY instructions to/from snippets.
// They are ignored since both operands refer to the same stack slot.		// They are ignored since both operands refer to the same stack slot.
SmallPtrSet<MachineInstr*, 8> SnippetCopies;		SmallPtrSet<MachineInstr*, 8> SnippetCopies;

// Values that failed to remat at some point.		// Values that failed to remat at some point.
SmallPtrSet<VNInfo*, 8> UsedValues;		SmallPtrSet<VNInfo*, 8> UsedValues;

public:
// Information about a value that was defined by a copy from a sibling
// register.
struct SibValueInfo {
// True when all reaching defs were reloads: No spill is necessary.
bool AllDefsAreReloads;

// True when value is defined by an original PHI not from splitting.
bool DefByOrigPHI;

// True when the COPY defining this value killed its source.
bool KillsSource;

// The preferred register to spill.
unsigned SpillReg;

// The value of SpillReg that should be spilled.
VNInfo *SpillVNI;

// The block where SpillVNI should be spilled. Currently, this must be the
// block containing SpillVNI->def.
MachineBasicBlock *SpillMBB;

// A defining instruction that is not a sibling copy or a reload, or NULL.
// This can be used as a template for rematerialization.
MachineInstr *DefMI;

// List of values that depend on this one. These values are actually the
// same, but live range splitting has placed them in different registers,
// or SSA update needed to insert PHI-defs to preserve SSA form. This is
// copies of the current value and phi-kills. Usually only phi-kills cause
// more than one dependent value.
TinyPtrVector<VNInfo*> Deps;

SibValueInfo(unsigned Reg, VNInfo *VNI)
: AllDefsAreReloads(true), DefByOrigPHI(false), KillsSource(false),
SpillReg(Reg), SpillVNI(VNI), SpillMBB(nullptr), DefMI(nullptr) {}

// Returns true when a def has been found.
bool hasDef() const { return DefByOrigPHI || DefMI; }
};

private:
// Values in RegsToSpill defined by sibling copies.
typedef DenseMap<VNInfo*, SibValueInfo> SibValueMap;
SibValueMap SibValues;

// Dead defs generated during spilling.		// Dead defs generated during spilling.
SmallVector<MachineInstr*, 8> DeadDefs;		SmallVector<MachineInstr*, 8> DeadDefs;

		// Object records spills information and does the hoisting.
		HoistSpiller *HSpiller;

~InlineSpiller() override {}		~InlineSpiller() override {}

public:		public:
InlineSpiller(MachineFunctionPass &pass, MachineFunction &mf, VirtRegMap &vrm)		InlineSpiller(MachineFunctionPass &pass, MachineFunction &mf, VirtRegMap &vrm)
: MF(mf), LIS(pass.getAnalysis<LiveIntervals>()),		: MF(mf), LIS(pass.getAnalysis<LiveIntervals>()),
LSS(pass.getAnalysis<LiveStacks>()),		LSS(pass.getAnalysis<LiveStacks>()),
AA(&pass.getAnalysis<AAResultsWrapperPass>().getAAResults()),		AA(&pass.getAnalysis<AAResultsWrapperPass>().getAAResults()),
MDT(pass.getAnalysis<MachineDominatorTree>()),		MDT(pass.getAnalysis<MachineDominatorTree>()),
Loops(pass.getAnalysis<MachineLoopInfo>()), VRM(vrm),		Loops(pass.getAnalysis<MachineLoopInfo>()), VRM(vrm),
MFI(*mf.getFrameInfo()), MRI(mf.getRegInfo()),		MFI(*mf.getFrameInfo()), MRI(mf.getRegInfo()),
TII(*mf.getSubtarget().getInstrInfo()),		TII(*mf.getSubtarget().getInstrInfo()),
TRI(*mf.getSubtarget().getRegisterInfo()),		TRI(*mf.getSubtarget().getRegisterInfo()),
MBFI(pass.getAnalysis<MachineBlockFrequencyInfo>()) {}		MBFI(pass.getAnalysis<MachineBlockFrequencyInfo>()), HSpiller(nullptr) {
		}

void spill(LiveRangeEdit &) override;		void spill(LiveRangeEdit &) override;
		void setHSpiller(HoistSpiller *HS) { HSpiller = HS; }
		HoistSpiller *getHSpiller() { return HSpiller; }
		/// Methods for support type inquiry through isa, cast, and dyn_cast:
		static inline bool classof(const Spiller *V) { return true; }

private:		private:
		qcolombet: Can set private, right?
bool isSnippet(const LiveInterval &SnipLI);		bool isSnippet(const LiveInterval &SnipLI);
		qcolombet: Ditto.
void collectRegsToSpill();		void collectRegsToSpill();

		qcolombet: Ditto.
bool isRegToSpill(unsigned Reg) {		bool isRegToSpill(unsigned Reg) {
return std::find(RegsToSpill.begin(),		return std::find(RegsToSpill.begin(),
RegsToSpill.end(), Reg) != RegsToSpill.end();		RegsToSpill.end(), Reg) != RegsToSpill.end();
}		}
		qcolombet: Ditto.

bool isSibling(unsigned Reg);		bool isSibling(unsigned Reg);
MachineInstr *traceSiblingValue(unsigned, VNInfo*, VNInfo*);
void propagateSiblingValue(SibValueMap::iterator, VNInfo *VNI = nullptr);
void analyzeSiblingValues();

bool hoistSpill(LiveInterval &SpillLI, MachineInstr &CopyMI);
void eliminateRedundantSpills(LiveInterval &LI, VNInfo *VNI);		void eliminateRedundantSpills(LiveInterval &LI, VNInfo *VNI);

void markValueUsed(LiveInterval*, VNInfo*);		void markValueUsed(LiveInterval*, VNInfo*);
bool reMaterializeFor(LiveInterval &, MachineInstr &MI);		bool reMaterializeFor(LiveInterval &, MachineInstr &MI);
void reMaterializeAll();		void reMaterializeAll();

bool coalesceStackAccess(MachineInstr *MI, unsigned Reg);		bool coalesceStackAccess(MachineInstr *MI, unsigned Reg);
bool foldMemoryOperand(ArrayRef<std::pair<MachineInstr*, unsigned> >,		bool foldMemoryOperand(ArrayRef<std::pair<MachineInstr*, unsigned> >,
[12 lines not shown]
void Spiller::anchor() { }		void Spiller::anchor() { }

Spiller *createInlineSpiller(MachineFunctionPass &pass,		Spiller *createInlineSpiller(MachineFunctionPass &pass,
MachineFunction &mf,		MachineFunction &mf,
VirtRegMap &vrm) {		VirtRegMap &vrm) {
return new InlineSpiller(pass, mf, vrm);		return new InlineSpiller(pass, mf, vrm);
}		}

		void createHoistSpiller(MachineFunctionPass &pass, MachineFunction &mf,
		VirtRegMap &vrm, Spiller *spiller) {
		HoistSpiller *HSpiller = new HoistSpiller(pass, mf, vrm);
		(dyn_cast<InlineSpiller>(spiller))->setHSpiller(HSpiller);
		qcolombet: Hid the instantiation of the hoist spiller helper in the inliner spiller. The positive side effect is that we won’t leak the memory!
		wmi: Make HoistSpillHelper a field in inline spiller.
		}

		void startHoistSpiller(MachineFunction &mf, VirtRegMap &vrm, LiveIntervals &lis,
		Spiller *spiller) {
		SmallVector<unsigned, 4> NewVRegs;
		LiveRangeEdit LRE(nullptr, NewVRegs, nullptr, mf, lis, &vrm, nullptr);
		HoistSpiller *HSpiller = (dyn_cast<InlineSpiller>(spiller))->getHSpiller();
		HSpiller->hoistAllSpills(LRE);
		assert(NewVRegs.size() == 0 &&
		"No new vregs should be generated in hoistAllSpills");
		qcolombet: Hid the call to the hoist spiller helper in the inliner spiller.
		wmi: I Added postOptimization as a pure virtual func in class Spiller, and put the code of hoist spiller helper inside of InlineSpiller::postOptimization.
		}
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Snippets		// Snippets
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

// When spilling a virtual register, we also spill any snippets it is connected		// When spilling a virtual register, we also spill any snippets it is connected
// to. The snippets are small live ranges that only have a single real use,		// to. The snippets are small live ranges that only have a single real use,
[87 lines not shown]	for (MachineRegisterInfo::reg_instr_iterator
if (isRegToSpill(SnipReg))		if (isRegToSpill(SnipReg))
continue;		continue;
RegsToSpill.push_back(SnipReg);		RegsToSpill.push_back(SnipReg);
DEBUG(dbgs() << "\talso spill snippet " << SnipLI << '\n');		DEBUG(dbgs() << "\talso spill snippet " << SnipLI << '\n');
++NumSnippets;		++NumSnippets;
}		}
}		}


//===----------------------------------------------------------------------===//
// Sibling Values
//===----------------------------------------------------------------------===//

// After live range splitting, some values to be spilled may be defined by
// copies from sibling registers. We trace the sibling copies back to the
// original value if it still exists. We need it for rematerialization.
//
// Even when the value can't be rematerialized, we still want to determine if
// the value has already been spilled, or we may want to hoist the spill from a
// loop.

bool InlineSpiller::isSibling(unsigned Reg) {		bool InlineSpiller::isSibling(unsigned Reg) {
return TargetRegisterInfo::isVirtualRegister(Reg) &&		return TargetRegisterInfo::isVirtualRegister(Reg) &&
VRM.getOriginal(Reg) == Original;		VRM.getOriginal(Reg) == Original;
}		}

#ifndef NDEBUG
static raw_ostream &operator<<(raw_ostream &OS,
const InlineSpiller::SibValueInfo &SVI) {
OS << "spill " << PrintReg(SVI.SpillReg) << ':'
<< SVI.SpillVNI->id << '@' << SVI.SpillVNI->def;
if (SVI.SpillMBB)
OS << " in BB#" << SVI.SpillMBB->getNumber();
if (SVI.AllDefsAreReloads)
OS << " all-reloads";
if (SVI.DefByOrigPHI)
OS << " orig-phi";
if (SVI.KillsSource)
OS << " kill";
OS << " deps[";
for (VNInfo *Dep : SVI.Deps)
OS << ' ' << Dep->id << '@' << Dep->def;
OS << " ]";
if (SVI.DefMI)
OS << " def: " << *SVI.DefMI;
else
OS << '\n';
return OS;
}
#endif

/// propagateSiblingValue - Propagate the value in SVI to dependents if it is
/// known. Otherwise remember the dependency for later.
///
/// @param SVIIter SibValues entry to propagate.
/// @param VNI Dependent value, or NULL to propagate to all saved dependents.
void InlineSpiller::propagateSiblingValue(SibValueMap::iterator SVIIter,
VNInfo *VNI) {
SibValueMap::value_type *SVI = &*SVIIter;

// When VNI is non-NULL, add it to SVI's deps, and only propagate to that.
TinyPtrVector<VNInfo*> FirstDeps;
if (VNI) {
FirstDeps.push_back(VNI);
SVI->second.Deps.push_back(VNI);
}

// Has the value been completely determined yet? If not, defer propagation.
if (!SVI->second.hasDef())
return;

// Work list of values to propagate.
SmallSetVector<SibValueMap::value_type *, 8> WorkList;
WorkList.insert(SVI);

do {
SVI = WorkList.pop_back_val();
TinyPtrVector<VNInfo*> *Deps = VNI ? &FirstDeps : &SVI->second.Deps;
VNI = nullptr;

SibValueInfo &SV = SVI->second;
if (!SV.SpillMBB)
SV.SpillMBB = LIS.getMBBFromIndex(SV.SpillVNI->def);

DEBUG(dbgs() << " prop to " << Deps->size() << ": "
<< SVI->first->id << '@' << SVI->first->def << ":\t" << SV);

assert(SV.hasDef() && "Propagating undefined value");

// Should this value be propagated as a preferred spill candidate? We don't
// propagate values of registers that are about to spill.
bool PropSpill = !DisableHoisting && !isRegToSpill(SV.SpillReg);
unsigned SpillDepth = ~0u;

for (VNInfo *Dep : *Deps) {
SibValueMap::iterator DepSVI = SibValues.find(Dep);
assert(DepSVI != SibValues.end() && "Dependent value not in SibValues");
SibValueInfo &DepSV = DepSVI->second;
if (!DepSV.SpillMBB)
DepSV.SpillMBB = LIS.getMBBFromIndex(DepSV.SpillVNI->def);

bool Changed = false;

// Propagate defining instruction.
if (!DepSV.hasDef()) {
Changed = true;
DepSV.DefMI = SV.DefMI;
DepSV.DefByOrigPHI = SV.DefByOrigPHI;
}

// Propagate AllDefsAreReloads. For PHI values, this computes an AND of
// all predecessors.
if (!SV.AllDefsAreReloads && DepSV.AllDefsAreReloads) {
Changed = true;
DepSV.AllDefsAreReloads = false;
}

// Propagate best spill value.
if (PropSpill && SV.SpillVNI != DepSV.SpillVNI) {
if (SV.SpillMBB == DepSV.SpillMBB) {
// DepSV is in the same block. Hoist when dominated.
if (DepSV.KillsSource && SV.SpillVNI->def < DepSV.SpillVNI->def) {
// This is an alternative def earlier in the same MBB.
// Hoist the spill as far as possible in SpillMBB. This can ease
// register pressure:
//
// x = def
// y = use x
// s = copy x
//
// Hoisting the spill of s to immediately after the def removes the
// interference between x and y:
//
// x = def
// spill x
// y = use x<kill>
//
// This hoist only helps when the DepSV copy kills its source.
Changed = true;
DepSV.SpillReg = SV.SpillReg;
DepSV.SpillVNI = SV.SpillVNI;
DepSV.SpillMBB = SV.SpillMBB;
}
} else {
// DepSV is in a different block.
if (SpillDepth == ~0u)
SpillDepth = Loops.getLoopDepth(SV.SpillMBB);

// Also hoist spills to blocks with smaller loop depth, but make sure
// that the new value dominates. Non-phi dependents are always
// dominated, phis need checking.

const BranchProbability MarginProb(4, 5); // 80%
// Hoist a spill to outer loop if there are multiple dependents (it
// can be beneficial if more than one dependents are hoisted) or
// if DepSV (the hoisting source) is hotter than SV (the hoisting
// destination) (we add a 80% margin to bias a little towards
// loop depth).
bool HoistCondition =
(MBFI.getBlockFreq(DepSV.SpillMBB) >=
(MBFI.getBlockFreq(SV.SpillMBB) * MarginProb)) ||
Deps->size() > 1;

if ((Loops.getLoopDepth(DepSV.SpillMBB) > SpillDepth) &&
HoistCondition &&
(!DepSVI->first->isPHIDef() ||
MDT.dominates(SV.SpillMBB, DepSV.SpillMBB))) {
Changed = true;
DepSV.SpillReg = SV.SpillReg;
DepSV.SpillVNI = SV.SpillVNI;
DepSV.SpillMBB = SV.SpillMBB;
}
}
}

if (!Changed)
continue;

// Something changed in DepSVI. Propagate to dependents.
WorkList.insert(&*DepSVI);

DEBUG(dbgs() << " update " << DepSVI->first->id << '@'
<< DepSVI->first->def << " to:\t" << DepSV);
}
} while (!WorkList.empty());
}
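The cross-block hoisting test in the removed propagateSiblingValue can be read as a small predicate. The sketch below is a standalone approximation and not the LLVM code: SpillSite, shouldHoistAcrossBlocks, and the plain integer frequencies are hypothetical stand-ins for SibValueInfo, MachineBlockFrequencyInfo, and MachineLoopInfo, and the 80% margin is approximated by integer cross-multiplication rather than BranchProbability scaling.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the facts the loop reads about a spill block.
struct SpillSite {
  std::uint64_t BlockFreq; // assumed frequency of the block holding the spill
  unsigned LoopDepth;      // assumed loop depth of that block
};

// Returns true when the dependent value's spill (Dep) should be moved to the
// source value's block (Src). Mirrors the shape of the condition in the diff:
// deeper loop, AND (hot enough OR several dependents), AND dominance for PHIs.
bool shouldHoistAcrossBlocks(const SpillSite &Src, const SpillSite &Dep,
                             unsigned NumDeps, bool DepIsPHI,
                             bool SrcDominatesDep) {
  // Keep the cross-multiplication below from overflowing in this sketch.
  assert(Src.BlockFreq < (1ull << 60) && Dep.BlockFreq < (1ull << 60));

  // Freq(Dep) >= Freq(Src) * 4/5, written without division.
  bool HotEnough = Dep.BlockFreq * 5 >= Src.BlockFreq * 4;
  bool HoistCondition = HotEnough || NumDeps > 1;

  // Only hoist out of a deeper loop into a shallower one.
  bool DeeperLoop = Dep.LoopDepth > Src.LoopDepth;

  // Non-PHI dependents are always dominated; PHIs need an explicit check.
  bool Dominated = !DepIsPHI || SrcDominatesDep;

  return DeeperLoop && HoistCondition && Dominated;
}
```

For example, a dependent in a loop body with frequency 1000 and a source in the preheader with frequency 100 satisfies all three parts, while a cold dependent with a single user does not.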

/// traceSiblingValue - Trace a value that is about to be spilled back to the
/// real defining instructions by looking through sibling copies. Always stay
/// within the range of OrigVNI so the registers are known to carry the same
/// value.
///
/// Determine if the value is defined by all reloads, so spilling isn't
/// necessary - the value is already in the stack slot.
///
/// Return a defining instruction that may be a candidate for rematerialization.
///
MachineInstr *InlineSpiller::traceSiblingValue(unsigned UseReg, VNInfo *UseVNI,
VNInfo *OrigVNI) {
// Check if a cached value already exists.
SibValueMap::iterator SVI;
bool Inserted;
std::tie(SVI, Inserted) =
SibValues.insert(std::make_pair(UseVNI, SibValueInfo(UseReg, UseVNI)));
if (!Inserted) {
DEBUG(dbgs() << "Cached value " << PrintReg(UseReg) << ':'
<< UseVNI->id << '@' << UseVNI->def << ' ' << SVI->second);
return SVI->second.DefMI;
}

DEBUG(dbgs() << "Tracing value " << PrintReg(UseReg) << ':'
<< UseVNI->id << '@' << UseVNI->def << '\n');

// List of (Reg, VNI) that have been inserted into SibValues, but need to be
// processed.
SmallVector<std::pair<unsigned, VNInfo*>, 8> WorkList;
WorkList.push_back(std::make_pair(UseReg, UseVNI));

LiveInterval &OrigLI = LIS.getInterval(Original);
do {
unsigned Reg;
VNInfo *VNI;
std::tie(Reg, VNI) = WorkList.pop_back_val();
DEBUG(dbgs() << " " << PrintReg(Reg) << ':' << VNI->id << '@' << VNI->def
<< ":\t");

// First check if this value has already been computed.
SVI = SibValues.find(VNI);
assert(SVI != SibValues.end() && "Missing SibValues entry");

// Trace through PHI-defs created by live range splitting.
if (VNI->isPHIDef()) {
// Stop at original PHIs. We don't know the value at the
// predecessors. Look up the VNInfo for the current definition
// in OrigLI, to properly determine whether or not this phi was
// added by splitting.
if (VNI->def == OrigLI.getVNInfoAt(VNI->def)->def) {
DEBUG(dbgs() << "orig phi value\n");
SVI->second.DefByOrigPHI = true;
SVI->second.AllDefsAreReloads = false;
propagateSiblingValue(SVI);
continue;
}

// This is a PHI inserted by live range splitting. We could trace the
// live-out value from predecessor blocks, but that search can be very
// expensive if there are many predecessors and many more PHIs as
// generated by tail-dup when it sees an indirectbr. Instead, look at
// all the non-PHI defs that have the same value as OrigVNI. They must
// jointly dominate VNI->def. This is not optimal since VNI may actually
// be jointly dominated by a smaller subset of defs, so there is a change
// we will miss a AllDefsAreReloads optimization.

// Separate all values dominated by OrigVNI into PHIs and non-PHIs.
SmallVector<VNInfo*, 8> PHIs, NonPHIs;
LiveInterval &LI = LIS.getInterval(Reg);

for (LiveInterval::vni_iterator VI = LI.vni_begin(), VE = LI.vni_end();
VI != VE; ++VI) {
VNInfo *VNI2 = *VI;
if (VNI2->isUnused())
continue;
if (!OrigLI.containsOneValue() &&
OrigLI.getVNInfoAt(VNI2->def) != OrigVNI)
continue;
if (VNI2->isPHIDef() && VNI2->def != OrigVNI->def)
PHIs.push_back(VNI2);
else
NonPHIs.push_back(VNI2);
}
DEBUG(dbgs() << "split phi value, checking " << PHIs.size()
<< " phi-defs, and " << NonPHIs.size()
<< " non-phi/orig defs\n");

// Create entries for all the PHIs. Don't add them to the worklist, we
// are processing all of them in one go here.
for (VNInfo *PHI : PHIs)
SibValues.insert(std::make_pair(PHI, SibValueInfo(Reg, PHI)));

// Add every PHI as a dependent of all the non-PHIs.
for (VNInfo *NonPHI : NonPHIs) {
// Known value? Try an insertion.
std::tie(SVI, Inserted) =
SibValues.insert(std::make_pair(NonPHI, SibValueInfo(Reg, NonPHI)));
// Add all the PHIs as dependents of NonPHI.
SVI->second.Deps.insert(SVI->second.Deps.end(), PHIs.begin(),
PHIs.end());
// This is the first time we see NonPHI, add it to the worklist.
if (Inserted)
WorkList.push_back(std::make_pair(Reg, NonPHI));
else
// Propagate to all inserted PHIs, not just VNI.
propagateSiblingValue(SVI);
}

// Next work list item.
continue;
}

MachineInstr *MI = LIS.getInstructionFromIndex(VNI->def);
assert(MI && "Missing def");

// Trace through sibling copies.
if (unsigned SrcReg = isFullCopyOf(MI, Reg)) {
if (isSibling(SrcReg)) {
LiveInterval &SrcLI = LIS.getInterval(SrcReg);
LiveQueryResult SrcQ = SrcLI.Query(VNI->def);
assert(SrcQ.valueIn() && "Copy from non-existing value");
// Check if this COPY kills its source.
SVI->second.KillsSource = SrcQ.isKill();
VNInfo *SrcVNI = SrcQ.valueIn();
DEBUG(dbgs() << "copy of " << PrintReg(SrcReg) << ':'
<< SrcVNI->id << '@' << SrcVNI->def
<< " kill=" << unsigned(SVI->second.KillsSource) << '\n');
// Known sibling source value? Try an insertion.
std::tie(SVI, Inserted) = SibValues.insert(
std::make_pair(SrcVNI, SibValueInfo(SrcReg, SrcVNI)));
// This is the first time we see Src, add it to the worklist.
if (Inserted)
WorkList.push_back(std::make_pair(SrcReg, SrcVNI));
propagateSiblingValue(SVI, VNI);
// Next work list item.
continue;
}
}

// Track reachable reloads.
SVI->second.DefMI = MI;
SVI->second.SpillMBB = MI->getParent();
int FI;
if (Reg == TII.isLoadFromStackSlot(MI, FI) && FI == StackSlot) {
DEBUG(dbgs() << "reload\n");
propagateSiblingValue(SVI);
// Next work list item.
continue;
}

// Potential remat candidate.
DEBUG(dbgs() << "def " << *MI);
SVI->second.AllDefsAreReloads = false;
propagateSiblingValue(SVI);
} while (!WorkList.empty());

// Look up the value we were looking for. We already did this lookup at the
// top of the function, but SibValues may have been invalidated.
SVI = SibValues.find(UseVNI);
assert(SVI != SibValues.end() && "Didn't compute requested info");
DEBUG(dbgs() << " traced to:\t" << SVI->second);
return SVI->second.DefMI;
}

/// analyzeSiblingValues - Trace values defined by sibling copies back to
/// something that isn't a sibling copy.
///
/// Keep track of values that may be rematerializable.
void InlineSpiller::analyzeSiblingValues() {
SibValues.clear();

// No siblings at all?
if (Edit->getReg() == Original)
return;

LiveInterval &OrigLI = LIS.getInterval(Original);
for (unsigned Reg : RegsToSpill) {
LiveInterval &LI = LIS.getInterval(Reg);
for (LiveInterval::const_vni_iterator VI = LI.vni_begin(),
VE = LI.vni_end(); VI != VE; ++VI) {
VNInfo *VNI = *VI;
if (VNI->isUnused())
continue;
MachineInstr *DefMI = nullptr;
if (!VNI->isPHIDef()) {
DefMI = LIS.getInstructionFromIndex(VNI->def);
assert(DefMI && "No defining instruction");
}
// Check possible sibling copies.
if (VNI->isPHIDef() || DefMI->isCopy()) {
VNInfo *OrigVNI = OrigLI.getVNInfoAt(VNI->def);
assert(OrigVNI && "Def outside original live range");
if (OrigVNI->def != VNI->def)
DefMI = traceSiblingValue(Reg, VNI, OrigVNI);
}
if (DefMI && Edit->checkRematerializable(VNI, DefMI, AA)) {
DEBUG(dbgs() << "Value " << PrintReg(Reg) << ':' << VNI->id << '@'
<< VNI->def << " may remat from " << *DefMI);
}
}
}
}

/// hoistSpill - Given a sibling copy that defines a value to be spilled, insert
/// a spill at a better location.
bool InlineSpiller::hoistSpill(LiveInterval &SpillLI, MachineInstr &CopyMI) {
SlotIndex Idx = LIS.getInstructionIndex(CopyMI);
VNInfo *VNI = SpillLI.getVNInfoAt(Idx.getRegSlot());
assert(VNI && VNI->def == Idx.getRegSlot() && "Not defined by copy");
SibValueMap::iterator I = SibValues.find(VNI);
if (I == SibValues.end())
return false;

const SibValueInfo &SVI = I->second;

// Let the normal folding code deal with the boring case.
if (!SVI.AllDefsAreReloads && SVI.SpillVNI == VNI)
return false;

// SpillReg may have been deleted by remat and DCE.
if (!LIS.hasInterval(SVI.SpillReg)) {
DEBUG(dbgs() << "Stale interval: " << PrintReg(SVI.SpillReg) << '\n');
SibValues.erase(I);
return false;
}

LiveInterval &SibLI = LIS.getInterval(SVI.SpillReg);
if (!SibLI.containsValue(SVI.SpillVNI)) {
DEBUG(dbgs() << "Stale value: " << PrintReg(SVI.SpillReg) << '\n');
SibValues.erase(I);
return false;
}

// Conservatively extend the stack slot range to the range of the original
// value. We may be able to do better with stack slot coloring by being more
// careful here.
assert(StackInt && "No stack slot assigned yet.");
LiveInterval &OrigLI = LIS.getInterval(Original);
VNInfo *OrigVNI = OrigLI.getVNInfoAt(Idx);
StackInt->MergeValueInAsValue(OrigLI, OrigVNI, StackInt->getValNumInfo(0));
DEBUG(dbgs() << "\tmerged orig valno " << OrigVNI->id << ": "
<< *StackInt << '\n');

// Already spilled everywhere.
if (SVI.AllDefsAreReloads) {
DEBUG(dbgs() << "\tno spill needed: " << SVI);
++NumOmitReloadSpill;
return true;
}
// We are going to spill SVI.SpillVNI immediately after its def, so clear out
// any later spills of the same value.
eliminateRedundantSpills(SibLI, SVI.SpillVNI);

MachineBasicBlock *MBB = LIS.getMBBFromIndex(SVI.SpillVNI->def);
MachineBasicBlock::iterator MII;
if (SVI.SpillVNI->isPHIDef())
MII = MBB->SkipPHIsAndLabels(MBB->begin());
else {
MachineInstr *DefMI = LIS.getInstructionFromIndex(SVI.SpillVNI->def);
assert(DefMI && "Defining instruction disappeared");
MII = DefMI;
++MII;
}
// Insert spill without kill flag immediately after def.
TII.storeRegToStackSlot(*MBB, MII, SVI.SpillReg, false, StackSlot,
MRI.getRegClass(SVI.SpillReg), &TRI);
--MII; // Point to store instruction.
LIS.InsertMachineInstrInMaps(*MII);
DEBUG(dbgs() << "\thoisted: " << SVI.SpillVNI->def << '\t' << *MII);

++NumSpills;
++NumHoists;
return true;
}

/// eliminateRedundantSpills - SLI:VNI is known to be on the stack. Remove any		/// eliminateRedundantSpills - SLI:VNI is known to be on the stack. Remove any
		qcolombet: Don’t repeat the method name.
		wmi: Fixed.
/// redundant spills of this value in SLI.reg and sibling copies.		/// redundant spills of this value in SLI.reg and sibling copies.
void InlineSpiller::eliminateRedundantSpills(LiveInterval &SLI, VNInfo *VNI) {		void InlineSpiller::eliminateRedundantSpills(LiveInterval &SLI, VNInfo *VNI) {
assert(VNI && "Missing value");		assert(VNI && "Missing value");
SmallVector<std::pair<LiveInterval*, VNInfo*>, 8> WorkList;		SmallVector<std::pair<LiveInterval*, VNInfo*>, 8> WorkList;
WorkList.push_back(std::make_pair(&SLI, VNI));		WorkList.push_back(std::make_pair(&SLI, VNI));
assert(StackInt && "No stack slot assigned yet.");		assert(StackInt && "No stack slot assigned yet.");

do {		do {
[37 lines not shown]	for (MachineRegisterInfo::use_instr_nodbg_iterator
// Erase spills.		// Erase spills.
int FI;		int FI;
if (Reg == TII.isStoreToStackSlot(MI, FI) && FI == StackSlot) {		if (Reg == TII.isStoreToStackSlot(MI, FI) && FI == StackSlot) {
DEBUG(dbgs() << "Redundant spill " << Idx << '\t' << *MI);		DEBUG(dbgs() << "Redundant spill " << Idx << '\t' << *MI);
// eliminateDeadDefs won't normally remove stores, so switch opcode.		// eliminateDeadDefs won't normally remove stores, so switch opcode.
MI->setDesc(TII.get(TargetOpcode::KILL));		MI->setDesc(TII.get(TargetOpcode::KILL));
DeadDefs.push_back(MI);		DeadDefs.push_back(MI);
++NumSpillsRemoved;		++NumSpillsRemoved;
		if (HSpiller && HSpiller->rmFromMergableSpills(MI, StackSlot))
--NumSpills;		--NumSpills;
}		}
}		}
} while (!WorkList.empty());		} while (!WorkList.empty());
}		}


//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Rematerialization		// Rematerialization
[54 lines not shown]	if (!ParentVNI) {
}		}
DEBUG(dbgs() << UseIdx << '\t' << MI);		DEBUG(dbgs() << UseIdx << '\t' << MI);
return true;		return true;
}		}

if (SnippetCopies.count(&MI))		if (SnippetCopies.count(&MI))
return false;		return false;

// Use an OrigVNI from traceSiblingValue when ParentVNI is a sibling copy.		LiveInterval &OrigLI = LIS.getInterval(Original);
LiveRangeEdit::Remat RM(ParentVNI);		VNInfo *OrigVNI = OrigLI.getVNInfoAt(UseIdx);
SibValueMap::const_iterator SibI = SibValues.find(ParentVNI);		LiveRangeEdit::Remat RM(ParentVNI, OrigVNI);
if (SibI != SibValues.end())		RM.OrigMI = LIS.getInstructionFromIndex(OrigVNI->def);
RM.OrigMI = SibI->second.DefMI;
if (!Edit->canRematerializeAt(RM, UseIdx, false)) {		if (!Edit->canRematerializeAt(RM, UseIdx, false)) {
markValueUsed(&VirtReg, ParentVNI);		markValueUsed(&VirtReg, ParentVNI);
DEBUG(dbgs() << "\tcannot remat for " << UseIdx << '\t' << MI);		DEBUG(dbgs() << "\tcannot remat for " << UseIdx << '\t' << MI);
return false;		return false;
}		}

// If the instruction also writes VirtReg.reg, it had better not require the		// If the instruction also writes VirtReg.reg, it had better not require the
// same register for uses and defs.		// same register for uses and defs.
[34 lines not shown]	bool InlineSpiller::reMaterializeFor(LiveInterval &VirtReg, MachineInstr &MI) {

++NumRemats;		++NumRemats;
return true;		return true;
}		}

/// reMaterializeAll - Try to rematerialize as many uses as possible,		/// reMaterializeAll - Try to rematerialize as many uses as possible,
/// and trim the live ranges after.		/// and trim the live ranges after.
void InlineSpiller::reMaterializeAll() {		void InlineSpiller::reMaterializeAll() {
// analyzeSiblingValues has already tested all relevant defining instructions.
if (!Edit->anyRematerializable(AA))		if (!Edit->anyRematerializable(AA))
return;		return;

UsedValues.clear();		UsedValues.clear();

// Try to remat before all uses of snippets.		// Try to remat before all uses of snippets.
bool anyRemat = false;		bool anyRemat = false;
for (unsigned Reg : RegsToSpill) {		for (unsigned Reg : RegsToSpill) {
[69 lines not shown]	bool InlineSpiller::coalesceStackAccess(MachineInstr *MI, unsigned Reg) {
bool IsLoad = InstrReg;		bool IsLoad = InstrReg;
if (!IsLoad)		if (!IsLoad)
InstrReg = TII.isStoreToStackSlot(MI, FI);		InstrReg = TII.isStoreToStackSlot(MI, FI);

// We have a stack access. Is it the right register and slot?		// We have a stack access. Is it the right register and slot?
if (InstrReg != Reg || FI != StackSlot)		if (InstrReg != Reg || FI != StackSlot)
return false;		return false;

		if (!IsLoad && HSpiller)
		HSpiller->rmFromMergableSpills(MI, StackSlot);

DEBUG(dbgs() << "Coalescing stack access: " << *MI);		DEBUG(dbgs() << "Coalescing stack access: " << *MI);
LIS.RemoveMachineInstrFromMaps(*MI);		LIS.RemoveMachineInstrFromMaps(*MI);
MI->eraseFromParent();		MI->eraseFromParent();

if (IsLoad) {		if (IsLoad) {
++NumReloadsRemoved;		++NumReloadsRemoved;
--NumReloads;		--NumReloads;
} else {		} else {
[108 lines not shown]	for (MIBundleOperands MO(*MI); MO.isValid(); ++MO) {
if (RI.FullyDefined)		if (RI.FullyDefined)
continue;		continue;
// FoldMI does not define this physreg. Remove the LI segment.		// FoldMI does not define this physreg. Remove the LI segment.
assert(MO->isDead() && "Cannot fold physreg def");		assert(MO->isDead() && "Cannot fold physreg def");
SlotIndex Idx = LIS.getInstructionIndex(*MI).getRegSlot();		SlotIndex Idx = LIS.getInstructionIndex(*MI).getRegSlot();
LIS.removePhysRegDefAt(Reg, Idx);		LIS.removePhysRegDefAt(Reg, Idx);
}		}

		int FI;
		if (TII.isStoreToStackSlot(MI, FI) && HSpiller &&
		HSpiller->rmFromMergableSpills(MI, FI))
		--NumSpills;
LIS.ReplaceMachineInstrInMaps(MI, FoldMI);		LIS.ReplaceMachineInstrInMaps(MI, FoldMI);
MI->eraseFromParent();		MI->eraseFromParent();

// Insert any new instructions other than FoldMI into the LIS maps.		// Insert any new instructions other than FoldMI into the LIS maps.
assert(!MIS.empty() && "Unexpected empty span of instructions!");		assert(!MIS.empty() && "Unexpected empty span of instructions!");
for (MachineInstr &MI : MIS)		for (MachineInstr &MI : MIS)
if (&MI != FoldMI)		if (&MI != FoldMI)
LIS.InsertMachineInstrInMaps(MI);		LIS.InsertMachineInstrInMaps(MI);
[9 lines not shown]	for (unsigned i = FoldMI->getNumOperands(); i; --i) {
FoldMI->RemoveOperand(i - 1);		FoldMI->RemoveOperand(i - 1);
}		}

DEBUG(dumpMachineInstrRangeWithSlotIndex(MIS.begin(), MIS.end(), LIS,		DEBUG(dumpMachineInstrRangeWithSlotIndex(MIS.begin(), MIS.end(), LIS,
"folded"));		"folded"));

if (!WasCopy)		if (!WasCopy)
++NumFolded;		++NumFolded;
else if (Ops.front().second == 0)		else if (Ops.front().second == 0) {
++NumSpills;		++NumSpills;
else		if (HSpiller)
		HSpiller->addToMergableSpills(FoldMI, StackSlot, Original);
		} else
++NumReloads;		++NumReloads;
return true;		return true;
}		}

void InlineSpiller::insertReload(unsigned NewVReg,		void InlineSpiller::insertReload(unsigned NewVReg,
SlotIndex Idx,		SlotIndex Idx,
MachineBasicBlock::iterator MI) {		MachineBasicBlock::iterator MI) {
MachineBasicBlock &MBB = *MI->getParent();		MachineBasicBlock &MBB = *MI->getParent();
[18 lines not shown]	void InlineSpiller::insertSpill(unsigned NewVReg, bool isKill,
TII.storeRegToStackSlot(MBB, std::next(MI), NewVReg, isKill, StackSlot,		TII.storeRegToStackSlot(MBB, std::next(MI), NewVReg, isKill, StackSlot,
MRI.getRegClass(NewVReg), &TRI);		MRI.getRegClass(NewVReg), &TRI);

LIS.InsertMachineInstrRangeInMaps(std::next(MI), MIS.end());		LIS.InsertMachineInstrRangeInMaps(std::next(MI), MIS.end());

DEBUG(dumpMachineInstrRangeWithSlotIndex(std::next(MI), MIS.end(), LIS,		DEBUG(dumpMachineInstrRangeWithSlotIndex(std::next(MI), MIS.end(), LIS,
"spill"));		"spill"));
++NumSpills;		++NumSpills;
		if (HSpiller)
		HSpiller->addToMergableSpills(std::next(MI), StackSlot, Original);
}		}

/// spillAroundUses - insert spill code around each use of Reg.		/// spillAroundUses - insert spill code around each use of Reg.
void InlineSpiller::spillAroundUses(unsigned Reg) {		void InlineSpiller::spillAroundUses(unsigned Reg) {
DEBUG(dbgs() << "spillAroundUses " << PrintReg(Reg) << '\n');		DEBUG(dbgs() << "spillAroundUses " << PrintReg(Reg) << '\n');
LiveInterval &OldLI = LIS.getInterval(Reg);		LiveInterval &OldLI = LIS.getInterval(Reg);

// Iterate over instructions using Reg.		// Iterate over instructions using Reg.
[46 lines not shown]	for (MachineRegisterInfo::reg_bundle_iterator
unsigned SibReg = isFullCopyOf(MI, Reg);		unsigned SibReg = isFullCopyOf(MI, Reg);
if (SibReg && isSibling(SibReg)) {		if (SibReg && isSibling(SibReg)) {
// This may actually be a copy between snippets.		// This may actually be a copy between snippets.
if (isRegToSpill(SibReg)) {		if (isRegToSpill(SibReg)) {
DEBUG(dbgs() << "Found new snippet copy: " << *MI);		DEBUG(dbgs() << "Found new snippet copy: " << *MI);
SnippetCopies.insert(MI);		SnippetCopies.insert(MI);
continue;		continue;
}		}
if (RI.Writes) {		if (!RI.Writes) {
// Hoist the spill of a sib-reg copy.
if (hoistSpill(OldLI, *MI)) {
// This COPY is now dead, the value is already in the stack slot.
MI->getOperand(0).setIsDead();
DeadDefs.push_back(MI);
continue;
}
} else {
// This is a reload for a sib-reg copy. Drop spills downstream.		// This is a reload for a sib-reg copy. Drop spills downstream.
LiveInterval &SibLI = LIS.getInterval(SibReg);		LiveInterval &SibLI = LIS.getInterval(SibReg);
eliminateRedundantSpills(SibLI, SibLI.getVNInfoAt(Idx));		eliminateRedundantSpills(SibLI, SibLI.getVNInfoAt(Idx));
// The COPY will fold to a reload below.		// The COPY will fold to a reload below.
}		}
}		}

// Attempt to fold memory ops.		// Attempt to fold memory ops.
[90 lines not shown]	DEBUG(dbgs() << "Inline spilling "
<< TRI.getRegClassName(MRI.getRegClass(edit.getReg()))		<< TRI.getRegClassName(MRI.getRegClass(edit.getReg()))
<< ':' << edit.getParent()		<< ':' << edit.getParent()
<< "\nFrom original " << PrintReg(Original) << '\n');		<< "\nFrom original " << PrintReg(Original) << '\n');
assert(edit.getParent().isSpillable() &&		assert(edit.getParent().isSpillable() &&
"Attempting to spill already spilled value.");		"Attempting to spill already spilled value.");
assert(DeadDefs.empty() && "Previous spill didn't remove dead defs");		assert(DeadDefs.empty() && "Previous spill didn't remove dead defs");

collectRegsToSpill();		collectRegsToSpill();
analyzeSiblingValues();
reMaterializeAll();		reMaterializeAll();

// Remat may handle everything.		// Remat may handle everything.
if (!RegsToSpill.empty())		if (!RegsToSpill.empty())
spillAll();		spillAll();

Edit->calculateRegClassAndHint(MF, Loops, MBFI);		Edit->calculateRegClassAndHint(MF, Loops, MBFI);
}		}

		// When a spill is inserted, add the spill to MergableSpills map.
		qcolombet: Don’t repeat the name of the method.
		wmi (Author): All similar comments fixed.
		void HoistSpiller::addToMergableSpills(MachineInstr *Spill, int StackSlot,
		unsigned Original) {
		StackSlotToReg[StackSlot] = Original;
		SlotIndex Idx = LIS.getInstructionIndex(*Spill);
		VNInfo *OrigVNI = LIS.getInterval(Original).getVNInfoAt(Idx.getRegSlot());
		std::pair<int, VNInfo *> MIdx = std::make_pair(StackSlot, OrigVNI);
		MergableSpills[MIdx].insert(Spill);
		}

		// When a spill is removed, remove the spill from MergableSpills map.
		// Return true if the spill is removed successfully.
		qcolombet: More mergeable typos...
		wmi (Author): All such typos fixed.
		bool HoistSpiller::rmFromMergableSpills(MachineInstr *Spill, int StackSlot) {
		int Original = StackSlotToReg[StackSlot];
		if (!Original)
		return false;
		SlotIndex Idx = LIS.getInstructionIndex(*Spill);
		VNInfo *OrigVNI = LIS.getInterval(Original).getVNInfoAt(Idx.getRegSlot());
		std::pair<int, VNInfo *> MIdx = std::make_pair(StackSlot, OrigVNI);
		return MergableSpills[MIdx].erase(Spill);
		}

		// Check BB to see if it is a possible target BB to place a hoisted spill,
		// .i.e, there should be a living sibling of OrigReg at the insert point.
		bool HoistSpiller::isSpillCandBB(unsigned OrigReg, VNInfo *OrigVNI,
		qcolombet: .i.e => i.e.
		wmi (Author): Fixed.
		qcolombet: Use doxygen style comment.
		wmi (Author): Fixed.
		MachineBasicBlock *BB, unsigned &LiveReg) {
		SlotIndex Idx;
		MachineBasicBlock::iterator MI = BB->getFirstTerminator();
		if (MI != BB->end())
		Idx = LIS.getInstructionIndex(*MI);
		else
		Idx = LIS.getMBBEndIdx(BB).getPrevSlot();
		DenseSet<unsigned> &Siblings = Virt2SiblingsMap[OrigReg];
		assert((LIS.getInterval(OrigReg)).getVNInfoAt(Idx) == OrigVNI &&
		"Unexpected VNI");

		for (auto const ent : Siblings) {
		qcolombet: DenseSet does not guarantee that the iteration order is stable from one run to another, does it? Although we shouldn't have several siblings live at the same time, this is theoretically possible. In other words, we should use a container that has a deterministic iteration order for reproducibility. See the earlier suggestion I made.
		wmi (Author): Yes, with split-mode-size turned on, several siblings can be live at the same time after hoistCopies. That is not just theoretically possible but realistic. I use SmallSetVector instead here.
		LiveInterval &LI = LIS.getInterval(ent);
		VNInfo *VNI = LI.getVNInfoAt(Idx);
		if (VNI) {
		LiveReg = ent;
		return true;
		}
		}
		return false;
		}
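The iteration-order concern raised above can be illustrated with a minimal sketch. `OrderedRegSet` and `firstLiveSibling` are hypothetical names invented for this example, and the liveness table is a simplification; this is not the LLVM `SmallSetVector` class, only the same idea of pairing a vector (for stable order) with a set (for membership tests).

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for an insertion-ordered set; not the LLVM class.
class OrderedRegSet {
  std::vector<unsigned> Order;       // Elements in first-insertion order.
  std::unordered_set<unsigned> Seen; // Membership test only, never iterated.

public:
  // Returns true if Reg was newly inserted.
  bool insert(unsigned Reg) {
    if (!Seen.insert(Reg).second)
      return false;
    Order.push_back(Reg);
    return true;
  }
  std::vector<unsigned>::const_iterator begin() const { return Order.begin(); }
  std::vector<unsigned>::const_iterator end() const { return Order.end(); }
  std::size_t size() const { return Order.size(); }
};

// Picks the first live sibling in insertion order, in the spirit of the loop
// in isSpillCandBB. IsLive is a hypothetical table indexed by register number.
inline bool firstLiveSibling(const OrderedRegSet &Siblings,
                             const std::vector<bool> &IsLive,
                             unsigned &LiveReg) {
  for (unsigned Reg : Siblings) {
    assert(Reg < IsLive.size() && "register outside liveness table");
    if (IsLive[Reg]) {
      LiveReg = Reg;
      return true;
    }
  }
  return false;
}
```

Because iteration follows insertion order, the register chosen when several siblings are live does not depend on hashing or pointer values, so repeated runs pick the same one.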

		/// Get the top-bottom order to visit the BB nodes containing spills.
		/// Redundent spills will be found and put into SpillsToRm at the
		/// same time.
		void HoistSpiller::getVisitOrders(
		qcolombet: This method would benefit from a more verbose comment. Maybe something along these lines: "Starting from \p Root find a top-down traversal order of the dominator tree to visit all basic blocks containing the elements of \p Spills. Redundant spills will be found and put into \p SpillsToRm at the same time. \p SpillBBToSpill will be populated as part of the process and maps a basic block to the first store occurring in the basic block. \post SpillToRm.union(Spills@post) == Spills@pre" What is the usage of SpillsToKept, in particular the unsigned part? Should we consider moving some of the arguments to fields of the current hoistspill instance?
		wmi (Author): Thanks for your example comment. It is good; I copied most of it into the code. On "Should we consider moving some of the arguments to fields of the current hoistspill instance?": hoistSpillHelper now has the same lifetime as the InlineSpiller instance, so the lifetimes of those arguments are much shorter than that. That is why I chose to keep them as function-local objects.
		qcolombet: This method does a bunch of things! Although I understand we want to share the logic that does the traversal and such, I found that it makes the code harder to read. I’d say that as it is now, and with more comments like I suggested, this is fine, but in general I would rather have a better separation of concerns and then try to optimize if it turns out to be problematic. I am guessing you already went through that process, we are just lacking the history :).
		wmi (Author): After separating part of the work into HoistSpillHelper::rmRedundentSpills and adding more comments in the function body, the code may be easier to read now.
		MachineBasicBlock *Root, SmallPtrSet<MachineInstr *, 16> &Spills,
		SmallVectorImpl<MachineDomTreeNode *> &Orders,
		SmallVectorImpl<MachineInstr *> &SpillsToRm,
		DenseMap<MachineDomTreeNode *, unsigned> &SpillsToKept,
		DenseMap<MachineDomTreeNode *, MachineInstr *> &SpillBBToSpill) {
		// For each spill, check the BB the spill is located at and set
		// SpillBBToSpill[]. If a BB contains more than one spill, only
		// keep the spill with smaller SlotIndex.
		for (const auto CurrentSpill : Spills) {
		MachineBasicBlock *Block = CurrentSpill->getParent();
		MachineDomTreeNode *Node = MDT.DT->getNode(Block);
		MachineInstr *PrevSpill = SpillBBToSpill[Node];
		if (PrevSpill) {
		SlotIndex PIdx = LIS.getInstructionIndex(*PrevSpill);
		SlotIndex CIdx = LIS.getInstructionIndex(*CurrentSpill);
		MachineInstr *SpillToRm = (CIdx > PIdx) ? CurrentSpill : PrevSpill;
		MachineInstr *SpillToKeep = (CIdx > PIdx) ? PrevSpill : CurrentSpill;
		SpillsToRm.push_back(SpillToRm);
		SpillBBToSpill[MDT.DT->getNode(Block)] = SpillToKeep;
		qcolombet: Any chance this could be updated when we insert the spill? Like I said, it just feels like getVisitOrders does too many things.
		wmi (Author): I moved the first part of the work into a separate function, HoistSpillHelper::rmRedundentSpills, so HoistSpiller::getVisitOrders is more focused on what its name describes.
		} else {
		SpillBBToSpill[MDT.DT->getNode(Block)] = CurrentSpill;
		}
		}
		for (const auto SpillToRm : SpillsToRm)
		Spills.erase(SpillToRm);

		SmallPtrSet<MachineDomTreeNode *, 8> WorkSet;
		qcolombet: Please document what WorkSet is supposed to contain.
		wmi (Author): Done.
		SmallPtrSet<MachineDomTreeNode *, 8> NodesOnPath;
		MachineDomTreeNode *RootIDomNode = MDT[Root]->getIDom();
		qcolombet: I think we should describe what the expected root is, because it seems strange to me that we don’t just take the node for the Root.
		wmi (Author): Done.
		// For every node on the dominator tree with spill, walk upside on the
		// dominator tree until reaching the Root node. If there is other node
		// found with spill on the path, the original node is redundent and will
		// be removed. All the nodes on the path from node with non-redundent spill
		// to Root node will be added to the WorkSet, which is the set we want
		// to look at during hoisting spills in the next step.
		for (const auto Spill : Spills) {
		MachineBasicBlock *Block = Spill->getParent();
		MachineDomTreeNode *Node = MDT[Block];
		MachineInstr *SpillToRm = nullptr;
		while (Node != RootIDomNode) {
		if (Node != MDT[Block] && SpillBBToSpill[Node]) {
		qcolombet: More comments, e.g., Node dominates Block and already stores the value. This store is redundant.
		wmi (Author): Done.
		SpillToRm = SpillBBToSpill[MDT[Block]];
		break;
		} else if (WorkSet.count(Node)) {
		break;
		} else {
		NodesOnPath.insert(Node);
		}
		Node = Node->getIDom();
		}
		if (SpillToRm) {
		SpillsToRm.push_back(SpillToRm);
		} else {
		SpillsToKept[MDT[Block]] = 0;
		qcolombet: Ok, found the meaning of the unsigned elsewhere… A comment here as well would be great.
		wmi (Author): Done.
		WorkSet.insert(NodesOnPath.begin(), NodesOnPath.end());
		}
		NodesOnPath.clear();
		}

		// Sort the nodes in WorkSet in top-bottom order and save the nodes
		// in Orders.
		unsigned idx = 0;
		Orders.push_back(MDT.DT->getNode(Root));
		do {
		MachineDomTreeNode *Node = Orders[idx++];
		const std::vector<MachineDomTreeNode *> &Children = Node->getChildren();
		unsigned NumChildren = Children.size();
		for (unsigned i = 0; i != NumChildren; ++i) {
		MachineDomTreeNode *Child = Children[i];
		if (WorkSet.count(Child))
		Orders.push_back(Child);
		}
		} while (idx != Orders.size());

		qcolombet: Assert Orders.size == WorkSet.size?
		wmi (Author): Done.
		DEBUG(dbgs() << "Orders size is " << Orders.size() << "\n");
		{
		SmallVector<MachineDomTreeNode *, 32>::reverse_iterator RIt =
		Orders.rbegin();
		for (; RIt != Orders.rend(); RIt++)
		DEBUG(dbgs() << "BB" << (*RIt)->getBlock()->getNumber() << ",");
		}
		DEBUG(dbgs() << "\n");
		}
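The traversal described in the comments above can be sketched on a toy dominator tree. This is a simplified illustration, not the patch's code: nodes are plain integers, `planVisit` and `VisitPlan` are hypothetical names, and the walk always continues to the root instead of stopping early at nodes already in the work set, which keeps the result independent of the order in which spills are processed.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Result of the planning step (hypothetical type for this sketch).
struct VisitPlan {
  std::vector<int> Order;  // Top-down order over the work set.
  std::set<int> Redundant; // Spill nodes dominated by another spill node.
};

// IDom[N] is the immediate dominator of node N; the tree root has IDom == -1.
// Root is expected to dominate every node in SpillNodes.
inline VisitPlan planVisit(const std::vector<int> &IDom, int Root,
                           const std::set<int> &SpillNodes) {
  VisitPlan Plan;
  std::set<int> WorkSet;
  for (int Spill : SpillNodes) {
    std::vector<int> Path;
    bool Dominated = false;
    // Walk up the dominator tree from the spill node towards Root.
    for (int N = Spill;; N = IDom[N]) {
      assert(N >= 0 && "spill node must be dominated by Root");
      if (N != Spill && SpillNodes.count(N)) {
        Dominated = true; // A dominating node already stores the value.
        break;
      }
      Path.push_back(N);
      if (N == Root)
        break;
    }
    if (Dominated)
      Plan.Redundant.insert(Spill);
    else
      WorkSet.insert(Path.begin(), Path.end());
  }

  // A breadth-first walk from Root restricted to WorkSet is a top-down order.
  std::vector<std::vector<int>> Children(IDom.size());
  for (std::size_t N = 0; N != IDom.size(); ++N)
    if (IDom[N] >= 0)
      Children[IDom[N]].push_back(static_cast<int>(N));
  if (WorkSet.count(Root))
    Plan.Order.push_back(Root);
  for (std::size_t I = 0; I != Plan.Order.size(); ++I)
    for (int Child : Children[Plan.Order[I]])
      if (WorkSet.count(Child))
        Plan.Order.push_back(Child);
  return Plan;
}
```

Every node in the work set is reachable from the root through other work-set nodes, so the resulting order has as many entries as the work set, which is the property the reviewer suggests asserting.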

		/// Try to hoist spills according to BB hotness. The spills to removed will
		/// be saved in SpillsToRm. The spills to be inserted will be saved in
		/// SpillsToIns.
		void HoistSpiller::runHoistSpills(
		unsigned OrigReg, VNInfo *OrigVNI, SmallPtrSet<MachineInstr *, 16> &Spills,
		SmallVectorImpl<MachineInstr *> &SpillsToRm,
		DenseMap<MachineBasicBlock *, unsigned> &SpillsToIns) {
		// Visit order of dominator tree nodes.
		SmallVector<MachineDomTreeNode *, 32> Orders;
		// SpillsToKept contains all the nodes where spills are to be inserted
		// during hoisting. If the spill to be inserted is an original spill
		// (not a hoisted one), the value of the map entry is 0. If the spill
		qcolombet: We do not insert the original store, it is already there, right?
		wmi (Author): Yes, that is right.
		// is a hoisted spill, the value of the map entry is the VReg to be used
		// on the RHS of the spill.
		DenseMap<MachineDomTreeNode *, unsigned> SpillsToKept;
		// Map from BB to the spill inside of it.
		DenseMap<MachineDomTreeNode *, MachineInstr *> SpillBBToSpill;
		MachineBasicBlock *Root = LIS.getMBBFromIndex(OrigVNI->def);
		getVisitOrders(Root, Spills, Orders, SpillsToRm, SpillsToKept,
		SpillBBToSpill);

		// SpillsInSubTree keeps the map from a dom tree node to a nodes set.
		// It saves the locations where spills are to be inserted in the
		// subtree of the node.
		DenseMap<MachineDomTreeNode *, SmallPtrSet<MachineDomTreeNode *, 16>>
		SpillsInSubTree;
		// Iterate Orders set in reverse order, which will be a bottom-top order
		qcolombet: I believe we usually use bottom-up instead of bottom-top.
		wmi (Author): Fixed.
		// in the dominator tree. Once we visit a dom tree node, we know its
		// children has already been visited and the spill locations in the
		qcolombet: have
		wmi (Author): Fixed.
		// subtrees of all the children have been determined.
		SmallVector<MachineDomTreeNode *, 32>::reverse_iterator RIt = Orders.rbegin();
		for (; RIt != Orders.rend(); RIt++) {
		MachineBasicBlock *Block = (*RIt)->getBlock();

		// If Block contains an original spill, simply continue.
		if (SpillsToKept.find(*RIt) != SpillsToKept.end() && !SpillsToKept[*RIt]) {
		SpillsInSubTree[*RIt].insert(*RIt);
		continue;
		}

		// Collect spills in subtree of current node (*RIt) to
		// SpillsInSubTree[*RIt].
		const std::vector<MachineDomTreeNode *> &Children = (*RIt)->getChildren();
		unsigned NumChildren = Children.size();
		for (unsigned i = 0; i != NumChildren; ++i) {
		MachineDomTreeNode *Child = Children[i];
		SpillsInSubTree[*RIt].insert(SpillsInSubTree[Child].begin(),
		SpillsInSubTree[Child].end());
		SpillsInSubTree.erase(Child);
		}

		// No spills in subtree, simply continue.
		if (SpillsInSubTree[*RIt].empty())
		continue;

		// Check whether Block is a possible candidate to insert spill.
		unsigned LiveReg = 0;
		if (!isSpillCandBB(OrigReg, OrigVNI, Block, LiveReg))
		continue;

		// Now Block is a proper target BB for hoisting spills. Decide whether to
		// hoist the spills to current node. Get existing cost of all the spills
		// in SpillsInSubTree[Block].
		BlockFrequency SpillCost = 0;
		for (const auto SpillBB : SpillsInSubTree[*RIt])
		SpillCost += MBFI.getBlockFreq(SpillBB->getBlock());
		qcolombet: If the subtrees get big, we will end up recomputing this cost a bunch of times. Could it be something we keep alongside the subtree?
		wmi (Author): I keep it alongside the subtree in SpillsInSubTreeMap.

		// If there are multiple spills that could be merged, bias a little
		// to hoist the spill.
		BranchProbability MarginProb = (SpillsInSubTree[*RIt].size() > 1)
		? BranchProbability(9, 10)
		: BranchProbability(1, 1);
		if (SpillCost > MBFI.getBlockFreq(Block) * MarginProb) {
		qcolombet: We could add a mode for the hoist spiller, where code size is the priority. I.e., always hoist when SpillsInSubTree.size > 1. A follow-up patch is fine.
		wmi (Author): Ok, I will do it in a follow-up patch.
		// Hoist: Move spills to current Block.
		for (const auto SpillBB : SpillsInSubTree[*RIt]) {
		// When SpillBB is a BB contains original spill, insert the spill
		// to SpillsToRm.
		if (SpillsToKept.find(SpillBB) != SpillsToKept.end() &&
		!SpillsToKept[SpillBB]) {
		MachineInstr *SpillToRm = SpillBBToSpill[SpillBB];
		SpillsToRm.push_back(SpillToRm);
		}
		// SpillBB will not contain spill anymore, remove it from SpillsToKept.
		SpillsToKept.erase(SpillBB);
		}
		// Current Block is the BB containing the new hoisted spill. Add it to
		// SpillsToKept. LiveReg is the RHS of the spill.
		SpillsToKept[*RIt] = LiveReg;
		DEBUG({
		dbgs() << "spills in BB: ";
		for (const auto Rspill : SpillsInSubTree[*RIt])
		dbgs() << Rspill->getBlock()->getNumber() << " ";
		dbgs() << "were promoted to BB" << (*RIt)->getBlock()->getNumber()
		<< "\n";
		});
		SpillsInSubTree[*RIt].clear();
		SpillsInSubTree[*RIt].insert(*RIt);
		}
		}
		// For spills in SpillsToKept with LiveReg set (.i.e, not original spill),
		qcolombet: typo.
		wmi (Author): Fixed.
		// save them to SpillsToIns.
		for (const auto ent : SpillsToKept) {
		qcolombet: Variable must start with a capital letter. Also why use ent for the name?
		wmi (Author): Fixed.
		if (ent.second)
		SpillsToIns[ent.first->getBlock()] = ent.second;
		}
		}
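The hoisting decision in runHoistSpills compares the summed frequency of the spills in a subtree against the frequency of the dominating block, with a 9/10 margin when more than one spill would be merged. A minimal sketch of that comparison follows, assuming plain integer frequencies; `shouldHoist` is a hypothetical helper, and the integer arithmetic stands in for `BlockFrequency` and `BranchProbability` without guarding against overflow.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Returns true if one spill in the dominating block is expected to be cheaper
// than the existing spills in its subtree. Frequencies are hypothetical
// integer block frequencies.
inline bool shouldHoist(const std::vector<std::uint64_t> &SpillFreqs,
                        std::uint64_t BlockFreq) {
  assert(!SpillFreqs.empty() && "nothing to hoist");
  std::uint64_t SpillCost = 0;
  for (std::uint64_t F : SpillFreqs)
    SpillCost += F;
  // Several mergeable spills: compare against 9/10 of the block frequency,
  // which biases the decision slightly towards hoisting.
  if (SpillFreqs.size() > 1)
    return SpillCost * 10 > BlockFreq * 9;
  // A single spill only moves if the target block is strictly colder.
  return SpillCost > BlockFreq;
}
```

For example, two spills of frequency 46 each under a block of frequency 100 are hoisted (92 exceeds 90), while a single spill of frequency 100 stays where it is.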

		/// For spills with equal values, remove redundent spills and hoist spills
		/// to a less hot spot.
		void HoistSpiller::hoistAllSpills(LiveRangeEdit &Edit) {
		qcolombet: Explain the general algorithm here.
		wmi (Author): Done.
		// Save the mapping between stackslot and its original reg.
		DenseMap<int, unsigned> SlotToOrigReg;
		for (unsigned i = 0, e = MRI.getNumVirtRegs(); i != e; ++i) {
		unsigned Reg = TargetRegisterInfo::index2VirtReg(i);
		int Slot = VRM.getStackSlot(Reg);
		if (Slot != VirtRegMap::NO_STACK_SLOT) {
		for (const auto &ent : MergableSpills) {
		if (ent.first.first == Slot &&
		SlotToOrigReg.find(Slot) == SlotToOrigReg.end())
		SlotToOrigReg[Slot] = VRM.getOriginal(Reg);
		}
		}
		unsigned Original = VRM.getPreSplitReg(Reg);
		if (!MRI.def_empty(Reg))
		Virt2SiblingsMap[Original].insert(Reg);
		}
		qcolombet: This loop scares me. Any chance this information can be built as we insert spills?
		wmi (Author): I simply removed the inner loop. The SlotToOrigReg map will become somewhat bigger, but not by a lot.

		// Each entry in MergableSpills contains a spill set with equal values.
		for (auto &ent : MergableSpills) {
		int Slot = ent.first.first;
		unsigned OrigReg = SlotToOrigReg[Slot];
		VNInfo *OrigVNI = ent.first.second;
		SmallPtrSet<MachineInstr *, 16> &EqValSpills = ent.second;
		if (!ent.second.size())
		qcolombet: empty
		wmi (Author): Fixed.
		continue;

		DEBUG({
		dbgs() << "\nFor Slot" << Slot << " and VN" << OrigVNI->id << ":\n"
		<< "Equal spills in BB: ";
		for (const auto spill : EqValSpills)
		dbgs() << spill->getParent()->getNumber() << " ";
		dbgs() << "\n";
		});

		// SpillsToRm is the spill set to be removed from EqValSpills.
		SmallVector<MachineInstr *, 16> SpillsToRm;
		// SpillsToIns is the spill set to be newly inserted after hoisting.
		DenseMap<MachineBasicBlock *, unsigned> SpillsToIns;

		runHoistSpills(OrigReg, OrigVNI, EqValSpills, SpillsToRm, SpillsToIns);

		DEBUG({
		dbgs() << "Finally inserted spills in BB: ";
		for (const auto Ispill : SpillsToIns)
		dbgs() << Ispill.first->getNumber() << " ";
		dbgs() << "\nFinally removed spills in BB: ";
		for (const auto Rspill : SpillsToRm)
		dbgs() << Rspill->getParent()->getNumber() << " ";
		dbgs() << "\n";
		});

		// Stack live range update.
		LiveInterval &StackIntvl = LSS.getInterval(Slot);
		if (!SpillsToIns.empty() || !SpillsToRm.empty()) {
		LiveInterval &OrigLI = LIS.getInterval(OrigReg);
		StackIntvl.MergeValueInAsValue(OrigLI, OrigVNI,
		StackIntvl.getValNumInfo(0));
		}

		// Insert hoisted spills.
		for (auto const ent : SpillsToIns) {
		MachineBasicBlock *BB = ent.first;
		unsigned LiveReg = ent.second;
		MachineBasicBlock::iterator MI = BB->getFirstTerminator();
		TII.storeRegToStackSlot(*BB, MI, LiveReg, false, Slot,
		MRI.getRegClass(LiveReg), &TRI);
		LIS.InsertMachineInstrRangeInMaps(std::prev(MI), MI);
		++NumSpills;
		}

		// Remove redundent spills or change them to dead instructions.
		NumSpills -= SpillsToRm.size();
		for (auto const ent : SpillsToRm) {
		ent->setDesc(TII.get(TargetOpcode::KILL));
		for (unsigned i = ent->getNumOperands(); i; --i) {
		MachineOperand &MO = ent->getOperand(i - 1);
		if (MO.isReg() && MO.isImplicit() && MO.isDef() && !MO.isDead())
		ent->RemoveOperand(i - 1);
		}
		}
		Edit.eliminateDeadDefs(SpillsToRm, None, true);
		}
		}

lib/CodeGen/LiveRangeEdit.cpp

[57 lines not shown]	bool LiveRangeEdit::checkRematerializable(VNInfo *VNI,
Remattable.insert(VNI);		Remattable.insert(VNI);
return true;		return true;
}		}

void LiveRangeEdit::scanRemattable(AliasAnalysis *aa) {		void LiveRangeEdit::scanRemattable(AliasAnalysis *aa) {
for (VNInfo *VNI : getParent().valnos) {		for (VNInfo *VNI : getParent().valnos) {
if (VNI->isUnused())		if (VNI->isUnused())
continue;		continue;
MachineInstr *DefMI = LIS.getInstructionFromIndex(VNI->def);		unsigned Original = VRM->getOriginal(getReg());
		LiveInterval &OrigLI = LIS.getInterval(Original);
		VNInfo *OrigVNI = OrigLI.getVNInfoAt(VNI->def);
		MachineInstr *DefMI = LIS.getInstructionFromIndex(OrigVNI->def);
if (!DefMI)		if (!DefMI)
continue;		continue;
checkRematerializable(VNI, DefMI, aa);		checkRematerializable(OrigVNI, DefMI, aa);
}		}
ScannedRemattable = true;		ScannedRemattable = true;
}		}

bool LiveRangeEdit::anyRematerializable(AliasAnalysis *aa) {		bool LiveRangeEdit::anyRematerializable(AliasAnalysis *aa) {
if (!ScannedRemattable)		if (!ScannedRemattable)
scanRemattable(aa);		scanRemattable(aa);
return !Remattable.empty();		return !Remattable.empty();
[36 lines not shown]
}		}

bool LiveRangeEdit::canRematerializeAt(Remat &RM,		bool LiveRangeEdit::canRematerializeAt(Remat &RM,
SlotIndex UseIdx,		SlotIndex UseIdx,
bool cheapAsAMove) {		bool cheapAsAMove) {
assert(ScannedRemattable && "Call anyRematerializable first");		assert(ScannedRemattable && "Call anyRematerializable first");

// Use scanRemattable info.		// Use scanRemattable info.
if (!Remattable.count(RM.ParentVNI))		if (!Remattable.count(RM.OrigVNI))
return false;		return false;

// No defining instruction provided.		// No defining instruction provided.
SlotIndex DefIdx;		SlotIndex DefIdx;
if (RM.OrigMI)
DefIdx = LIS.getInstructionIndex(*RM.OrigMI);
else {
DefIdx = RM.ParentVNI->def;
RM.OrigMI = LIS.getInstructionFromIndex(DefIdx);
assert(RM.OrigMI && "No defining instruction for remattable value");		assert(RM.OrigMI && "No defining instruction for remattable value");
}		DefIdx = LIS.getInstructionIndex(*(RM.OrigMI));

// If only cheap remats were requested, bail out early.		// If only cheap remats were requested, bail out early.
if (cheapAsAMove && !TII.isAsCheapAsAMove(RM.OrigMI))		if (cheapAsAMove && !TII.isAsCheapAsAMove(RM.OrigMI))
return false;		return false;

// Verify that all used registers are available with the same values.		// Verify that all used registers are available with the same values.
if (!allUsesAvailableAt(RM.OrigMI, DefIdx, UseIdx))		if (!allUsesAvailableAt(RM.OrigMI, DefIdx, UseIdx))
return false;		return false;
[114 lines not shown]	if (!MI->isSafeToMove(nullptr, SawStore)) {
return;		return;
}		}

DEBUG(dbgs() << "Deleting dead def " << Idx << '\t' << *MI);		DEBUG(dbgs() << "Deleting dead def " << Idx << '\t' << *MI);

// Collect virtual registers to be erased after MI is gone.		// Collect virtual registers to be erased after MI is gone.
SmallVector<unsigned, 8> RegsToErase;		SmallVector<unsigned, 8> RegsToErase;
bool ReadsPhysRegs = false;		bool ReadsPhysRegs = false;
		bool isOrigDef = false;
		unsigned Dest;
		if (VRM && MI->getOperand(0).isReg()) {
		Dest = MI->getOperand(0).getReg();
		unsigned Original = VRM->getOriginal(Dest);
		LiveInterval &OrigLI = LIS.getInterval(Original);
		VNInfo *OrigVNI = OrigLI.getVNInfoAt(Idx);
		isOrigDef = SlotIndex::isSameInstr(OrigVNI->def, Idx);
		}

// Check for live intervals that may shrink		// Check for live intervals that may shrink
for (MachineInstr::mop_iterator MOI = MI->operands_begin(),		for (MachineInstr::mop_iterator MOI = MI->operands_begin(),
MOE = MI->operands_end(); MOI != MOE; ++MOI) {		MOE = MI->operands_end(); MOI != MOE; ++MOI) {
if (!MOI->isReg())		if (!MOI->isReg())
continue;		continue;
unsigned Reg = MOI->getReg();		unsigned Reg = MOI->getReg();
if (!TargetRegisterInfo::isVirtualRegister(Reg)) {		if (!TargetRegisterInfo::isVirtualRegister(Reg)) {
[37 lines not shown]	if (ReadsPhysRegs) {
for (unsigned i = MI->getNumOperands(); i; --i) {		for (unsigned i = MI->getNumOperands(); i; --i) {
const MachineOperand &MO = MI->getOperand(i-1);		const MachineOperand &MO = MI->getOperand(i-1);
if (MO.isReg() && TargetRegisterInfo::isPhysicalRegister(MO.getReg()))		if (MO.isReg() && TargetRegisterInfo::isPhysicalRegister(MO.getReg()))
continue;		continue;
MI->RemoveOperand(i-1);		MI->RemoveOperand(i-1);
}		}
DEBUG(dbgs() << "Converted physregs to:\t" << *MI);		DEBUG(dbgs() << "Converted physregs to:\t" << *MI);
} else {		} else {
		// If the dest of MI is an original reg, don't delete the inst. Replace
		// the dest with a new reg, keep the inst for remat of other siblings.
		// The inst is saved in LiveRangeEdit::DeadRemats and will be deleted
		// after all the allocations of the func are done.
		if (isOrigDef) {
		unsigned NewDest = createFrom(Dest);
		pop_back();
		markDeadRemat(MI);
		const TargetRegisterInfo &TRI = *MRI.getTargetRegisterInfo();
		MI->substituteRegister(Dest, NewDest, 0, TRI);
		MI->getOperand(0).setIsDead(false);
		} else {
if (TheDelegate)		if (TheDelegate)
TheDelegate->LRE_WillEraseInstruction(MI);		TheDelegate->LRE_WillEraseInstruction(MI);
LIS.RemoveMachineInstrFromMaps(*MI);		LIS.RemoveMachineInstrFromMaps(*MI);
MI->eraseFromParent();		MI->eraseFromParent();
++NumDCEDeleted;		++NumDCEDeleted;
}		}
		}

// Erase any virtregs that are now empty and unused. There may be <undef>		// Erase any virtregs that are now empty and unused. There may be <undef>
// uses around. Keep the empty live range in that case.		// uses around. Keep the empty live range in that case.
for (unsigned i = 0, e = RegsToErase.size(); i != e; ++i) {		for (unsigned i = 0, e = RegsToErase.size(); i != e; ++i) {
unsigned Reg = RegsToErase[i];		unsigned Reg = RegsToErase[i];
if (LIS.hasInterval(Reg) && MRI.reg_nodbg_empty(Reg)) {		if (LIS.hasInterval(Reg) && MRI.reg_nodbg_empty(Reg)) {
ToShrink.remove(&LIS.getInterval(Reg));		ToShrink.remove(&LIS.getInterval(Reg));
eraseVirtReg(Reg);		eraseVirtReg(Reg);
}		}
}		}
}		}

void LiveRangeEdit::eliminateDeadDefs(SmallVectorImpl<MachineInstr*> &Dead,		void LiveRangeEdit::eliminateDeadDefs(SmallVectorImpl<MachineInstr *> &Dead,
ArrayRef<unsigned> RegsBeingSpilled) {		ArrayRef<unsigned> RegsBeingSpilled,
		bool NoSplit) {
ToShrinkSet ToShrink;		ToShrinkSet ToShrink;

for (;;) {		for (;;) {
// Erase all dead defs.		// Erase all dead defs.
while (!Dead.empty())		while (!Dead.empty())
eliminateDeadDef(Dead.pop_back_val(), ToShrink);		eliminateDeadDef(Dead.pop_back_val(), ToShrink);

if (ToShrink.empty())		if (ToShrink.empty())
break;		break;

// Shrink just one live interval. Then delete new dead defs.		// Shrink just one live interval. Then delete new dead defs.
LiveInterval *LI = ToShrink.back();		LiveInterval *LI = ToShrink.back();
ToShrink.pop_back();		ToShrink.pop_back();
if (foldAsLoad(LI, Dead))		if (foldAsLoad(LI, Dead))
continue;		continue;
unsigned VReg = LI->reg;		unsigned VReg = LI->reg;
if (TheDelegate)		if (TheDelegate)
TheDelegate->LRE_WillShrinkVirtReg(VReg);		TheDelegate->LRE_WillShrinkVirtReg(VReg);
if (!LIS.shrinkToUses(LI, &Dead))		if (!LIS.shrinkToUses(LI, &Dead))
continue;		continue;

		if (NoSplit)
		continue;

// Don't create new intervals for a register being spilled.		// Don't create new intervals for a register being spilled.
// The new intervals would have to be spilled anyway so its not worth it.		// The new intervals would have to be spilled anyway so its not worth it.
		qcolombet: Some update problem I believe.
		wmi (Author): Fixed.
// Also they currently aren't spilled so creating them and not spilling		// Also they currently aren't spilled so creating them and not spilling
// them results in incorrect code.		// them results in incorrect code.
bool BeingSpilled = false;		bool BeingSpilled = false;
for (unsigned i = 0, e = RegsBeingSpilled.size(); i != e; ++i) {		for (unsigned i = 0, e = RegsBeingSpilled.size(); i != e; ++i) {
if (VReg == RegsBeingSpilled[i]) {		if (VReg == RegsBeingSpilled[i]) {
BeingSpilled = true;		BeingSpilled = true;
break;		break;
}		}
[51 lines not shown]

lib/CodeGen/RegAllocBase.h

	[59 lines not shown]
	protected:			protected:
	const TargetRegisterInfo *TRI;			const TargetRegisterInfo *TRI;
	MachineRegisterInfo *MRI;			MachineRegisterInfo *MRI;
	VirtRegMap *VRM;			VirtRegMap *VRM;
	LiveIntervals *LIS;			LiveIntervals *LIS;
	LiveRegMatrix *Matrix;			LiveRegMatrix *Matrix;
	RegisterClassInfo RegClassInfo;			RegisterClassInfo RegClassInfo;

				/// Inst which is a def of an original reg and whose defs are already all
				/// dead after remat is saved in DeadRemats. The deletion of such inst is
				/// postponed till all the allocations are done, so its remat expr is
				/// always available for the remat of all the siblings of the original reg.
				SmallPtrSet<MachineInstr *, 32> DeadRemats;

	RegAllocBase()			RegAllocBase()
	: TRI(nullptr), MRI(nullptr), VRM(nullptr), LIS(nullptr), Matrix(nullptr) {}			: TRI(nullptr), MRI(nullptr), VRM(nullptr), LIS(nullptr), Matrix(nullptr) {}

	virtual ~RegAllocBase() {}			virtual ~RegAllocBase() {}

	// A RegAlloc pass should call this before allocatePhysRegs.			// A RegAlloc pass should call this before allocatePhysRegs.
	void init(VirtRegMap &vrm, LiveIntervals &lis, LiveRegMatrix &mat);			void init(VirtRegMap &vrm, LiveIntervals &lis, LiveRegMatrix &mat);

	// The top-level driver. The output is a VirtRegMap that us updated with			// The top-level driver. The output is a VirtRegMap that us updated with
	// physical register assignments.			// physical register assignments.
	void allocatePhysRegs();			void allocatePhysRegs();

				// Remove dead defs because of rematerialization.
				void eliminateDeadRemats();

				qcolombet: Add virtual keyword. Subclasses may want to do additional things.
				wmi (Author): Fixed.
	// Get a temporary reference to a Spiller instance.			// Get a temporary reference to a Spiller instance.
	virtual Spiller &spiller() = 0;			virtual Spiller &spiller() = 0;

	/// enqueue - Add VirtReg to the priority queue of unassigned registers.			/// enqueue - Add VirtReg to the priority queue of unassigned registers.
	virtual void enqueue(LiveInterval *LI) = 0;			virtual void enqueue(LiveInterval *LI) = 0;

	/// dequeue - Return the next unassigned register, or NULL.			/// dequeue - Return the next unassigned register, or NULL.
	virtual LiveInterval *dequeue() = 0;			virtual LiveInterval *dequeue() = 0;
	[25 lines not shown]

lib/CodeGen/RegAllocBase.cpp

[147 lines not shown]	for (VirtRegVec::iterator I = SplitVRegs.begin(), E = SplitVRegs.end();
DEBUG(dbgs() << "queuing new interval: " << *SplitVirtReg << "\n");		DEBUG(dbgs() << "queuing new interval: " << *SplitVirtReg << "\n");
assert(TargetRegisterInfo::isVirtualRegister(SplitVirtReg->reg) &&		assert(TargetRegisterInfo::isVirtualRegister(SplitVirtReg->reg) &&
"expect split value in virtual register");		"expect split value in virtual register");
enqueue(SplitVirtReg);		enqueue(SplitVirtReg);
++NumNewQueued;		++NumNewQueued;
}		}
}		}
}		}

		void RegAllocBase::eliminateDeadRemats() {
		for (auto ent : DeadRemats) {
		LIS->RemoveMachineInstrFromMaps(*ent);
		qcolombet: Capital letter for the first letter of the variable name.
		wmi: Fixed.
		ent->eraseFromParent();
		}
		DeadRemats.clear();
		}
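
The deferred-deletion idea behind DeadRemats can be sketched in isolation. This is only an illustration under stated assumptions, not the patch's code: `Instr`, `Function`, `DeferredEraser`, and `demoDeferredErase` are hypothetical stand-ins for MachineInstr, the containing function, and the allocator-side bookkeeping. It shows that an instruction marked dead stays readable (so its remat expression is still available to siblings) until the single cleanup pass at the end.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for MachineInstr: only carries a remat expression.
struct Instr {
  std::string Expr;
};

// Hypothetical stand-in for a function body that owns its instructions.
// std::list keeps element addresses stable across insertions and removals.
struct Function {
  std::list<Instr> Body;
  Instr *add(const std::string &Expr) {
    Body.push_back(Instr{Expr});
    return &Body.back();
  }
};

// Records dead remat defs during allocation and erases them all at the end.
class DeferredEraser {
  std::unordered_set<const Instr *> DeadRemats;

public:
  // Called when all defs of I became dead after rematerialization.
  void markDead(const Instr *I) { DeadRemats.insert(I); }

  bool isPendingDelete(const Instr *I) const {
    return DeadRemats.count(I) != 0;
  }

  // Called once, after every allocation decision has been made.
  void eliminateDeadRemats(Function &F) {
    F.Body.remove_if([this](const Instr &I) { return isPendingDelete(&I); });
    DeadRemats.clear();
  }
};

// Returns the number of instructions left after the deferred cleanup.
std::size_t demoDeferredErase() {
  Function F;
  Instr *Def = F.add("mov $42");
  F.add("add r1, r2");

  DeferredEraser Eraser;
  Eraser.markDead(Def);
  // A sibling handled later can still read the remat expression here.
  assert(Def->Expr == "mov $42");

  Eraser.eliminateDeadRemats(F);
  return F.Body.size();
}
```

In this sketch the set only postpones erasure; which instructions qualify as dead remats is decided elsewhere, as in the patch, where LiveRangeEdit receives a pointer to the set.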

lib/CodeGen/RegAllocBasic.cpp

[193 lines not shown]	for (unsigned i = 0, e = Intfs.size(); i != e; ++i) {
if (!VRM->hasPhys(Spill.reg))		if (!VRM->hasPhys(Spill.reg))
continue;		continue;

// Deallocate the interfering vreg by removing it from the union.		// Deallocate the interfering vreg by removing it from the union.
// A LiveInterval instance may not be in a union during modification!		// A LiveInterval instance may not be in a union during modification!
Matrix->unassign(Spill);		Matrix->unassign(Spill);

// Spill the extracted interval.		// Spill the extracted interval.
LiveRangeEdit LRE(&Spill, SplitVRegs, MF, LIS, VRM);		LiveRangeEdit LRE(&Spill, SplitVRegs, &DeadRemats, MF, LIS, VRM);
spiller().spill(LRE);		spiller().spill(LRE);
}		}
return true;		return true;
}		}

// Driver for the register assignment and splitting heuristics.		// Driver for the register assignment and splitting heuristics.
// Manages iteration over the LiveIntervalUnions.		// Manages iteration over the LiveIntervalUnions.
//		//
[42 lines not shown]	for (SmallVectorImpl<unsigned>::iterator PhysRegI = PhysRegSpillCands.begin(),
// Tell the caller to allocate to this newly freed physical register.		// Tell the caller to allocate to this newly freed physical register.
return *PhysRegI;		return *PhysRegI;
}		}

// No other spill candidates were found, so spill the current VirtReg.		// No other spill candidates were found, so spill the current VirtReg.
DEBUG(dbgs() << "spilling: " << VirtReg << '\n');		DEBUG(dbgs() << "spilling: " << VirtReg << '\n');
if (!VirtReg.isSpillable())		if (!VirtReg.isSpillable())
return ~0u;		return ~0u;
LiveRangeEdit LRE(&VirtReg, SplitVRegs, MF, LIS, VRM);		LiveRangeEdit LRE(&VirtReg, SplitVRegs, &DeadRemats, MF, LIS, VRM);
spiller().spill(LRE);		spiller().spill(LRE);

// The live virtual register requesting allocation was spilled, so tell		// The live virtual register requesting allocation was spilled, so tell
// the caller not to allocate anything during this round.		// the caller not to allocate anything during this round.
return 0;		return 0;
}		}

bool RABasic::runOnMachineFunction(MachineFunction &mf) {		bool RABasic::runOnMachineFunction(MachineFunction &mf) {
DEBUG(dbgs() << "******** BASIC REGISTER ALLOCATION ********\n"		DEBUG(dbgs() << "******** BASIC REGISTER ALLOCATION ********\n"
<< "********** Function: "		<< "********** Function: "
<< mf.getName() << '\n');		<< mf.getName() << '\n');

MF = &mf;		MF = &mf;
RegAllocBase::init(getAnalysis<VirtRegMap>(),		RegAllocBase::init(getAnalysis<VirtRegMap>(),
getAnalysis<LiveIntervals>(),		getAnalysis<LiveIntervals>(),
getAnalysis<LiveRegMatrix>());		getAnalysis<LiveRegMatrix>());

calculateSpillWeightsAndHints(LIS, MF, VRM,		calculateSpillWeightsAndHints(LIS, MF, VRM,
getAnalysis<MachineLoopInfo>(),		getAnalysis<MachineLoopInfo>(),
getAnalysis<MachineBlockFrequencyInfo>());		getAnalysis<MachineBlockFrequencyInfo>());

SpillerInstance.reset(createInlineSpiller(this, MF, *VRM));		SpillerInstance.reset(createInlineSpiller(this, MF, *VRM));

allocatePhysRegs();		allocatePhysRegs();
		eliminateDeadRemats();

// Diagnostic output before rewriting		// Diagnostic output before rewriting
DEBUG(dbgs() << "Post alloc VirtRegMap:\n" << *VRM << "\n");		DEBUG(dbgs() << "Post alloc VirtRegMap:\n" << *VRM << "\n");

releaseMemory();		releaseMemory();
return true;		return true;
}		}

FunctionPass* llvm::createBasicRegisterAllocator()		FunctionPass* llvm::createBasicRegisterAllocator()
{		{
return new RABasic();		return new RABasic();
}		}

lib/CodeGen/RegAllocGreedy.cpp

//===-- RegAllocGreedy.cpp - greedy register allocator --------------------===//		//===-- RegAllocGreedy.cpp - greedy register allocator --------------------===//
//		//
// The LLVM Compiler Infrastructure		// The LLVM Compiler Infrastructure
//		//
// This file is distributed under the University of Illinois Open Source		// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.		// License. See LICENSE.TXT for details.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This file defines the RAGreedy function pass for register allocation in		// This file defines the RAGreedy function pass for register allocation in
// optimized builds.		// optimized builds.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/CodeGen/Passes.h"
#include "AllocationOrder.h"		#include "AllocationOrder.h"
#include "InterferenceCache.h"		#include "InterferenceCache.h"
#include "LiveDebugVariables.h"		#include "LiveDebugVariables.h"
#include "RegAllocBase.h"		#include "RegAllocBase.h"
#include "SpillPlacement.h"		#include "SpillPlacement.h"
#include "Spiller.h"		#include "Spiller.h"
#include "SplitKit.h"		#include "SplitKit.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/Analysis/AliasAnalysis.h"		#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/CodeGen/CalcSpillWeights.h"		#include "llvm/CodeGen/CalcSpillWeights.h"
#include "llvm/CodeGen/EdgeBundles.h"		#include "llvm/CodeGen/EdgeBundles.h"
#include "llvm/CodeGen/LiveIntervalAnalysis.h"		#include "llvm/CodeGen/LiveIntervalAnalysis.h"
#include "llvm/CodeGen/LiveRangeEdit.h"		#include "llvm/CodeGen/LiveRangeEdit.h"
#include "llvm/CodeGen/LiveRegMatrix.h"		#include "llvm/CodeGen/LiveRegMatrix.h"
#include "llvm/CodeGen/LiveStackAnalysis.h"		#include "llvm/CodeGen/LiveStackAnalysis.h"
#include "llvm/CodeGen/MachineBlockFrequencyInfo.h"		#include "llvm/CodeGen/MachineBlockFrequencyInfo.h"
#include "llvm/CodeGen/MachineDominators.h"		#include "llvm/CodeGen/MachineDominators.h"
#include "llvm/CodeGen/MachineFunctionPass.h"		#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/MachineLoopInfo.h"		#include "llvm/CodeGen/MachineLoopInfo.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"		#include "llvm/CodeGen/MachineRegisterInfo.h"
		#include "llvm/CodeGen/Passes.h"
#include "llvm/CodeGen/RegAllocRegistry.h"		#include "llvm/CodeGen/RegAllocRegistry.h"
#include "llvm/CodeGen/RegisterClassInfo.h"		#include "llvm/CodeGen/RegisterClassInfo.h"
#include "llvm/CodeGen/VirtRegMap.h"		#include "llvm/CodeGen/VirtRegMap.h"
#include "llvm/IR/LLVMContext.h"		#include "llvm/IR/LLVMContext.h"
#include "llvm/PassAnalysisSupport.h"		#include "llvm/PassAnalysisSupport.h"
#include "llvm/Support/BranchProbability.h"		#include "llvm/Support/BranchProbability.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/ErrorHandling.h"		#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/Timer.h"		#include "llvm/Support/Timer.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
		#include "llvm/Target/TargetInstrInfo.h"
#include "llvm/Target/TargetSubtargetInfo.h"		#include "llvm/Target/TargetSubtargetInfo.h"
#include <queue>		#include <queue>

using namespace llvm;		using namespace llvm;

#define DEBUG_TYPE "regalloc"		#define DEBUG_TYPE "regalloc"

STATISTIC(NumGlobalSplits, "Number of split global live ranges");		STATISTIC(NumGlobalSplits, "Number of split global live ranges");
STATISTIC(NumLocalSplits, "Number of split local live ranges");		STATISTIC(NumLocalSplits, "Number of split local live ranges");
STATISTIC(NumEvicted, "Number of interferences evicted");		STATISTIC(NumEvicted, "Number of interferences evicted");

static cl::opt<SplitEditor::ComplementSpillMode>		static cl::opt<SplitEditor::ComplementSpillMode> SplitSpillMode(
SplitSpillMode("split-spill-mode", cl::Hidden,		"split-spill-mode", cl::Hidden,
cl::desc("Spill mode for splitting live ranges"),		cl::desc("Spill mode for splitting live ranges"),
cl::values(clEnumValN(SplitEditor::SM_Partition, "default", "Default"),		cl::values(clEnumValN(SplitEditor::SM_Partition, "default", "Default"),
clEnumValN(SplitEditor::SM_Size, "size", "Optimize for size"),		clEnumValN(SplitEditor::SM_Size, "size", "Optimize for size"),
clEnumValN(SplitEditor::SM_Speed, "speed", "Optimize for speed"),		clEnumValN(SplitEditor::SM_Speed, "speed", "Optimize for speed"),
clEnumValEnd),		clEnumValEnd),
cl::init(SplitEditor::SM_Partition));		cl::init(SplitEditor::SM_Speed));

static cl::opt<unsigned>		static cl::opt<unsigned>
LastChanceRecoloringMaxDepth("lcr-max-depth", cl::Hidden,		LastChanceRecoloringMaxDepth("lcr-max-depth", cl::Hidden,
cl::desc("Last chance recoloring max depth"),		cl::desc("Last chance recoloring max depth"),
cl::init(5));		cl::init(5));

static cl::opt<unsigned> LastChanceRecoloringMaxInterference(		static cl::opt<unsigned> LastChanceRecoloringMaxInterference(
"lcr-max-interf", cl::Hidden,		"lcr-max-interf", cl::Hidden,
[318 lines not shown]	unsigned trySplit(LiveInterval&, AllocationOrder&,
SmallVectorImpl<unsigned>&);		SmallVectorImpl<unsigned>&);
unsigned tryLastChanceRecoloring(LiveInterval &, AllocationOrder &,		unsigned tryLastChanceRecoloring(LiveInterval &, AllocationOrder &,
SmallVectorImpl<unsigned> &,		SmallVectorImpl<unsigned> &,
SmallVirtRegSet &, unsigned);		SmallVirtRegSet &, unsigned);
bool tryRecoloringCandidates(PQueue &, SmallVectorImpl<unsigned> &,		bool tryRecoloringCandidates(PQueue &, SmallVectorImpl<unsigned> &,
SmallVirtRegSet &, unsigned);		SmallVirtRegSet &, unsigned);
void tryHintRecoloring(LiveInterval &);		void tryHintRecoloring(LiveInterval &);
void tryHintsRecoloring();		void tryHintsRecoloring();
		void postOptimization();
		qcolombet: I would have put this into the base class.
		wmi: Done.

/// Model the information carried by one end of a copy.		/// Model the information carried by one end of a copy.
struct HintInfo {		struct HintInfo {
/// The frequency of the copy.		/// The frequency of the copy.
BlockFrequency Freq;		BlockFrequency Freq;
/// The virtual register or physical register.		/// The virtual register or physical register.
unsigned Reg;		unsigned Reg;
/// Its currently assigned register.		/// Its currently assigned register.
[1,052 lines not shown]	unsigned RAGreedy::calculateRegionSplitCost(LiveInterval &VirtReg,
return BestCand;		return BestCand;
}		}

unsigned RAGreedy::doRegionSplit(LiveInterval &VirtReg, unsigned BestCand,		unsigned RAGreedy::doRegionSplit(LiveInterval &VirtReg, unsigned BestCand,
bool HasCompact,		bool HasCompact,
SmallVectorImpl<unsigned> &NewVRegs) {		SmallVectorImpl<unsigned> &NewVRegs) {
SmallVector<unsigned, 8> UsedCands;		SmallVector<unsigned, 8> UsedCands;
// Prepare split editor.		// Prepare split editor.
LiveRangeEdit LREdit(&VirtReg, NewVRegs, MF, LIS, VRM, this);		LiveRangeEdit LREdit(&VirtReg, NewVRegs, &DeadRemats, MF, LIS, VRM, this);
SE->reset(LREdit, SplitSpillMode);		SE->reset(LREdit, SplitSpillMode);

// Assign all edge bundles to the preferred candidate, or NoCand.		// Assign all edge bundles to the preferred candidate, or NoCand.
BundleCand.assign(Bundles->getNumBundles(), NoCand);		BundleCand.assign(Bundles->getNumBundles(), NoCand);

// Assign bundles for the best candidate region.		// Assign bundles for the best candidate region.
if (BestCand != NoCand) {		if (BestCand != NoCand) {
GlobalSplitCandidate &Cand = GlobalCand[BestCand];		GlobalSplitCandidate &Cand = GlobalCand[BestCand];
[31 lines not shown]
/// tryBlockSplit - Split a global live range around every block with uses. This		/// tryBlockSplit - Split a global live range around every block with uses. This
/// creates a lot of local live ranges, that will be split by tryLocalSplit if		/// creates a lot of local live ranges, that will be split by tryLocalSplit if
/// they don't allocate.		/// they don't allocate.
unsigned RAGreedy::tryBlockSplit(LiveInterval &VirtReg, AllocationOrder &Order,		unsigned RAGreedy::tryBlockSplit(LiveInterval &VirtReg, AllocationOrder &Order,
SmallVectorImpl<unsigned> &NewVRegs) {		SmallVectorImpl<unsigned> &NewVRegs) {
assert(&SA->getParent() == &VirtReg && "Live range wasn't analyzed");		assert(&SA->getParent() == &VirtReg && "Live range wasn't analyzed");
unsigned Reg = VirtReg.reg;		unsigned Reg = VirtReg.reg;
bool SingleInstrs = RegClassInfo.isProperSubClass(MRI->getRegClass(Reg));		bool SingleInstrs = RegClassInfo.isProperSubClass(MRI->getRegClass(Reg));
LiveRangeEdit LREdit(&VirtReg, NewVRegs, MF, LIS, VRM, this);		LiveRangeEdit LREdit(&VirtReg, NewVRegs, &DeadRemats, MF, LIS, VRM, this);
SE->reset(LREdit, SplitSpillMode);		SE->reset(LREdit, SplitSpillMode);
ArrayRef<SplitAnalysis::BlockInfo> UseBlocks = SA->getUseBlocks();		ArrayRef<SplitAnalysis::BlockInfo> UseBlocks = SA->getUseBlocks();
for (unsigned i = 0; i != UseBlocks.size(); ++i) {		for (unsigned i = 0; i != UseBlocks.size(); ++i) {
const SplitAnalysis::BlockInfo &BI = UseBlocks[i];		const SplitAnalysis::BlockInfo &BI = UseBlocks[i];
if (SA->shouldSplitSingleBlock(BI, SingleInstrs))		if (SA->shouldSplitSingleBlock(BI, SingleInstrs))
SE->splitSingleBlock(BI);		SE->splitSingleBlock(BI);
}		}
// No blocks were split.		// No blocks were split.
[55 lines not shown]	RAGreedy::tryInstructionSplit(LiveInterval &VirtReg, AllocationOrder &Order,
SmallVectorImpl<unsigned> &NewVRegs) {		SmallVectorImpl<unsigned> &NewVRegs) {
const TargetRegisterClass *CurRC = MRI->getRegClass(VirtReg.reg);		const TargetRegisterClass *CurRC = MRI->getRegClass(VirtReg.reg);
// There is no point to this if there are no larger sub-classes.		// There is no point to this if there are no larger sub-classes.
if (!RegClassInfo.isProperSubClass(CurRC))		if (!RegClassInfo.isProperSubClass(CurRC))
return 0;		return 0;

// Always enable split spill mode, since we're effectively spilling to a		// Always enable split spill mode, since we're effectively spilling to a
// register.		// register.
LiveRangeEdit LREdit(&VirtReg, NewVRegs, MF, LIS, VRM, this);		LiveRangeEdit LREdit(&VirtReg, NewVRegs, &DeadRemats, MF, LIS, VRM, this);
SE->reset(LREdit, SplitEditor::SM_Size);		SE->reset(LREdit, SplitEditor::SM_Size);

ArrayRef<SlotIndex> Uses = SA->getUseSlots();		ArrayRef<SlotIndex> Uses = SA->getUseSlots();
if (Uses.size() <= 1)		if (Uses.size() <= 1)
return 0;		return 0;

DEBUG(dbgs() << "Split around " << Uses.size() << " individual instrs.\n");		DEBUG(dbgs() << "Split around " << Uses.size() << " individual instrs.\n");

[306 lines not shown]	unsigned RAGreedy::tryLocalSplit(LiveInterval &VirtReg, AllocationOrder &Order,
// Didn't find any candidates?		// Didn't find any candidates?
if (BestBefore == NumGaps)		if (BestBefore == NumGaps)
return 0;		return 0;

DEBUG(dbgs() << "Best local split range: " << Uses[BestBefore]		DEBUG(dbgs() << "Best local split range: " << Uses[BestBefore]
<< '-' << Uses[BestAfter] << ", " << BestDiff		<< '-' << Uses[BestAfter] << ", " << BestDiff
<< ", " << (BestAfter - BestBefore + 1) << " instrs\n");		<< ", " << (BestAfter - BestBefore + 1) << " instrs\n");

LiveRangeEdit LREdit(&VirtReg, NewVRegs, MF, LIS, VRM, this);		LiveRangeEdit LREdit(&VirtReg, NewVRegs, &DeadRemats, MF, LIS, VRM, this);
SE->reset(LREdit);		SE->reset(LREdit);

SE->openIntv();		SE->openIntv();
SlotIndex SegStart = SE->enterIntvBefore(Uses[BestBefore]);		SlotIndex SegStart = SE->enterIntvBefore(Uses[BestBefore]);
SlotIndex SegStop = SE->leaveIntvAfter(Uses[BestAfter]);		SlotIndex SegStop = SE->leaveIntvAfter(Uses[BestAfter]);
SE->useIntv(SegStart, SegStop);		SE->useIntv(SegStart, SegStop);
SmallVector<unsigned, 8> IntvMap;		SmallVector<unsigned, 8> IntvMap;
SE->finish(&IntvMap);		SE->finish(&IntvMap);
[626 lines not shown]	if (EnableDeferredSpilling && getStage(VirtReg) < RS_Memory) {
// the live range splitting done by spilling correctly.		// the live range splitting done by spilling correctly.
// We would need a deep integration with the spiller to do the		// We would need a deep integration with the spiller to do the
// right thing here. Anyway, that is still good for early testing.		// right thing here. Anyway, that is still good for early testing.
setStage(VirtReg, RS_Memory);		setStage(VirtReg, RS_Memory);
DEBUG(dbgs() << "Do as if this register is in memory\n");		DEBUG(dbgs() << "Do as if this register is in memory\n");
NewVRegs.push_back(VirtReg.reg);		NewVRegs.push_back(VirtReg.reg);
} else {		} else {
NamedRegionTimer T("Spiller", TimerGroupName, TimePassesIsEnabled);		NamedRegionTimer T("Spiller", TimerGroupName, TimePassesIsEnabled);
LiveRangeEdit LRE(&VirtReg, NewVRegs, MF, LIS, VRM, this);		LiveRangeEdit LRE(&VirtReg, NewVRegs, &DeadRemats, MF, LIS, VRM, this);
spiller().spill(LRE);		spiller().spill(LRE);
setStage(NewVRegs.begin(), NewVRegs.end(), RS_Done);		setStage(NewVRegs.begin(), NewVRegs.end(), RS_Done);

if (VerifyEnabled)		if (VerifyEnabled)
MF->verify(this, "After spilling");		MF->verify(this, "After spilling");
}		}

// The live virtual register requesting allocation was spilled, so tell		// The live virtual register requesting allocation was spilled, so tell
// the caller not to allocate anything during this round.		// the caller not to allocate anything during this round.
return 0;		return 0;
}		}

		void RAGreedy::postOptimization() {
		eliminateDeadRemats();
		startHoistSpiller(MF, VRM, *LIS, &spiller());
		qcolombet: spiller().postOptimization()
		wmi: Done.
		}

bool RAGreedy::runOnMachineFunction(MachineFunction &mf) {		bool RAGreedy::runOnMachineFunction(MachineFunction &mf) {
DEBUG(dbgs() << "******** GREEDY REGISTER ALLOCATION ********\n"		DEBUG(dbgs() << "******** GREEDY REGISTER ALLOCATION ********\n"
<< "********** Function: " << mf.getName() << '\n');		<< "********** Function: " << mf.getName() << '\n');

MF = &mf;		MF = &mf;
TRI = MF->getSubtarget().getRegisterInfo();		TRI = MF->getSubtarget().getRegisterInfo();
TII = MF->getSubtarget().getInstrInfo();		TII = MF->getSubtarget().getInstrInfo();
RCI.runOnMachineFunction(mf);		RCI.runOnMachineFunction(mf);

EnableLocalReassign = EnableLocalReassignment ||		EnableLocalReassign = EnableLocalReassignment ||
MF->getSubtarget().enableRALocalReassignment(		MF->getSubtarget().enableRALocalReassignment(
MF->getTarget().getOptLevel());		MF->getTarget().getOptLevel());

if (VerifyEnabled)		if (VerifyEnabled)
MF->verify(this, "Before greedy register allocator");		MF->verify(this, "Before greedy register allocator");

RegAllocBase::init(getAnalysis<VirtRegMap>(),		RegAllocBase::init(getAnalysis<VirtRegMap>(),
getAnalysis<LiveIntervals>(),		getAnalysis<LiveIntervals>(),
getAnalysis<LiveRegMatrix>());		getAnalysis<LiveRegMatrix>());
Indexes = &getAnalysis<SlotIndexes>();		Indexes = &getAnalysis<SlotIndexes>();
MBFI = &getAnalysis<MachineBlockFrequencyInfo>();		MBFI = &getAnalysis<MachineBlockFrequencyInfo>();
DomTree = &getAnalysis<MachineDominatorTree>();		DomTree = &getAnalysis<MachineDominatorTree>();
SpillerInstance.reset(createInlineSpiller(this, MF, *VRM));		SpillerInstance.reset(createInlineSpiller(this, MF, *VRM));
		createHoistSpiller(this, MF, *VRM, &spiller());
		qcolombet: Should be created within the inline spiller. See my comment on createInlineSpiller.
		wmi: Done.
Loops = &getAnalysis<MachineLoopInfo>();		Loops = &getAnalysis<MachineLoopInfo>();
Bundles = &getAnalysis<EdgeBundles>();		Bundles = &getAnalysis<EdgeBundles>();
SpillPlacer = &getAnalysis<SpillPlacement>();		SpillPlacer = &getAnalysis<SpillPlacement>();
DebugVars = &getAnalysis<LiveDebugVariables>();		DebugVars = &getAnalysis<LiveDebugVariables>();

initializeCSRCost();		initializeCSRCost();

calculateSpillWeightsAndHints(LIS, mf, VRM, Loops, *MBFI);		calculateSpillWeightsAndHints(LIS, mf, VRM, Loops, *MBFI);

DEBUG(LIS->dump());		DEBUG(LIS->dump());

SA.reset(new SplitAnalysis(VRM, LIS, *Loops));		SA.reset(new SplitAnalysis(VRM, LIS, *Loops));
SE.reset(new SplitEditor(SA, LIS, VRM, DomTree, *MBFI));		SE.reset(new SplitEditor(SA, LIS, VRM, DomTree, *MBFI));
ExtraRegInfo.clear();		ExtraRegInfo.clear();
ExtraRegInfo.resize(MRI->getNumVirtRegs());		ExtraRegInfo.resize(MRI->getNumVirtRegs());
NextCascade = 1;		NextCascade = 1;
IntfCache.init(MF, Matrix->getLiveUnions(), Indexes, LIS, TRI);		IntfCache.init(MF, Matrix->getLiveUnions(), Indexes, LIS, TRI);
GlobalCand.resize(32); // This will grow as needed.		GlobalCand.resize(32); // This will grow as needed.
SetOfBrokenHints.clear();		SetOfBrokenHints.clear();

allocatePhysRegs();		allocatePhysRegs();
tryHintsRecoloring();		tryHintsRecoloring();
		postOptimization();

releaseMemory();		releaseMemory();
return true;		return true;
}		}

lib/CodeGen/RegAllocPBQP.cpp

[117 lines not shown]	private:
typedef std::pair<unsigned, unsigned> RegPair;		typedef std::pair<unsigned, unsigned> RegPair;
typedef std::map<RegPair, PBQP::PBQPNum> CoalesceMap;		typedef std::map<RegPair, PBQP::PBQPNum> CoalesceMap;
typedef std::set<unsigned> RegSet;		typedef std::set<unsigned> RegSet;

char *customPassID;		char *customPassID;

RegSet VRegsToAlloc, EmptyIntervalVRegs;		RegSet VRegsToAlloc, EmptyIntervalVRegs;

		/// Inst which is a def of an original reg and whose defs are already all
		/// dead after remat is saved in DeadRemats. The deletion of such inst is
		/// postponed till all the allocations are done, so its remat expr is
		/// always available for the remat of all the siblings of the original reg.
		SmallPtrSet<MachineInstr *, 32> DeadRemats;
		qcolombet: This should be a separate patch.
		wmi: InlineSpiller is shared by all kinds of register allocators, so the DeadRemats logic is also needed by PBQP. If I separate the part out, I need to fix the related unit test.

/// \brief Finds the initial set of vreg intervals to allocate.		/// \brief Finds the initial set of vreg intervals to allocate.
void findVRegIntervalsToAlloc(const MachineFunction &MF, LiveIntervals &LIS);		void findVRegIntervalsToAlloc(const MachineFunction &MF, LiveIntervals &LIS);

/// \brief Constructs an initial graph.		/// \brief Constructs an initial graph.
void initializeGraph(PBQPRAGraph &G, VirtRegMap &VRM, Spiller &VRegSpiller);		void initializeGraph(PBQPRAGraph &G, VirtRegMap &VRM, Spiller &VRegSpiller);

/// \brief Spill the given VReg.		/// \brief Spill the given VReg.
void spillVReg(unsigned VReg, SmallVectorImpl<unsigned> &NewIntervals,		void spillVReg(unsigned VReg, SmallVectorImpl<unsigned> &NewIntervals,
MachineFunction &MF, LiveIntervals &LIS, VirtRegMap &VRM,		MachineFunction &MF, LiveIntervals &LIS, VirtRegMap &VRM,
Spiller &VRegSpiller);		Spiller &VRegSpiller);

/// \brief Given a solved PBQP problem maps this solution back to a register		/// \brief Given a solved PBQP problem maps this solution back to a register
/// assignment.		/// assignment.
bool mapPBQPToRegAlloc(const PBQPRAGraph &G,		bool mapPBQPToRegAlloc(const PBQPRAGraph &G,
const PBQP::Solution &Solution,		const PBQP::Solution &Solution,
VirtRegMap &VRM,		VirtRegMap &VRM,
Spiller &VRegSpiller);		Spiller &VRegSpiller);

/// \brief Postprocessing before final spilling. Sets basic block "live in"		/// \brief Postprocessing before final spilling. Sets basic block "live in"
/// variables.		/// variables.
void finalizeAlloc(MachineFunction &MF, LiveIntervals &LIS,		void finalizeAlloc(MachineFunction &MF, LiveIntervals &LIS,
VirtRegMap &VRM) const;		VirtRegMap &VRM) const;

		/// Remove dead defs because of rematerialization.
		void eliminateDeadRemats(LiveIntervals &LIS);
		qcolombet: Ditto.
};		};

char RegAllocPBQP::ID = 0;		char RegAllocPBQP::ID = 0;

/// @brief Set spill costs for each node in the PBQP reg-alloc graph.		/// @brief Set spill costs for each node in the PBQP reg-alloc graph.
class SpillCosts : public PBQPRAConstraint {		class SpillCosts : public PBQPRAConstraint {
public:		public:
void apply(PBQPRAGraph &G) override {		void apply(PBQPRAGraph &G) override {
[469 lines not shown]
}		}

void RegAllocPBQP::spillVReg(unsigned VReg,		void RegAllocPBQP::spillVReg(unsigned VReg,
SmallVectorImpl<unsigned> &NewIntervals,		SmallVectorImpl<unsigned> &NewIntervals,
MachineFunction &MF, LiveIntervals &LIS,		MachineFunction &MF, LiveIntervals &LIS,
VirtRegMap &VRM, Spiller &VRegSpiller) {		VirtRegMap &VRM, Spiller &VRegSpiller) {

VRegsToAlloc.erase(VReg);		VRegsToAlloc.erase(VReg);
LiveRangeEdit LRE(&LIS.getInterval(VReg), NewIntervals, MF, LIS, &VRM);		LiveRangeEdit LRE(&LIS.getInterval(VReg), NewIntervals, &DeadRemats, MF, LIS,
		&VRM);
VRegSpiller.spill(LRE);		VRegSpiller.spill(LRE);

const TargetRegisterInfo &TRI = *MF.getSubtarget().getRegisterInfo();		const TargetRegisterInfo &TRI = *MF.getSubtarget().getRegisterInfo();
(void)TRI;		(void)TRI;
DEBUG(dbgs() << "VREG " << PrintReg(VReg, &TRI) << " -> SPILLED (Cost: "		DEBUG(dbgs() << "VREG " << PrintReg(VReg, &TRI) << " -> SPILLED (Cost: "
<< LRE.getParent().weight << ", New vregs: ");		<< LRE.getParent().weight << ", New vregs: ");

// Copy any newly inserted live intervals into the list of regs to		// Copy any newly inserted live intervals into the list of regs to
[65 lines not shown]	if (PReg == 0) {
const TargetRegisterClass &RC = *MRI.getRegClass(LI.reg);		const TargetRegisterClass &RC = *MRI.getRegClass(LI.reg);
PReg = RC.getRawAllocationOrder(MF).front();		PReg = RC.getRawAllocationOrder(MF).front();
}		}

VRM.assignVirt2Phys(LI.reg, PReg);		VRM.assignVirt2Phys(LI.reg, PReg);
}		}
}		}

		void RegAllocPBQP::eliminateDeadRemats(LiveIntervals &LIS) {
		for (auto ent : DeadRemats) {
		LIS.RemoveMachineInstrFromMaps(*ent);
		ent->eraseFromParent();
		qcolombet: Variables start with a capital letter.
		wmi: Fixed.
		}
		DeadRemats.clear();
		}
		qcolombet: We shouldn’t have to touch this.
		wmi: My original comment that DeadRemats is non-empty only when the regalloc is Greedy is wrong. Actually, InlineSpiller and related Remat logic are shared by all register allocators. And RegAllocPBQP is not a subclass of RegAllocBase, so the code is needed.

static inline float normalizePBQPSpillWeight(float UseDefFreq, unsigned Size,		static inline float normalizePBQPSpillWeight(float UseDefFreq, unsigned Size,
unsigned NumInstr) {		unsigned NumInstr) {
// All intervals have a spill weight that is mostly proportional to the number		// All intervals have a spill weight that is mostly proportional to the number
// of uses, with uses in loops having a bigger weight.		// of uses, with uses in loops having a bigger weight.
return NumInstr * normalizeSpillWeight(UseDefFreq, Size, 1);		return NumInstr * normalizeSpillWeight(UseDefFreq, Size, 1);
}		}

bool RegAllocPBQP::runOnMachineFunction(MachineFunction &MF) {		bool RegAllocPBQP::runOnMachineFunction(MachineFunction &MF) {
[69 lines not shown]	#endif
PBQP::Solution Solution = PBQP::RegAlloc::solve(G);		PBQP::Solution Solution = PBQP::RegAlloc::solve(G);
PBQPAllocComplete = mapPBQPToRegAlloc(G, Solution, VRM, *VRegSpiller);		PBQPAllocComplete = mapPBQPToRegAlloc(G, Solution, VRM, *VRegSpiller);
++Round;		++Round;
}		}
}		}

// Finalise allocation, allocate empty ranges.		// Finalise allocation, allocate empty ranges.
finalizeAlloc(MF, LIS, VRM);		finalizeAlloc(MF, LIS, VRM);
		eliminateDeadRemats(LIS);
VRegsToAlloc.clear();		VRegsToAlloc.clear();
EmptyIntervalVRegs.clear();		EmptyIntervalVRegs.clear();

DEBUG(dbgs() << "Post alloc VirtRegMap:\n" << VRM << "\n");		DEBUG(dbgs() << "Post alloc VirtRegMap:\n" << VRM << "\n");

return true;		return true;
}		}

[66 lines not shown]

lib/CodeGen/RegisterCoalescer.cpp

[453 lines not shown]	void RegisterCoalescer::getAnalysisUsage(AnalysisUsage &AU) const {
AU.addRequired<MachineLoopInfo>();		AU.addRequired<MachineLoopInfo>();
AU.addPreserved<MachineLoopInfo>();		AU.addPreserved<MachineLoopInfo>();
AU.addPreservedID(MachineDominatorsID);		AU.addPreservedID(MachineDominatorsID);
MachineFunctionPass::getAnalysisUsage(AU);		MachineFunctionPass::getAnalysisUsage(AU);
}		}

void RegisterCoalescer::eliminateDeadDefs() {		void RegisterCoalescer::eliminateDeadDefs() {
SmallVector<unsigned, 8> NewRegs;		SmallVector<unsigned, 8> NewRegs;
LiveRangeEdit(nullptr, NewRegs, MF, LIS,		LiveRangeEdit(nullptr, NewRegs, nullptr, MF, LIS, nullptr, this)
nullptr, this).eliminateDeadDefs(DeadDefs);		.eliminateDeadDefs(DeadDefs);
		qcolombet: We shouldn’t have to touch this.
		wmi: Fixed.
}		}

void RegisterCoalescer::LRE_WillEraseInstruction(MachineInstr *MI) {		void RegisterCoalescer::LRE_WillEraseInstruction(MachineInstr *MI) {
// MI may be in WorkList. Make sure we don't visit it.		// MI may be in WorkList. Make sure we don't visit it.
ErasedInstrs.insert(MI);		ErasedInstrs.insert(MI);
}		}

bool RegisterCoalescer::adjustCopiesBackFrom(const CoalescerPair &CP,		bool RegisterCoalescer::adjustCopiesBackFrom(const CoalescerPair &CP,
[2,581 lines not shown]

lib/CodeGen/Spiller.h

	[10 lines not shown]
	#define LLVM_LIB_CODEGEN_SPILLER_H			#define LLVM_LIB_CODEGEN_SPILLER_H

	namespace llvm {			namespace llvm {

	class LiveRangeEdit;			class LiveRangeEdit;
	class MachineFunction;			class MachineFunction;
	class MachineFunctionPass;			class MachineFunctionPass;
	class VirtRegMap;			class VirtRegMap;
				class LiveIntervals;

	/// Spiller interface.			/// Spiller interface.
	///			///
	/// Implementations are utility classes which insert spill or remat code on			/// Implementations are utility classes which insert spill or remat code on
	/// demand.			/// demand.
	class Spiller {			class Spiller {
	virtual void anchor();			virtual void anchor();
	public:			public:
	virtual ~Spiller() = 0;			virtual ~Spiller() = 0;

	/// spill - Spill the LRE.getParent() live interval.			/// spill - Spill the LRE.getParent() live interval.
	virtual void spill(LiveRangeEdit &LRE) = 0;			virtual void spill(LiveRangeEdit &LRE) = 0;

	};			};
				qcolombet: Other out-of-tree spillers may exist, and there is little interest in having them implement a post-optimization method if they do not need it. In other words, instead of a pure virtual method, do nothing for the default implementation.
				wmi: Makes sense. Fixed.

	/// Create and return a spiller that will insert spill code directly instead			/// Create and return a spiller that will insert spill code directly instead
				qcolombet: Call this method postOptimization and make it a non-abstract method. We do not want the spillers existing out of tree to have to add a default implementation when they do not need to do anything.
	/// of deferring though VirtRegMap.			/// of deferring though VirtRegMap.
	Spiller *createInlineSpiller(MachineFunctionPass &pass,			Spiller *createInlineSpiller(MachineFunctionPass &pass,
	MachineFunction &mf,			MachineFunction &mf,
	VirtRegMap &vrm);			VirtRegMap &vrm);
				qcolombet: Add a bool here that defaults to false for using a post optimization.
				wmi: I don't get the intention of adding a bool here. Is it used to guard post optimization? Why is it needed?

				void createHoistSpiller(MachineFunctionPass &pass, MachineFunction &mf,
				VirtRegMap &vrm, Spiller *);

				/// startHoistSpiller - create a HoistSpiller object and start to hoist
				/// Spills.
				void startHoistSpiller(MachineFunction &mf, VirtRegMap &vrm,
				LiveIntervals &lis, Spiller *);
				qcolombet: I was thinking in case we want to test without the post-optimization. But I am fine if it is always enabled.
				wmi: Ok, I leave it there for now.
				qcolombet: Get rid of those.
				wmi: Done.
	}			}

	#endif			#endif
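
The review thread above asks for the post-optimization hook to be a virtual method with an empty default body rather than a pure virtual one. A minimal sketch of that pattern follows. `SpillerSketch`, `BasicSpiller` and `HoistingSpiller` are hypothetical stand-ins invented for illustration; they are not the real `llvm::Spiller` interface, and registers are modelled as plain integers.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a spiller interface (not llvm::Spiller).
class SpillerSketch {
public:
  virtual ~SpillerSketch() = default;

  // Every spiller has to implement spilling.
  virtual void spill(int VirtReg) = 0;

  // Optional hook. The default does nothing, so subclasses that do not
  // need a post-optimization step compile unchanged.
  virtual void postOptimization() {}
};

// Models an out-of-tree spiller that only implements spill().
class BasicSpiller : public SpillerSketch {
public:
  std::vector<int> Spilled;
  void spill(int VirtReg) override { Spilled.push_back(VirtReg); }
};

// Models a spiller that opts in to the hook.
class HoistingSpiller : public BasicSpiller {
public:
  bool Hoisted = false;
  void postOptimization() override {
    assert(!Hoisted && "postOptimization is expected to run once");
    Hoisted = true;
  }
};
```

With a pure virtual hook, `BasicSpiller` would become abstract and fail to instantiate until it added an empty override, which is the burden the reviewer wants to avoid.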

lib/CodeGen/SplitKit.h

[... 12 lines not shown ...]
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef LLVM_LIB_CODEGEN_SPLITKIT_H		#ifndef LLVM_LIB_CODEGEN_SPLITKIT_H
#define LLVM_LIB_CODEGEN_SPLITKIT_H		#define LLVM_LIB_CODEGEN_SPLITKIT_H

#include "LiveRangeCalc.h"		#include "LiveRangeCalc.h"
#include "llvm/ADT/ArrayRef.h"		#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/DenseMap.h"		#include "llvm/ADT/DenseMap.h"
		#include "llvm/ADT/DenseSet.h"
#include "llvm/ADT/IntervalMap.h"		#include "llvm/ADT/IntervalMap.h"
#include "llvm/ADT/SmallPtrSet.h"		#include "llvm/ADT/SmallPtrSet.h"

namespace llvm {		namespace llvm {

class ConnectedVNInfoEqClasses;		class ConnectedVNInfoEqClasses;
class LiveInterval;		class LiveInterval;
class LiveIntervals;		class LiveIntervals;
[... 295 lines not shown ...]	private:
/// in the vector in the complement interval.		/// in the vector in the complement interval.
void removeBackCopies(SmallVectorImpl<VNInfo*> &Copies);		void removeBackCopies(SmallVectorImpl<VNInfo*> &Copies);

/// getShallowDominator - Returns the least busy dominator of MBB that is		/// getShallowDominator - Returns the least busy dominator of MBB that is
/// also dominated by DefMBB. Busy is measured by loop depth.		/// also dominated by DefMBB. Busy is measured by loop depth.
MachineBasicBlock *findShallowDominator(MachineBasicBlock *MBB,		MachineBasicBlock *findShallowDominator(MachineBasicBlock *MBB,
MachineBasicBlock *DefMBB);		MachineBasicBlock *DefMBB);

/// hoistCopiesForSize - Hoist back-copies to the complement interval in a		/// removeRedundentCopies - Remove redundent back-copies if it has been
		qcolombet: Don’t repeat the name of the method.
		wmi: Fixed.
/// way that minimizes code size. This implements the SM_Size spill mode.		/// decided those back-copies will not be hoisted.
void hoistCopiesForSize();		void removeRedundentCopies(DenseSet<unsigned> &NotToHoistSet,
		qcolombet: Given how this is used, the actual name of this method should be computeRedundentBackCopies.
		wmi: Fixed.
		qcolombet: Typo: redundant
		wmi: Fixed here and many other places.
		SmallVectorImpl<VNInfo *> &BackCopies);

		/// hoistCopies - Hoist back-copies to the complement interval.
		qcolombet: Ditto.
		wmi: Fixed.
		void hoistCopies();

/// transferValues - Transfer values to the new ranges.		/// transferValues - Transfer values to the new ranges.
/// Return true if any ranges were skipped.		/// Return true if any ranges were skipped.
bool transferValues();		bool transferValues();

/// extendPHIKillRanges - Extend the ranges of all values killed by original		/// extendPHIKillRanges - Extend the ranges of all values killed by original
/// parent PHIDefs.		/// parent PHIDefs.
void extendPHIKillRanges();		void extendPHIKillRanges();
[... 129 lines not shown ...]

lib/CodeGen/SplitKit.cpp

[... 10 lines not shown ...]
// live range splitting.		// live range splitting.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "SplitKit.h"		#include "SplitKit.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/CodeGen/LiveIntervalAnalysis.h"		#include "llvm/CodeGen/LiveIntervalAnalysis.h"
#include "llvm/CodeGen/LiveRangeEdit.h"		#include "llvm/CodeGen/LiveRangeEdit.h"
		#include "llvm/CodeGen/MachineBlockFrequencyInfo.h"
#include "llvm/CodeGen/MachineDominators.h"		#include "llvm/CodeGen/MachineDominators.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"		#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/MachineLoopInfo.h"		#include "llvm/CodeGen/MachineLoopInfo.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"		#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/CodeGen/VirtRegMap.h"		#include "llvm/CodeGen/VirtRegMap.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include "llvm/Target/TargetInstrInfo.h"		#include "llvm/Target/TargetInstrInfo.h"
[... 398 lines not shown ...]	VNInfo *SplitEditor::defFromParent(unsigned RegIdx,
SlotIndex Def;		SlotIndex Def;
LiveInterval *LI = &LIS.getInterval(Edit->get(RegIdx));		LiveInterval *LI = &LIS.getInterval(Edit->get(RegIdx));

// We may be trying to avoid interference that ends at a deleted instruction,		// We may be trying to avoid interference that ends at a deleted instruction,
// so always begin RegIdx 0 early and all others late.		// so always begin RegIdx 0 early and all others late.
bool Late = RegIdx != 0;		bool Late = RegIdx != 0;

// Attempt cheap-as-a-copy rematerialization.		// Attempt cheap-as-a-copy rematerialization.
LiveRangeEdit::Remat RM(ParentVNI);		unsigned Original = VRM.getOriginal(Edit->get(RegIdx));
		LiveInterval &OrigLI = LIS.getInterval(Original);
		VNInfo *OrigVNI = OrigLI.getVNInfoAt(UseIdx);
		LiveRangeEdit::Remat RM(ParentVNI, OrigVNI);
		RM.OrigMI = LIS.getInstructionFromIndex(OrigVNI->def);

if (Edit->canRematerializeAt(RM, UseIdx, true)) {		if (Edit->canRematerializeAt(RM, UseIdx, true)) {
Def = Edit->rematerializeAt(MBB, I, LI->reg, RM, TRI, Late);		Def = Edit->rematerializeAt(MBB, I, LI->reg, RM, TRI, Late);
++NumRemats;		++NumRemats;
} else {		} else {
// Can't remat, just insert a copy from parent.		// Can't remat, just insert a copy from parent.
CopyMI = BuildMI(MBB, I, DebugLoc(), TII.get(TargetOpcode::COPY), LI->reg)		CopyMI = BuildMI(MBB, I, DebugLoc(), TII.get(TargetOpcode::COPY), LI->reg)
.addReg(Edit->getReg());		.addReg(Edit->getReg());
Def = LIS.getSlotIndexes()		Def = LIS.getSlotIndexes()
[... 269 lines not shown ...]	for (;;) {
// Too far up the dominator tree?		// Too far up the dominator tree?
if (!IDom || !MDT.dominates(DefDomNode, IDom))		if (!IDom || !MDT.dominates(DefDomNode, IDom))
return BestMBB;		return BestMBB;

MBB = IDom->getBlock();		MBB = IDom->getBlock();
}		}
}		}

void SplitEditor::hoistCopiesForSize() {		/// Remove redundent backcopies if the backcopies for the same ParentVNI cannot
		/// be hoisted because of too much cost.
		void SplitEditor::removeRedundentCopies(DenseSet<unsigned> &NotToHoistSet,
		qcolombet: Please don’t repeat the comment from the declaration.
		wmi: Comment removed.
		SmallVectorImpl<VNInfo *> &BackCopies) {
		LiveInterval *LI = &LIS.getInterval(Edit->get(0));
		LiveInterval *Parent = &Edit->getParent();
		SmallVector<SmallPtrSet<VNInfo *, 8>, 8> EqualVNs(Parent->getNumValNums());
		SmallPtrSet<VNInfo *, 8> DominatedVNIs;

		// Aggregate VNIs having the same value as ParentVNI.
		for (VNInfo *VNI : LI->valnos) {
		if (VNI->isUnused())
		continue;
		VNInfo *ParentVNI = Edit->getParent().getVNInfoAt(VNI->def);
		EqualVNs[ParentVNI->id].insert(VNI);
		}

		// For VNI aggregation of each ParentVNI, collect dominated, .i.e,
		qcolombet: i.e.
		wmi: Fixed.
		// redundent VNIs to BackCopies.
		for (unsigned i = 0, e = Parent->getNumValNums(); i != e; ++i) {
		VNInfo *ParentVNI = Parent->getValNumInfo(i);
		if (!NotToHoistSet.count(ParentVNI->id))
		continue;
		for (auto Ent1 : EqualVNs[ParentVNI->id]) {
		for (auto Ent2 : EqualVNs[ParentVNI->id]) {
		qcolombet: We should start iterating with the next iterator instead of starting over. The next call to count should early continue the loop but still!
		wmi: Fixed.
		if (Ent1 == Ent2 || DominatedVNIs.count(Ent1) ||
		DominatedVNIs.count(Ent2))
		continue;

		MachineBasicBlock *MBB1 = LIS.getMBBFromIndex(Ent1->def);
		MachineBasicBlock *MBB2 = LIS.getMBBFromIndex(Ent2->def);
		if (MBB1 == MBB2) {
		DominatedVNIs.insert(Ent1->def < Ent2->def ? Ent2 : Ent1);
		} else if (MDT.dominates(MBB1, MBB2)) {
		DominatedVNIs.insert(Ent2);
		} else if (MDT.dominates(MBB2, MBB1)) {
		DominatedVNIs.insert(Ent1);
		}
		}
		}
		if (!DominatedVNIs.empty()) {
		forceRecompute(0, ParentVNI);
		for (auto Ent : DominatedVNIs) {
		BackCopies.push_back(Ent);
		}
		DominatedVNIs.clear();
		}
		}
		}
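
The reviewer's point about starting the inner loop at the next element can be illustrated outside LLVM. The sketch below is a simplified model, not the code in this patch: `CopySketch`, `dominates` and `findDominatedCopies` are hypothetical names, blocks are integers, and the dominator tree is given as an immediate-dominator array (the entry block is its own immediate dominator). Because each pair is tested in both directions, visiting every unordered pair once is enough.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical model of a back-copy: the block it lives in and its
// position (slot) inside that block.
struct CopySketch {
  int Block;
  int Slot;
};

// IDom[B] is the immediate dominator of block B; the entry block is its
// own immediate dominator. Returns true if block A dominates block B.
static bool dominates(const std::vector<int> &IDom, int A, int B) {
  assert(A >= 0 && B >= 0 && static_cast<std::size_t>(B) < IDom.size());
  while (true) {
    if (A == B)
      return true;
    if (IDom[B] == B) // Reached the entry block without meeting A.
      return false;
    B = IDom[B];
  }
}

// Returns the indices of copies that are dominated by another copy of
// the same value and are therefore redundant.
static std::set<std::size_t>
findDominatedCopies(const std::vector<CopySketch> &Copies,
                    const std::vector<int> &IDom) {
  std::set<std::size_t> Dominated;
  for (std::size_t I = 0; I < Copies.size(); ++I) {
    if (Dominated.count(I))
      continue;
    // Start at I + 1: the pair (J, I) was already handled as (I, J).
    for (std::size_t J = I + 1; J < Copies.size(); ++J) {
      if (Dominated.count(J))
        continue;
      const CopySketch &A = Copies[I];
      const CopySketch &B = Copies[J];
      if (A.Block == B.Block)
        Dominated.insert(A.Slot < B.Slot ? J : I);
      else if (dominates(IDom, A.Block, B.Block))
        Dominated.insert(J);
      else if (dominates(IDom, B.Block, A.Block))
        Dominated.insert(I);
      if (Dominated.count(I))
        break; // I is redundant; no need to compare it further.
    }
  }
  return Dominated;
}
```

Skipping an already-dominated copy is safe in this model because dominance is transitive: whatever that copy dominates is also dominated by the copy that made it redundant.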

		/// For SM_Size mode, find a common dominator for all the back-copies for
		/// the same ParentVNI and hoist the backcopies to the dominator BB.
		/// For SM_Speed mode, if the common dominator is hot and it is not beneficial
		/// to do the hoisting, simply remove the dominated backcopies for the same
		/// ParentVNI.
		void SplitEditor::hoistCopies() {
// Get the complement interval, always RegIdx 0.		// Get the complement interval, always RegIdx 0.
LiveInterval *LI = &LIS.getInterval(Edit->get(0));		LiveInterval *LI = &LIS.getInterval(Edit->get(0));
LiveInterval *Parent = &Edit->getParent();		LiveInterval *Parent = &Edit->getParent();

// Track the nearest common dominator for all back-copies for each ParentVNI,		// Track the nearest common dominator for all back-copies for each ParentVNI,
// indexed by ParentVNI->id.		// indexed by ParentVNI->id.
typedef std::pair<MachineBasicBlock*, SlotIndex> DomPair;		typedef std::pair<MachineBasicBlock*, SlotIndex> DomPair;
SmallVector<DomPair, 8> NearestDom(Parent->getNumValNums());		SmallVector<DomPair, 8> NearestDom(Parent->getNumValNums());
		// The total cost of all the back-copies for each ParentVNI.
		SmallVector<BlockFrequency, 8> Costs(Parent->getNumValNums());
		// The ParentVNI->id set for which hoisting back-copies are not beneficial
		// for Speed.
		DenseSet<unsigned> NotToHoistSet;

// Find the nearest common dominator for parent values with multiple		// Find the nearest common dominator for parent values with multiple
// back-copies. If a single back-copy dominates, put it in DomPair.second.		// back-copies. If a single back-copy dominates, put it in DomPair.second.
for (VNInfo *VNI : LI->valnos) {		for (VNInfo *VNI : LI->valnos) {
if (VNI->isUnused())		if (VNI->isUnused())
continue;		continue;
VNInfo *ParentVNI = Edit->getParent().getVNInfoAt(VNI->def);		VNInfo *ParentVNI = Edit->getParent().getVNInfoAt(VNI->def);
assert(ParentVNI && "Parent not live at complement def");		assert(ParentVNI && "Parent not live at complement def");

// Don't hoist remats. The complement is probably going to disappear		// Don't hoist remats. The complement is probably going to disappear
// completely anyway.		// completely anyway.
if (Edit->didRematerialize(ParentVNI))		if (Edit->didRematerialize(ParentVNI))
continue;		continue;

MachineBasicBlock *ValMBB = LIS.getMBBFromIndex(VNI->def);		MachineBasicBlock *ValMBB = LIS.getMBBFromIndex(VNI->def);

DomPair &Dom = NearestDom[ParentVNI->id];		DomPair &Dom = NearestDom[ParentVNI->id];

// Keep directly defined parent values. This is either a PHI or an		// Keep directly defined parent values. This is either a PHI or an
// instruction in the complement range. All other copies of ParentVNI		// instruction in the complement range. All other copies of ParentVNI
// should be eliminated.		// should be eliminated.
if (VNI->def == ParentVNI->def) {		if (VNI->def == ParentVNI->def) {
DEBUG(dbgs() << "Direct complement def at " << VNI->def << '\n');		DEBUG(dbgs() << "Direct complement def at " << VNI->def << '\n');
Dom = DomPair(ValMBB, VNI->def);		Dom = DomPair(ValMBB, VNI->def);
[... 18 lines not shown ...]	if (!Dom.first) {
MachineBasicBlock *Near =		MachineBasicBlock *Near =
MDT.findNearestCommonDominator(Dom.first, ValMBB);		MDT.findNearestCommonDominator(Dom.first, ValMBB);
if (Near == ValMBB)		if (Near == ValMBB)
// Def ValMBB dominates.		// Def ValMBB dominates.
Dom = DomPair(ValMBB, VNI->def);		Dom = DomPair(ValMBB, VNI->def);
else if (Near != Dom.first)		else if (Near != Dom.first)
// None dominate. Hoist to common dominator, need new def.		// None dominate. Hoist to common dominator, need new def.
Dom = DomPair(Near, SlotIndex());		Dom = DomPair(Near, SlotIndex());
		Costs[ParentVNI->id] += MBFI.getBlockFreq(ValMBB);
}		}

DEBUG(dbgs() << "Multi-mapped complement " << VNI->id << '@' << VNI->def		DEBUG(dbgs() << "Multi-mapped complement " << VNI->id << '@' << VNI->def
<< " for parent " << ParentVNI->id << '@' << ParentVNI->def		<< " for parent " << ParentVNI->id << '@' << ParentVNI->def
<< " hoist to BB#" << Dom.first->getNumber() << ' '		<< " hoist to BB#" << Dom.first->getNumber() << ' '
<< Dom.second << '\n');		<< Dom.second << '\n');
}		}

// Insert the hoisted copies.		// Insert the hoisted copies.
for (unsigned i = 0, e = Parent->getNumValNums(); i != e; ++i) {		for (unsigned i = 0, e = Parent->getNumValNums(); i != e; ++i) {
DomPair &Dom = NearestDom[i];		DomPair &Dom = NearestDom[i];
if (!Dom.first || Dom.second.isValid())		if (!Dom.first || Dom.second.isValid())
continue;		continue;
// This value needs a hoisted copy inserted at the end of Dom.first.		// This value needs a hoisted copy inserted at the end of Dom.first.
VNInfo *ParentVNI = Parent->getValNumInfo(i);		VNInfo *ParentVNI = Parent->getValNumInfo(i);
MachineBasicBlock *DefMBB = LIS.getMBBFromIndex(ParentVNI->def);		MachineBasicBlock *DefMBB = LIS.getMBBFromIndex(ParentVNI->def);
// Get a less loopy dominator than Dom.first.		// Get a less loopy dominator than Dom.first.
Dom.first = findShallowDominator(Dom.first, DefMBB);		Dom.first = findShallowDominator(Dom.first, DefMBB);
		if (SpillMode == SM_Speed &&
		MBFI.getBlockFreq(Dom.first) > Costs[ParentVNI->id]) {
		NotToHoistSet.insert(ParentVNI->id);
		continue;
		}
SlotIndex Last = LIS.getMBBEndIdx(Dom.first).getPrevSlot();		SlotIndex Last = LIS.getMBBEndIdx(Dom.first).getPrevSlot();
Dom.second =		Dom.second =
defFromParent(0, ParentVNI, Last, *Dom.first,		defFromParent(0, ParentVNI, Last, *Dom.first,
SA.getLastSplitPointIter(Dom.first))->def;		SA.getLastSplitPointIter(Dom.first))->def;
}		}

// Remove redundant back-copies that are now known to be dominated by another		// Remove redundant back-copies that are now known to be dominated by another
// def with the same value.		// def with the same value.
SmallVector<VNInfo*, 8> BackCopies;		SmallVector<VNInfo*, 8> BackCopies;
for (VNInfo *VNI : LI->valnos) {		for (VNInfo *VNI : LI->valnos) {
if (VNI->isUnused())		if (VNI->isUnused())
continue;		continue;
VNInfo *ParentVNI = Edit->getParent().getVNInfoAt(VNI->def);		VNInfo *ParentVNI = Edit->getParent().getVNInfoAt(VNI->def);
const DomPair &Dom = NearestDom[ParentVNI->id];		const DomPair &Dom = NearestDom[ParentVNI->id];
if (!Dom.first || Dom.second == VNI->def)		if (!Dom.first || Dom.second == VNI->def ||
		NotToHoistSet.count(ParentVNI->id))
continue;		continue;
BackCopies.push_back(VNI);		BackCopies.push_back(VNI);
forceRecompute(0, ParentVNI);		forceRecompute(0, ParentVNI);
}		}

		// If it is not beneficial to hoist all the BackCopies, simply remove
		// redundent BackCopies in speed mode.
		if (SpillMode == SM_Speed && !NotToHoistSet.empty())
		removeRedundentCopies(NotToHoistSet, BackCopies);

removeBackCopies(BackCopies);		removeBackCopies(BackCopies);
}		}
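
The speed-mode decision in hoistCopies() compares the frequency of the candidate dominator block against the accumulated frequency of the back-copies it would replace. A minimal sketch of that comparison is below. `shouldHoistForSpeed` is a hypothetical helper, frequencies are plain integers rather than `BlockFrequency`, and summing the frequency of every back-copy is a simplifying assumption, since part of the accumulation logic is not visible in this diff.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Returns true when a single hoisted copy in the dominator block is
// expected to execute no more often than the back-copies it replaces.
// Mirrors the patch's test, which skips hoisting when the dominator's
// frequency is strictly greater than the accumulated cost.
static bool shouldHoistForSpeed(std::uint64_t DomFreq,
                                const std::vector<std::uint64_t> &CopyFreqs) {
  assert(!CopyFreqs.empty() && "need at least one back-copy");
  std::uint64_t Cost = 0;
  for (std::uint64_t F : CopyFreqs)
    Cost += F;
  return DomFreq <= Cost;
}
```

When this returns false, the patch records the parent value in NotToHoistSet and falls back to removing only the back-copies that are dominated by another copy of the same value.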


/// transferValues - Transfer all possible values to the new live ranges.		/// transferValues - Transfer all possible values to the new live ranges.
/// Values that were rematerialized are left alone, they need LRCalc.extend().		/// Values that were rematerialized are left alone, they need LRCalc.extend().
bool SplitEditor::transferValues() {		bool SplitEditor::transferValues() {
bool Skipped = false;		bool Skipped = false;
[... 177 lines not shown ...]
void SplitEditor::deleteRematVictims() {		void SplitEditor::deleteRematVictims() {
SmallVector<MachineInstr*, 8> Dead;		SmallVector<MachineInstr*, 8> Dead;
for (LiveRangeEdit::iterator I = Edit->begin(), E = Edit->end(); I != E; ++I){		for (LiveRangeEdit::iterator I = Edit->begin(), E = Edit->end(); I != E; ++I){
LiveInterval *LI = &LIS.getInterval(*I);		LiveInterval *LI = &LIS.getInterval(*I);
for (const LiveRange::Segment &S : LI->segments) {		for (const LiveRange::Segment &S : LI->segments) {
// Dead defs end at the dead slot.		// Dead defs end at the dead slot.
if (S.end != S.valno->def.getDeadSlot())		if (S.end != S.valno->def.getDeadSlot())
continue;		continue;
		if (S.valno->isPHIDef())
		continue;
MachineInstr *MI = LIS.getInstructionFromIndex(S.valno->def);		MachineInstr *MI = LIS.getInstructionFromIndex(S.valno->def);
assert(MI && "Missing instruction for dead def");		assert(MI && "Missing instruction for dead def");
MI->addRegisterDead(LI->reg, &TRI);		MI->addRegisterDead(LI->reg, &TRI);

if (!MI->allDefsAreDead())		if (!MI->allDefsAreDead())
continue;		continue;

DEBUG(dbgs() << "All defs dead: " << *MI);		DEBUG(dbgs() << "All defs dead: " << *MI);
[... 28 lines not shown ...]	void SplitEditor::finish(SmallVectorImpl<unsigned> *LRMap) {
}		}

// Hoist back-copies to the complement interval when in spill mode.		// Hoist back-copies to the complement interval when in spill mode.
switch (SpillMode) {		switch (SpillMode) {
case SM_Partition:		case SM_Partition:
// Leave all back-copies as is.		// Leave all back-copies as is.
break;		break;
case SM_Size:		case SM_Size:
hoistCopiesForSize();
break;
case SM_Speed:		case SM_Speed:
llvm_unreachable("Spill mode 'speed' not implemented yet");		hoistCopies();
		qcolombet: Add a comment saying that hoistCopies will behave differently between size and speed, otherwise it feels like those modes are the same.
		wmi: Added.
}		}

// Transfer the simply mapped values, check if any are skipped.		// Transfer the simply mapped values, check if any are skipped.
bool Skipped = transferValues();		bool Skipped = transferValues();
if (Skipped)		if (Skipped)
extendPHIKillRanges();		extendPHIKillRanges();
else		else
++NumSimple;		++NumSimple;
[... 350 lines not shown ...]

test/CodeGen/AArch64/aarch64-deferred-spilling.ll

	;RUN: llc < %s -mtriple=aarch64--linux-android -regalloc=greedy -enable-deferred-spilling=true -mcpu=cortex-a57 -disable-fp-elim | FileCheck %s --check-prefix=CHECK --check-prefix=DEFERRED
	;RUN: llc < %s -mtriple=aarch64--linux-android -regalloc=greedy -enable-deferred-spilling=false -mcpu=cortex-a57 -disable-fp-elim | FileCheck %s --check-prefix=CHECK --check-prefix=REGULAR

	; Check that we do not end up with useless spill code.
	;
	; Move to the basic block we are interested in.
	;
	; CHECK: // %if.then.120
	;
	; REGULAR: str w21, [sp, #[[OFFSET:[0-9]+]]] // 4-byte Folded Spill
	; Check that w21 wouldn't need to be spilled since it is never reused.
	; REGULAR-NOT: {{[wx]}}21{{,?}}
	;
	; Check that w22 is used to carry a value through the call.
	; DEFERRED-NOT: str {{[wx]}}22,
	; DEFERRED: mov {{[wx]}}22,
	; DEFERRED-NOT: str {{[wx]}}22,
	;
	; CHECK: bl fprintf
	;
	; DEFERRED-NOT: ldr {{[wx]}}22,
	; DEFERRED: mov {{[wx][0-9]+}}, {{[wx]}}22
	; DEFERRED-NOT: ldr {{[wx]}}22,
	;
	; REGULAR-NOT: {{[wx]}}21{{,?}}
	; REGULAR: ldr w21, [sp, #[[OFFSET]]] // 4-byte Folded Reload
	;
	; End of the basic block we are interested in.
	; CHECK: b
	; CHECK: {{[^:]+}}: // %sw.bb.123

	%struct.__sFILE = type { i8, i32, i32, i32, i32, %struct.__sbuf, i32, i8, i32 (i8), i32 (i8, i8, i32), i64 (i8, i64, i32), i32 (i8, i8, i32), %struct.__sbuf, i8*, i32, [3 x i8], [1 x i8], %struct.__sbuf, i32, i64 }
	%struct.__sbuf = type { i8*, i64 }
	%struct.DState = type { %struct.bz_stream, i32, i8, i32, i8, i32, i32, i32, i32, i32, i8, i32, i32, i32, i32, i32, [256 x i32], i32, [257 x i32], [257 x i32], i32, i16, i8, i32, i32, i32, i32, i32, [256 x i8], [16 x i8], [256 x i8], [4096 x i8], [16 x i32], [18002 x i8], [18002 x i8], [6 x [258 x i8]], [6 x [258 x i32]], [6 x [258 x i32]], [6 x [258 x i32]], [6 x i32], i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32* }
	%struct.bz_stream = type { i8, i32, i32, i32, i8, i32, i32, i32, i8, i8 (i8, i32, i32), void (i8, i8), i8 }

	@__sF = external global [0 x %struct.__sFILE], align 8
	@.str = private unnamed_addr constant [20 x i8] c"\0A [%d: stuff+mf \00", align 1

	declare i32 @fprintf(%struct.__sFILE* nocapture, i8* nocapture readonly, ...)

	declare void @bar(i32)

	declare void @llvm.memset.p0i8.i64(i8* nocapture, i8, i64, i32, i1)

	define i32 @foo(%struct.DState* %s) {
	entry:
	%state = getelementptr inbounds %struct.DState, %struct.DState* %s, i64 0, i32 1
	%tmp = load i32, i32* %state, align 4
	%cmp = icmp eq i32 %tmp, 10
	%save_i = getelementptr inbounds %struct.DState, %struct.DState* %s, i64 0, i32 40
	br i1 %cmp, label %if.end.thread, label %if.end

	if.end.thread: ; preds = %entry
	%save_j = getelementptr inbounds %struct.DState, %struct.DState* %s, i64 0, i32 41
	%save_t = getelementptr inbounds %struct.DState, %struct.DState* %s, i64 0, i32 42
	%save_alphaSize = getelementptr inbounds %struct.DState, %struct.DState* %s, i64 0, i32 43
	%save_nGroups = getelementptr inbounds %struct.DState, %struct.DState* %s, i64 0, i32 44
	%save_nSelectors = getelementptr inbounds %struct.DState, %struct.DState* %s, i64 0, i32 45
	%save_EOB = getelementptr inbounds %struct.DState, %struct.DState* %s, i64 0, i32 46
	%save_groupNo = getelementptr inbounds %struct.DState, %struct.DState* %s, i64 0, i32 47
	%save_groupPos = getelementptr inbounds %struct.DState, %struct.DState* %s, i64 0, i32 48
	%save_nextSym = getelementptr inbounds %struct.DState, %struct.DState* %s, i64 0, i32 49
	%save_nblockMAX = getelementptr inbounds %struct.DState, %struct.DState* %s, i64 0, i32 50
	%save_nblock = getelementptr inbounds %struct.DState, %struct.DState* %s, i64 0, i32 51
	%save_es = getelementptr inbounds %struct.DState, %struct.DState* %s, i64 0, i32 52
	%save_N = getelementptr inbounds %struct.DState, %struct.DState* %s, i64 0, i32 53
	%save_curr = getelementptr inbounds %struct.DState, %struct.DState* %s, i64 0, i32 54
	%save_zt = getelementptr inbounds %struct.DState, %struct.DState* %s, i64 0, i32 55
	%save_zn = getelementptr inbounds %struct.DState, %struct.DState* %s, i64 0, i32 56
	%save_zvec = getelementptr inbounds %struct.DState, %struct.DState* %s, i64 0, i32 57
	%save_zj = getelementptr inbounds %struct.DState, %struct.DState* %s, i64 0, i32 58
	%tmp1 = bitcast i32* %save_i to i8*
	call void @llvm.memset.p0i8.i64(i8* %tmp1, i8 0, i64 108, i32 4, i1 false)
	br label %sw.default

	if.end: ; preds = %entry
	%.pre = load i32, i32* %save_i, align 4
	%save_j3.phi.trans.insert = getelementptr inbounds %struct.DState, %struct.DState* %s, i64 0, i32 41
	%.pre406 = load i32, i32* %save_j3.phi.trans.insert, align 4
	%save_t4.phi.trans.insert = getelementptr inbounds %struct.DState, %struct.DState* %s, i64 0, i32 42
	%.pre407 = load i32, i32* %save_t4.phi.trans.insert, align 4
	%save_alphaSize5.phi.trans.insert = getelementptr inbounds %struct.DState, %struct.DState* %s, i64 0, i32 43
	%.pre408 = load i32, i32* %save_alphaSize5.phi.trans.insert, align 4
	%save_nGroups6.phi.trans.insert = getelementptr inbounds %struct.DState, %struct.DState* %s, i64 0, i32 44
	%.pre409 = load i32, i32* %save_nGroups6.phi.trans.insert, align 4
	%save_nSelectors7.phi.trans.insert = getelementptr inbounds %struct.DState, %struct.DState* %s, i64 0, i32 45
	%.pre410 = load i32, i32* %save_nSelectors7.phi.trans.insert, align 4
	%save_EOB8.phi.trans.insert = getelementptr inbounds %struct.DState, %struct.DState* %s, i64 0, i32 46
	%.pre411 = load i32, i32* %save_EOB8.phi.trans.insert, align 4
	%save_groupNo9.phi.trans.insert = getelementptr inbounds %struct.DState, %struct.DState* %s, i64 0, i32 47
	%.pre412 = load i32, i32* %save_groupNo9.phi.trans.insert, align 4
	%save_groupPos10.phi.trans.insert = getelementptr inbounds %struct.DState, %struct.DState* %s, i64 0, i32 48
	%.pre413 = load i32, i32* %save_groupPos10.phi.trans.insert, align 4
	%save_nextSym11.phi.trans.insert = getelementptr inbounds %struct.DState, %struct.DState* %s, i64 0, i32 49
	%.pre414 = load i32, i32* %save_nextSym11.phi.trans.insert, align 4
	%save_nblockMAX12.phi.trans.insert = getelementptr inbounds %struct.DState, %struct.DState* %s, i64 0, i32 50
	%.pre415 = load i32, i32* %save_nblockMAX12.phi.trans.insert, align 4
	%save_nblock13.phi.trans.insert = getelementptr inbounds %struct.DState, %struct.DState* %s, i64 0, i32 51
	%.pre416 = load i32, i32* %save_nblock13.phi.trans.insert, align 4
	%save_es14.phi.trans.insert = getelementptr inbounds %struct.DState, %struct.DState* %s, i64 0, i32 52
	%.pre417 = load i32, i32* %save_es14.phi.trans.insert, align 4
	%save_N15.phi.trans.insert = getelementptr inbounds %struct.DState, %struct.DState* %s, i64 0, i32 53
	%.pre418 = load i32, i32* %save_N15.phi.trans.insert, align 4
	%save_curr16.phi.trans.insert = getelementptr inbounds %struct.DState, %struct.DState* %s, i64 0, i32 54
	%.pre419 = load i32, i32* %save_curr16.phi.trans.insert, align 4
	%save_zt17.phi.trans.insert = getelementptr inbounds %struct.DState, %struct.DState* %s, i64 0, i32 55
	%.pre420 = load i32, i32* %save_zt17.phi.trans.insert, align 4
	%save_zn18.phi.trans.insert = getelementptr inbounds %struct.DState, %struct.DState* %s, i64 0, i32 56
	%.pre421 = load i32, i32* %save_zn18.phi.trans.insert, align 4
	%save_zvec19.phi.trans.insert = getelementptr inbounds %struct.DState, %struct.DState* %s, i64 0, i32 57
	%.pre422 = load i32, i32* %save_zvec19.phi.trans.insert, align 4
	%save_zj20.phi.trans.insert = getelementptr inbounds %struct.DState, %struct.DState* %s, i64 0, i32 58
	%.pre423 = load i32, i32* %save_zj20.phi.trans.insert, align 4
	switch i32 %tmp, label %sw.default [
	i32 13, label %sw.bb
	i32 14, label %if.end.sw.bb.65_crit_edge
	i32 25, label %if.end.sw.bb.123_crit_edge
	]

	if.end.sw.bb.123_crit_edge: ; preds = %if.end
	%.pre433 = getelementptr inbounds %struct.DState, %struct.DState* %s, i64 0, i32 8
	br label %sw.bb.123

	if.end.sw.bb.65_crit_edge: ; preds = %if.end
	%bsLive69.phi.trans.insert = getelementptr inbounds %struct.DState, %struct.DState* %s, i64 0, i32 8
	%.pre426 = load i32, i32* %bsLive69.phi.trans.insert, align 4
	br label %sw.bb.65

	sw.bb: ; preds = %if.end
	%sunkaddr = ptrtoint %struct.DState* %s to i64
	%sunkaddr485 = add i64 %sunkaddr, 8
	%sunkaddr486 = inttoptr i64 %sunkaddr485 to i32*
	store i32 13, i32* %sunkaddr486, align 4
	%bsLive = getelementptr inbounds %struct.DState, %struct.DState* %s, i64 0, i32 8
	%tmp2 = load i32, i32* %bsLive, align 4
	%cmp28.400 = icmp sgt i32 %tmp2, 7
	br i1 %cmp28.400, label %sw.bb.if.then.29_crit_edge, label %if.end.33.lr.ph

	sw.bb.if.then.29_crit_edge: ; preds = %sw.bb
	%sunkaddr487 = ptrtoint %struct.DState* %s to i64
	%sunkaddr488 = add i64 %sunkaddr487, 32
	%sunkaddr489 = inttoptr i64 %sunkaddr488 to i32*
	%.pre425 = load i32, i32* %sunkaddr489, align 4
	br label %if.then.29

	if.end.33.lr.ph: ; preds = %sw.bb
	%tmp3 = bitcast %struct.DState* %s to %struct.bz_stream**
	%.pre424 = load %struct.bz_stream*, %struct.bz_stream** %tmp3, align 8
	%avail_in.phi.trans.insert = getelementptr inbounds %struct.bz_stream, %struct.bz_stream* %.pre424, i64 0, i32 1
	%.pre430 = load i32, i32* %avail_in.phi.trans.insert, align 4
	%tmp4 = add i32 %.pre430, -1
	br label %if.end.33

	if.then.29: ; preds = %while.body.backedge, %sw.bb.if.then.29_crit_edge
	%tmp5 = phi i32 [ %.pre425, %sw.bb.if.then.29_crit_edge ], [ %or, %while.body.backedge ]
	%.lcssa393 = phi i32 [ %tmp2, %sw.bb.if.then.29_crit_edge ], [ %add, %while.body.backedge ]
	%sub = add nsw i32 %.lcssa393, -8
	%shr = lshr i32 %tmp5, %sub
	%and = and i32 %shr, 255
	%sunkaddr491 = ptrtoint %struct.DState* %s to i64
	%sunkaddr492 = add i64 %sunkaddr491, 36
	%sunkaddr493 = inttoptr i64 %sunkaddr492 to i32*
	store i32 %sub, i32* %sunkaddr493, align 4
	%blockSize100k = getelementptr inbounds %struct.DState, %struct.DState* %s, i64 0, i32 9
	store i32 %and, i32* %blockSize100k, align 4
	%and.off = add nsw i32 %and, -49
	%tmp6 = icmp ugt i32 %and.off, 8
	br i1 %tmp6, label %save_state_and_return, label %if.end.62

	if.end.33: ; preds = %while.body.backedge, %if.end.33.lr.ph
	%lsr.iv482 = phi i32 [ %tmp4, %if.end.33.lr.ph ], [ %lsr.iv.next483, %while.body.backedge ]
	%tmp7 = phi i32 [ %tmp2, %if.end.33.lr.ph ], [ %add, %while.body.backedge ]
	%cmp35 = icmp eq i32 %lsr.iv482, -1
	br i1 %cmp35, label %save_state_and_return, label %if.end.37

	if.end.37: ; preds = %if.end.33
	%tmp8 = bitcast %struct.bz_stream* %.pre424 to i8**
	%sunkaddr494 = ptrtoint %struct.DState* %s to i64
	%sunkaddr495 = add i64 %sunkaddr494, 32
	%sunkaddr496 = inttoptr i64 %sunkaddr495 to i32*
	%tmp9 = load i32, i32* %sunkaddr496, align 4
	%shl = shl i32 %tmp9, 8
	%tmp10 = load i8*, i8** %tmp8, align 8
	%tmp11 = load i8, i8* %tmp10, align 1
	%conv = zext i8 %tmp11 to i32
	%or = or i32 %conv, %shl
	store i32 %or, i32* %sunkaddr496, align 4
	%add = add nsw i32 %tmp7, 8
	%sunkaddr497 = ptrtoint %struct.DState* %s to i64
	%sunkaddr498 = add i64 %sunkaddr497, 36
	%sunkaddr499 = inttoptr i64 %sunkaddr498 to i32*
	store i32 %add, i32* %sunkaddr499, align 4
	%incdec.ptr = getelementptr inbounds i8, i8* %tmp10, i64 1
	store i8* %incdec.ptr, i8** %tmp8, align 8
	%sunkaddr500 = ptrtoint %struct.bz_stream* %.pre424 to i64
	%sunkaddr501 = add i64 %sunkaddr500, 8
	%sunkaddr502 = inttoptr i64 %sunkaddr501 to i32*
	store i32 %lsr.iv482, i32* %sunkaddr502, align 4
	%sunkaddr503 = ptrtoint %struct.bz_stream* %.pre424 to i64
	%sunkaddr504 = add i64 %sunkaddr503, 12
	%sunkaddr505 = inttoptr i64 %sunkaddr504 to i32*
	%tmp12 = load i32, i32* %sunkaddr505, align 4
	%inc = add i32 %tmp12, 1
	store i32 %inc, i32* %sunkaddr505, align 4
	%cmp49 = icmp eq i32 %inc, 0
	br i1 %cmp49, label %if.then.51, label %while.body.backedge

	if.then.51: ; preds = %if.end.37
	%sunkaddr506 = ptrtoint %struct.bz_stream* %.pre424 to i64
	%sunkaddr507 = add i64 %sunkaddr506, 16
	%sunkaddr508 = inttoptr i64 %sunkaddr507 to i32*
	%tmp13 = load i32, i32* %sunkaddr508, align 4
	%inc53 = add i32 %tmp13, 1
	store i32 %inc53, i32* %sunkaddr508, align 4
	br label %while.body.backedge

	while.body.backedge: ; preds = %if.then.51, %if.end.37
	%lsr.iv.next483 = add i32 %lsr.iv482, -1
	%cmp28 = icmp sgt i32 %add, 7
	br i1 %cmp28, label %if.then.29, label %if.end.33

	if.end.62: ; preds = %if.then.29
	%sub64 = add nsw i32 %and, -48
	%sunkaddr509 = ptrtoint %struct.DState* %s to i64
	%sunkaddr510 = add i64 %sunkaddr509, 40
	%sunkaddr511 = inttoptr i64 %sunkaddr510 to i32*
	store i32 %sub64, i32* %sunkaddr511, align 4
	br label %sw.bb.65

	sw.bb.65: ; preds = %if.end.62, %if.end.sw.bb.65_crit_edge
	%bsLive69.pre-phi = phi i32* [ %bsLive69.phi.trans.insert, %if.end.sw.bb.65_crit_edge ], [ %bsLive, %if.end.62 ]
	%tmp14 = phi i32 [ %.pre426, %if.end.sw.bb.65_crit_edge ], [ %sub, %if.end.62 ]
	%sunkaddr512 = ptrtoint %struct.DState* %s to i64
	%sunkaddr513 = add i64 %sunkaddr512, 8
	%sunkaddr514 = inttoptr i64 %sunkaddr513 to i32*
	store i32 14, i32* %sunkaddr514, align 4
	%cmp70.397 = icmp sgt i32 %tmp14, 7
	br i1 %cmp70.397, label %if.then.72, label %if.end.82.lr.ph

	if.end.82.lr.ph: ; preds = %sw.bb.65
	%tmp15 = bitcast %struct.DState* %s to %struct.bz_stream**
	%.pre427 = load %struct.bz_stream*, %struct.bz_stream** %tmp15, align 8
	%avail_in84.phi.trans.insert = getelementptr inbounds %struct.bz_stream, %struct.bz_stream* %.pre427, i64 0, i32 1
	%.pre431 = load i32, i32* %avail_in84.phi.trans.insert, align 4
	%tmp16 = add i32 %.pre431, -1
	br label %if.end.82

	if.then.72: ; preds = %while.body.68.backedge, %sw.bb.65
	%.lcssa390 = phi i32 [ %tmp14, %sw.bb.65 ], [ %add97, %while.body.68.backedge ]
	%sub76 = add nsw i32 %.lcssa390, -8
	%sunkaddr516 = ptrtoint %struct.DState* %s to i64
	%sunkaddr517 = add i64 %sunkaddr516, 36
	%sunkaddr518 = inttoptr i64 %sunkaddr517 to i32*
	store i32 %sub76, i32* %sunkaddr518, align 4
	%currBlockNo = getelementptr inbounds %struct.DState, %struct.DState* %s, i64 0, i32 11
	%tmp17 = load i32, i32* %currBlockNo, align 4
	%inc117 = add nsw i32 %tmp17, 1
	store i32 %inc117, i32* %currBlockNo, align 4
	%verbosity = getelementptr inbounds %struct.DState, %struct.DState* %s, i64 0, i32 12
	%tmp18 = load i32, i32* %verbosity, align 4
	%cmp118 = icmp sgt i32 %tmp18, 1
	br i1 %cmp118, label %if.then.120, label %sw.bb.123, !prof !0

	if.end.82: ; preds = %while.body.68.backedge, %if.end.82.lr.ph
	%lsr.iv480 = phi i32 [ %tmp16, %if.end.82.lr.ph ], [ %lsr.iv.next481, %while.body.68.backedge ]
	%tmp19 = phi i32 [ %tmp14, %if.end.82.lr.ph ], [ %add97, %while.body.68.backedge ]
	%cmp85 = icmp eq i32 %lsr.iv480, -1
	br i1 %cmp85, label %save_state_and_return, label %if.end.88

	if.end.88: ; preds = %if.end.82
	%tmp20 = bitcast %struct.bz_stream* %.pre427 to i8**
	%sunkaddr519 = ptrtoint %struct.DState* %s to i64
	%sunkaddr520 = add i64 %sunkaddr519, 32
	%sunkaddr521 = inttoptr i64 %sunkaddr520 to i32*
	%tmp21 = load i32, i32* %sunkaddr521, align 4
	%shl90 = shl i32 %tmp21, 8
	%tmp22 = load i8*, i8** %tmp20, align 8
	%tmp23 = load i8, i8* %tmp22, align 1
	%conv93 = zext i8 %tmp23 to i32
	%or94 = or i32 %conv93, %shl90
	store i32 %or94, i32* %sunkaddr521, align 4
	%add97 = add nsw i32 %tmp19, 8
	%sunkaddr522 = ptrtoint %struct.DState* %s to i64
	%sunkaddr523 = add i64 %sunkaddr522, 36
	%sunkaddr524 = inttoptr i64 %sunkaddr523 to i32*
	store i32 %add97, i32* %sunkaddr524, align 4
	%incdec.ptr100 = getelementptr inbounds i8, i8* %tmp22, i64 1
	store i8* %incdec.ptr100, i8** %tmp20, align 8
	%sunkaddr525 = ptrtoint %struct.bz_stream* %.pre427 to i64
	%sunkaddr526 = add i64 %sunkaddr525, 8
	%sunkaddr527 = inttoptr i64 %sunkaddr526 to i32*
	store i32 %lsr.iv480, i32* %sunkaddr527, align 4
	%sunkaddr528 = ptrtoint %struct.bz_stream* %.pre427 to i64
	%sunkaddr529 = add i64 %sunkaddr528, 12
	%sunkaddr530 = inttoptr i64 %sunkaddr529 to i32*
	%tmp24 = load i32, i32* %sunkaddr530, align 4
	%inc106 = add i32 %tmp24, 1
	store i32 %inc106, i32* %sunkaddr530, align 4
	%cmp109 = icmp eq i32 %inc106, 0
	br i1 %cmp109, label %if.then.111, label %while.body.68.backedge

	if.then.111: ; preds = %if.end.88
	%sunkaddr531 = ptrtoint %struct.bz_stream* %.pre427 to i64
	%sunkaddr532 = add i64 %sunkaddr531, 16
	%sunkaddr533 = inttoptr i64 %sunkaddr532 to i32*
	%tmp25 = load i32, i32* %sunkaddr533, align 4
	%inc114 = add i32 %tmp25, 1
	store i32 %inc114, i32* %sunkaddr533, align 4
	br label %while.body.68.backedge

	while.body.68.backedge: ; preds = %if.then.111, %if.end.88
	%lsr.iv.next481 = add i32 %lsr.iv480, -1
	%cmp70 = icmp sgt i32 %add97, 7
	br i1 %cmp70, label %if.then.72, label %if.end.82

	if.then.120: ; preds = %if.then.72
	%call = tail call i32 (%struct.__sFILE*, i8*, ...) @fprintf(%struct.__sFILE* getelementptr inbounds ([0 x %struct.__sFILE], [0 x %struct.__sFILE]* @__sF, i64 0, i64 2), i8* getelementptr inbounds ([20 x i8], [20 x i8]* @.str, i64 0, i64 0), i32 %inc117)
	br label %sw.bb.123

	sw.bb.123: ; preds = %if.then.120, %if.then.72, %if.end.sw.bb.123_crit_edge
	%bsLive127.pre-phi = phi i32* [ %.pre433, %if.end.sw.bb.123_crit_edge ], [ %bsLive69.pre-phi, %if.then.72 ], [ %bsLive69.pre-phi, %if.then.120 ]
	%sunkaddr534 = ptrtoint %struct.DState* %s to i64
	%sunkaddr535 = add i64 %sunkaddr534, 8
	%sunkaddr536 = inttoptr i64 %sunkaddr535 to i32*
	store i32 25, i32* %sunkaddr536, align 4
	%tmp26 = load i32, i32* %bsLive127.pre-phi, align 4
	%cmp128.395 = icmp sgt i32 %tmp26, 7
	br i1 %cmp128.395, label %sw.bb.123.if.then.130_crit_edge, label %if.end.140.lr.ph

	sw.bb.123.if.then.130_crit_edge: ; preds = %sw.bb.123
	%sunkaddr537 = ptrtoint %struct.DState* %s to i64
	%sunkaddr538 = add i64 %sunkaddr537, 32
	%sunkaddr539 = inttoptr i64 %sunkaddr538 to i32*
	%.pre429 = load i32, i32* %sunkaddr539, align 4
	br label %if.then.130

	if.end.140.lr.ph: ; preds = %sw.bb.123
	%tmp27 = bitcast %struct.DState* %s to %struct.bz_stream**
	%.pre428 = load %struct.bz_stream*, %struct.bz_stream** %tmp27, align 8
	%avail_in142.phi.trans.insert = getelementptr inbounds %struct.bz_stream, %struct.bz_stream* %.pre428, i64 0, i32 1
	%.pre432 = load i32, i32* %avail_in142.phi.trans.insert, align 4
	%tmp28 = add i32 %.pre432, -1
	br label %if.end.140

	if.then.130: ; preds = %while.body.126.backedge, %sw.bb.123.if.then.130_crit_edge
	%tmp29 = phi i32 [ %.pre429, %sw.bb.123.if.then.130_crit_edge ], [ %or152, %while.body.126.backedge ]
	%.lcssa = phi i32 [ %tmp26, %sw.bb.123.if.then.130_crit_edge ], [ %add155, %while.body.126.backedge ]
	%sub134 = add nsw i32 %.lcssa, -8
	%shr135 = lshr i32 %tmp29, %sub134
	store i32 %sub134, i32* %bsLive127.pre-phi, align 4
	%origPtr = getelementptr inbounds %struct.DState, %struct.DState* %s, i64 0, i32 13
	%tmp30 = load i32, i32* %origPtr, align 4
	%shl175 = shl i32 %tmp30, 8
	%conv176 = and i32 %shr135, 255
	%or177 = or i32 %shl175, %conv176
	store i32 %or177, i32* %origPtr, align 4
	%nInUse = getelementptr inbounds %struct.DState, %struct.DState* %s, i64 0, i32 27
	%tmp31 = load i32, i32* %nInUse, align 4
	%add179 = add nsw i32 %tmp31, 2
	br label %save_state_and_return

	if.end.140: ; preds = %while.body.126.backedge, %if.end.140.lr.ph
	%lsr.iv = phi i32 [ %tmp28, %if.end.140.lr.ph ], [ %lsr.iv.next, %while.body.126.backedge ]
	%tmp32 = phi i32 [ %tmp26, %if.end.140.lr.ph ], [ %add155, %while.body.126.backedge ]
	%cmp143 = icmp eq i32 %lsr.iv, -1
	br i1 %cmp143, label %save_state_and_return, label %if.end.146

	if.end.146: ; preds = %if.end.140
	%tmp33 = bitcast %struct.bz_stream* %.pre428 to i8**
	%sunkaddr541 = ptrtoint %struct.DState* %s to i64
	%sunkaddr542 = add i64 %sunkaddr541, 32
	%sunkaddr543 = inttoptr i64 %sunkaddr542 to i32*
	%tmp34 = load i32, i32* %sunkaddr543, align 4
	%shl148 = shl i32 %tmp34, 8
	%tmp35 = load i8*, i8** %tmp33, align 8
	%tmp36 = load i8, i8* %tmp35, align 1
	%conv151 = zext i8 %tmp36 to i32
	%or152 = or i32 %conv151, %shl148
	store i32 %or152, i32* %sunkaddr543, align 4
	%add155 = add nsw i32 %tmp32, 8
	store i32 %add155, i32* %bsLive127.pre-phi, align 4
	%incdec.ptr158 = getelementptr inbounds i8, i8* %tmp35, i64 1
	store i8* %incdec.ptr158, i8** %tmp33, align 8
	%sunkaddr544 = ptrtoint %struct.bz_stream* %.pre428 to i64
	%sunkaddr545 = add i64 %sunkaddr544, 8
	%sunkaddr546 = inttoptr i64 %sunkaddr545 to i32*
	store i32 %lsr.iv, i32* %sunkaddr546, align 4
	%sunkaddr547 = ptrtoint %struct.bz_stream* %.pre428 to i64
	%sunkaddr548 = add i64 %sunkaddr547, 12
	%sunkaddr549 = inttoptr i64 %sunkaddr548 to i32*
	%tmp37 = load i32, i32* %sunkaddr549, align 4
	%inc164 = add i32 %tmp37, 1
	store i32 %inc164, i32* %sunkaddr549, align 4
	%cmp167 = icmp eq i32 %inc164, 0
	br i1 %cmp167, label %if.then.169, label %while.body.126.backedge

	if.then.169: ; preds = %if.end.146
	%sunkaddr550 = ptrtoint %struct.bz_stream* %.pre428 to i64
	%sunkaddr551 = add i64 %sunkaddr550, 16
	%sunkaddr552 = inttoptr i64 %sunkaddr551 to i32*
	%tmp38 = load i32, i32* %sunkaddr552, align 4
	%inc172 = add i32 %tmp38, 1
	store i32 %inc172, i32* %sunkaddr552, align 4
	br label %while.body.126.backedge

	while.body.126.backedge: ; preds = %if.then.169, %if.end.146
	%lsr.iv.next = add i32 %lsr.iv, -1
	%cmp128 = icmp sgt i32 %add155, 7
	br i1 %cmp128, label %if.then.130, label %if.end.140

	sw.default: ; preds = %if.end, %if.end.thread
	%tmp39 = phi i32 [ 0, %if.end.thread ], [ %.pre, %if.end ]
	%tmp40 = phi i32 [ 0, %if.end.thread ], [ %.pre406, %if.end ]
	%tmp41 = phi i32 [ 0, %if.end.thread ], [ %.pre407, %if.end ]
	%tmp42 = phi i32 [ 0, %if.end.thread ], [ %.pre408, %if.end ]
	%tmp43 = phi i32 [ 0, %if.end.thread ], [ %.pre409, %if.end ]
	%tmp44 = phi i32 [ 0, %if.end.thread ], [ %.pre410, %if.end ]
	%tmp45 = phi i32 [ 0, %if.end.thread ], [ %.pre411, %if.end ]
	%tmp46 = phi i32 [ 0, %if.end.thread ], [ %.pre412, %if.end ]
	%tmp47 = phi i32 [ 0, %if.end.thread ], [ %.pre413, %if.end ]
	%tmp48 = phi i32 [ 0, %if.end.thread ], [ %.pre414, %if.end ]
	%tmp49 = phi i32 [ 0, %if.end.thread ], [ %.pre415, %if.end ]
	%tmp50 = phi i32 [ 0, %if.end.thread ], [ %.pre416, %if.end ]
	%tmp51 = phi i32 [ 0, %if.end.thread ], [ %.pre417, %if.end ]
	%tmp52 = phi i32 [ 0, %if.end.thread ], [ %.pre418, %if.end ]
	%tmp53 = phi i32 [ 0, %if.end.thread ], [ %.pre419, %if.end ]
	%tmp54 = phi i32 [ 0, %if.end.thread ], [ %.pre420, %if.end ]
	%tmp55 = phi i32 [ 0, %if.end.thread ], [ %.pre421, %if.end ]
	%tmp56 = phi i32 [ 0, %if.end.thread ], [ %.pre422, %if.end ]
	%tmp57 = phi i32 [ 0, %if.end.thread ], [ %.pre423, %if.end ]
	%save_j3.pre-phi469 = phi i32* [ %save_j, %if.end.thread ], [ %save_j3.phi.trans.insert, %if.end ]
	%save_t4.pre-phi467 = phi i32* [ %save_t, %if.end.thread ], [ %save_t4.phi.trans.insert, %if.end ]
	%save_alphaSize5.pre-phi465 = phi i32* [ %save_alphaSize, %if.end.thread ], [ %save_alphaSize5.phi.trans.insert, %if.end ]
	%save_nGroups6.pre-phi463 = phi i32* [ %save_nGroups, %if.end.thread ], [ %save_nGroups6.phi.trans.insert, %if.end ]
	%save_nSelectors7.pre-phi461 = phi i32* [ %save_nSelectors, %if.end.thread ], [ %save_nSelectors7.phi.trans.insert, %if.end ]
	%save_EOB8.pre-phi459 = phi i32* [ %save_EOB, %if.end.thread ], [ %save_EOB8.phi.trans.insert, %if.end ]
	%save_groupNo9.pre-phi457 = phi i32* [ %save_groupNo, %if.end.thread ], [ %save_groupNo9.phi.trans.insert, %if.end ]
	%save_groupPos10.pre-phi455 = phi i32* [ %save_groupPos, %if.end.thread ], [ %save_groupPos10.phi.trans.insert, %if.end ]
	%save_nextSym11.pre-phi453 = phi i32* [ %save_nextSym, %if.end.thread ], [ %save_nextSym11.phi.trans.insert, %if.end ]
	%save_nblockMAX12.pre-phi451 = phi i32* [ %save_nblockMAX, %if.end.thread ], [ %save_nblockMAX12.phi.trans.insert, %if.end ]
	%save_nblock13.pre-phi449 = phi i32* [ %save_nblock, %if.end.thread ], [ %save_nblock13.phi.trans.insert, %if.end ]
	%save_es14.pre-phi447 = phi i32* [ %save_es, %if.end.thread ], [ %save_es14.phi.trans.insert, %if.end ]
	%save_N15.pre-phi445 = phi i32* [ %save_N, %if.end.thread ], [ %save_N15.phi.trans.insert, %if.end ]
	%save_curr16.pre-phi443 = phi i32* [ %save_curr, %if.end.thread ], [ %save_curr16.phi.trans.insert, %if.end ]
	%save_zt17.pre-phi441 = phi i32* [ %save_zt, %if.end.thread ], [ %save_zt17.phi.trans.insert, %if.end ]
	%save_zn18.pre-phi439 = phi i32* [ %save_zn, %if.end.thread ], [ %save_zn18.phi.trans.insert, %if.end ]
	%save_zvec19.pre-phi437 = phi i32* [ %save_zvec, %if.end.thread ], [ %save_zvec19.phi.trans.insert, %if.end ]
	%save_zj20.pre-phi435 = phi i32* [ %save_zj, %if.end.thread ], [ %save_zj20.phi.trans.insert, %if.end ]
	tail call void @bar(i32 4001)
	br label %save_state_and_return

	save_state_and_return: ; preds = %sw.default, %if.end.140, %if.then.130, %if.end.82, %if.end.33, %if.then.29
	%tmp58 = phi i32 [ %tmp39, %sw.default ], [ %.pre, %if.then.29 ], [ %.pre, %if.then.130 ], [ %.pre, %if.end.140 ], [ %.pre, %if.end.82 ], [ %.pre, %if.end.33 ]
	%tmp59 = phi i32 [ %tmp40, %sw.default ], [ %.pre406, %if.then.29 ], [ %.pre406, %if.then.130 ], [ %.pre406, %if.end.140 ], [ %.pre406, %if.end.82 ], [ %.pre406, %if.end.33 ]
	%tmp60 = phi i32 [ %tmp41, %sw.default ], [ %.pre407, %if.then.29 ], [ %.pre407, %if.then.130 ], [ %.pre407, %if.end.140 ], [ %.pre407, %if.end.82 ], [ %.pre407, %if.end.33 ]
	%tmp61 = phi i32 [ %tmp43, %sw.default ], [ %.pre409, %if.then.29 ], [ %.pre409, %if.then.130 ], [ %.pre409, %if.end.140 ], [ %.pre409, %if.end.82 ], [ %.pre409, %if.end.33 ]
	%tmp62 = phi i32 [ %tmp44, %sw.default ], [ %.pre410, %if.then.29 ], [ %.pre410, %if.then.130 ], [ %.pre410, %if.end.140 ], [ %.pre410, %if.end.82 ], [ %.pre410, %if.end.33 ]
	%tmp63 = phi i32 [ %tmp45, %sw.default ], [ %.pre411, %if.then.29 ], [ %.pre411, %if.then.130 ], [ %.pre411, %if.end.140 ], [ %.pre411, %if.end.82 ], [ %.pre411, %if.end.33 ]
	%tmp64 = phi i32 [ %tmp46, %sw.default ], [ %.pre412, %if.then.29 ], [ %.pre412, %if.then.130 ], [ %.pre412, %if.end.140 ], [ %.pre412, %if.end.82 ], [ %.pre412, %if.end.33 ]
	%tmp65 = phi i32 [ %tmp47, %sw.default ], [ %.pre413, %if.then.29 ], [ %.pre413, %if.then.130 ], [ %.pre413, %if.end.140 ], [ %.pre413, %if.end.82 ], [ %.pre413, %if.end.33 ]
	%tmp66 = phi i32 [ %tmp48, %sw.default ], [ %.pre414, %if.then.29 ], [ %.pre414, %if.then.130 ], [ %.pre414, %if.end.140 ], [ %.pre414, %if.end.82 ], [ %.pre414, %if.end.33 ]
	%tmp67 = phi i32 [ %tmp49, %sw.default ], [ %.pre415, %if.then.29 ], [ %.pre415, %if.then.130 ], [ %.pre415, %if.end.140 ], [ %.pre415, %if.end.82 ], [ %.pre415, %if.end.33 ]
	%tmp68 = phi i32 [ %tmp51, %sw.default ], [ %.pre417, %if.then.29 ], [ %.pre417, %if.then.130 ], [ %.pre417, %if.end.140 ], [ %.pre417, %if.end.82 ], [ %.pre417, %if.end.33 ]
	%tmp69 = phi i32 [ %tmp52, %sw.default ], [ %.pre418, %if.then.29 ], [ %.pre418, %if.then.130 ], [ %.pre418, %if.end.140 ], [ %.pre418, %if.end.82 ], [ %.pre418, %if.end.33 ]
	%tmp70 = phi i32 [ %tmp53, %sw.default ], [ %.pre419, %if.then.29 ], [ %.pre419, %if.then.130 ], [ %.pre419, %if.end.140 ], [ %.pre419, %if.end.82 ], [ %.pre419, %if.end.33 ]
	%tmp71 = phi i32 [ %tmp54, %sw.default ], [ %.pre420, %if.then.29 ], [ %.pre420, %if.then.130 ], [ %.pre420, %if.end.140 ], [ %.pre420, %if.end.82 ], [ %.pre420, %if.end.33 ]
	%tmp72 = phi i32 [ %tmp55, %sw.default ], [ %.pre421, %if.then.29 ], [ %.pre421, %if.then.130 ], [ %.pre421, %if.end.140 ], [ %.pre421, %if.end.82 ], [ %.pre421, %if.end.33 ]
	%tmp73 = phi i32 [ %tmp56, %sw.default ], [ %.pre422, %if.then.29 ], [ %.pre422, %if.then.130 ], [ %.pre422, %if.end.140 ], [ %.pre422, %if.end.82 ], [ %.pre422, %if.end.33 ]
	%tmp74 = phi i32 [ %tmp57, %sw.default ], [ %.pre423, %if.then.29 ], [ %.pre423, %if.then.130 ], [ %.pre423, %if.end.140 ], [ %.pre423, %if.end.82 ], [ %.pre423, %if.end.33 ]
	%save_j3.pre-phi468 = phi i32* [ %save_j3.pre-phi469, %sw.default ], [ %save_j3.phi.trans.insert, %if.then.29 ], [ %save_j3.phi.trans.insert, %if.then.130 ], [ %save_j3.phi.trans.insert, %if.end.140 ], [ %save_j3.phi.trans.insert, %if.end.82 ], [ %save_j3.phi.trans.insert, %if.end.33 ]
	%save_t4.pre-phi466 = phi i32* [ %save_t4.pre-phi467, %sw.default ], [ %save_t4.phi.trans.insert, %if.then.29 ], [ %save_t4.phi.trans.insert, %if.then.130 ], [ %save_t4.phi.trans.insert, %if.end.140 ], [ %save_t4.phi.trans.insert, %if.end.82 ], [ %save_t4.phi.trans.insert, %if.end.33 ]
	%save_alphaSize5.pre-phi464 = phi i32* [ %save_alphaSize5.pre-phi465, %sw.default ], [ %save_alphaSize5.phi.trans.insert, %if.then.29 ], [ %save_alphaSize5.phi.trans.insert, %if.then.130 ], [ %save_alphaSize5.phi.trans.insert, %if.end.140 ], [ %save_alphaSize5.phi.trans.insert, %if.end.82 ], [ %save_alphaSize5.phi.trans.insert, %if.end.33 ]
	%save_nGroups6.pre-phi462 = phi i32* [ %save_nGroups6.pre-phi463, %sw.default ], [ %save_nGroups6.phi.trans.insert, %if.then.29 ], [ %save_nGroups6.phi.trans.insert, %if.then.130 ], [ %save_nGroups6.phi.trans.insert, %if.end.140 ], [ %save_nGroups6.phi.trans.insert, %if.end.82 ], [ %save_nGroups6.phi.trans.insert, %if.end.33 ]
	%save_nSelectors7.pre-phi460 = phi i32* [ %save_nSelectors7.pre-phi461, %sw.default ], [ %save_nSelectors7.phi.trans.insert, %if.then.29 ], [ %save_nSelectors7.phi.trans.insert, %if.then.130 ], [ %save_nSelectors7.phi.trans.insert, %if.end.140 ], [ %save_nSelectors7.phi.trans.insert, %if.end.82 ], [ %save_nSelectors7.phi.trans.insert, %if.end.33 ]
	%save_EOB8.pre-phi458 = phi i32* [ %save_EOB8.pre-phi459, %sw.default ], [ %save_EOB8.phi.trans.insert, %if.then.29 ], [ %save_EOB8.phi.trans.insert, %if.then.130 ], [ %save_EOB8.phi.trans.insert, %if.end.140 ], [ %save_EOB8.phi.trans.insert, %if.end.82 ], [ %save_EOB8.phi.trans.insert, %if.end.33 ]
	%save_groupNo9.pre-phi456 = phi i32* [ %save_groupNo9.pre-phi457, %sw.default ], [ %save_groupNo9.phi.trans.insert, %if.then.29 ], [ %save_groupNo9.phi.trans.insert, %if.then.130 ], [ %save_groupNo9.phi.trans.insert, %if.end.140 ], [ %save_groupNo9.phi.trans.insert, %if.end.82 ], [ %save_groupNo9.phi.trans.insert, %if.end.33 ]
	%save_groupPos10.pre-phi454 = phi i32* [ %save_groupPos10.pre-phi455, %sw.default ], [ %save_groupPos10.phi.trans.insert, %if.then.29 ], [ %save_groupPos10.phi.trans.insert, %if.then.130 ], [ %save_groupPos10.phi.trans.insert, %if.end.140 ], [ %save_groupPos10.phi.trans.insert, %if.end.82 ], [ %save_groupPos10.phi.trans.insert, %if.end.33 ]
	%save_nextSym11.pre-phi452 = phi i32* [ %save_nextSym11.pre-phi453, %sw.default ], [ %save_nextSym11.phi.trans.insert, %if.then.29 ], [ %save_nextSym11.phi.trans.insert, %if.then.130 ], [ %save_nextSym11.phi.trans.insert, %if.end.140 ], [ %save_nextSym11.phi.trans.insert, %if.end.82 ], [ %save_nextSym11.phi.trans.insert, %if.end.33 ]
	%save_nblockMAX12.pre-phi450 = phi i32* [ %save_nblockMAX12.pre-phi451, %sw.default ], [ %save_nblockMAX12.phi.trans.insert, %if.then.29 ], [ %save_nblockMAX12.phi.trans.insert, %if.then.130 ], [ %save_nblockMAX12.phi.trans.insert, %if.end.140 ], [ %save_nblockMAX12.phi.trans.insert, %if.end.82 ], [ %save_nblockMAX12.phi.trans.insert, %if.end.33 ]
	%save_nblock13.pre-phi448 = phi i32* [ %save_nblock13.pre-phi449, %sw.default ], [ %save_nblock13.phi.trans.insert, %if.then.29 ], [ %save_nblock13.phi.trans.insert, %if.then.130 ], [ %save_nblock13.phi.trans.insert, %if.end.140 ], [ %save_nblock13.phi.trans.insert, %if.end.82 ], [ %save_nblock13.phi.trans.insert, %if.end.33 ]
	%save_es14.pre-phi446 = phi i32* [ %save_es14.pre-phi447, %sw.default ], [ %save_es14.phi.trans.insert, %if.then.29 ], [ %save_es14.phi.trans.insert, %if.then.130 ], [ %save_es14.phi.trans.insert, %if.end.140 ], [ %save_es14.phi.trans.insert, %if.end.82 ], [ %save_es14.phi.trans.insert, %if.end.33 ]
	%save_N15.pre-phi444 = phi i32* [ %save_N15.pre-phi445, %sw.default ], [ %save_N15.phi.trans.insert, %if.then.29 ], [ %save_N15.phi.trans.insert, %if.then.130 ], [ %save_N15.phi.trans.insert, %if.end.140 ], [ %save_N15.phi.trans.insert, %if.end.82 ], [ %save_N15.phi.trans.insert, %if.end.33 ]
	%save_curr16.pre-phi442 = phi i32* [ %save_curr16.pre-phi443, %sw.default ], [ %save_curr16.phi.trans.insert, %if.then.29 ], [ %save_curr16.phi.trans.insert, %if.then.130 ], [ %save_curr16.phi.trans.insert, %if.end.140 ], [ %save_curr16.phi.trans.insert, %if.end.82 ], [ %save_curr16.phi.trans.insert, %if.end.33 ]
	%save_zt17.pre-phi440 = phi i32* [ %save_zt17.pre-phi441, %sw.default ], [ %save_zt17.phi.trans.insert, %if.then.29 ], [ %save_zt17.phi.trans.insert, %if.then.130 ], [ %save_zt17.phi.trans.insert, %if.end.140 ], [ %save_zt17.phi.trans.insert, %if.end.82 ], [ %save_zt17.phi.trans.insert, %if.end.33 ]
	%save_zn18.pre-phi438 = phi i32* [ %save_zn18.pre-phi439, %sw.default ], [ %save_zn18.phi.trans.insert, %if.then.29 ], [ %save_zn18.phi.trans.insert, %if.then.130 ], [ %save_zn18.phi.trans.insert, %if.end.140 ], [ %save_zn18.phi.trans.insert, %if.end.82 ], [ %save_zn18.phi.trans.insert, %if.end.33 ]
	%save_zvec19.pre-phi436 = phi i32* [ %save_zvec19.pre-phi437, %sw.default ], [ %save_zvec19.phi.trans.insert, %if.then.29 ], [ %save_zvec19.phi.trans.insert, %if.then.130 ], [ %save_zvec19.phi.trans.insert, %if.end.140 ], [ %save_zvec19.phi.trans.insert, %if.end.82 ], [ %save_zvec19.phi.trans.insert, %if.end.33 ]
	%save_zj20.pre-phi434 = phi i32* [ %save_zj20.pre-phi435, %sw.default ], [ %save_zj20.phi.trans.insert, %if.then.29 ], [ %save_zj20.phi.trans.insert, %if.then.130 ], [ %save_zj20.phi.trans.insert, %if.end.140 ], [ %save_zj20.phi.trans.insert, %if.end.82 ], [ %save_zj20.phi.trans.insert, %if.end.33 ]
	%nblock.1 = phi i32 [ %tmp50, %sw.default ], [ %.pre416, %if.then.29 ], [ 0, %if.then.130 ], [ %.pre416, %if.end.140 ], [ %.pre416, %if.end.82 ], [ %.pre416, %if.end.33 ]
	%alphaSize.1 = phi i32 [ %tmp42, %sw.default ], [ %.pre408, %if.then.29 ], [ %add179, %if.then.130 ], [ %.pre408, %if.end.140 ], [ %.pre408, %if.end.82 ], [ %.pre408, %if.end.33 ]
	%retVal.0 = phi i32 [ 0, %sw.default ], [ -5, %if.then.29 ], [ -4, %if.then.130 ], [ 0, %if.end.140 ], [ 0, %if.end.82 ], [ 0, %if.end.33 ]
	store i32 %tmp58, i32* %save_i, align 4
	store i32 %tmp59, i32* %save_j3.pre-phi468, align 4
	store i32 %tmp60, i32* %save_t4.pre-phi466, align 4
	store i32 %alphaSize.1, i32* %save_alphaSize5.pre-phi464, align 4
	store i32 %tmp61, i32* %save_nGroups6.pre-phi462, align 4
	store i32 %tmp62, i32* %save_nSelectors7.pre-phi460, align 4
	store i32 %tmp63, i32* %save_EOB8.pre-phi458, align 4
	store i32 %tmp64, i32* %save_groupNo9.pre-phi456, align 4
	store i32 %tmp65, i32* %save_groupPos10.pre-phi454, align 4
	store i32 %tmp66, i32* %save_nextSym11.pre-phi452, align 4
	store i32 %tmp67, i32* %save_nblockMAX12.pre-phi450, align 4
	store i32 %nblock.1, i32* %save_nblock13.pre-phi448, align 4
	store i32 %tmp68, i32* %save_es14.pre-phi446, align 4
	store i32 %tmp69, i32* %save_N15.pre-phi444, align 4
	store i32 %tmp70, i32* %save_curr16.pre-phi442, align 4
	store i32 %tmp71, i32* %save_zt17.pre-phi440, align 4
	store i32 %tmp72, i32* %save_zn18.pre-phi438, align 4
	store i32 %tmp73, i32* %save_zvec19.pre-phi436, align 4
	store i32 %tmp74, i32* %save_zj20.pre-phi434, align 4
	ret i32 %retVal.0
	}

	!0 = !{!"branch_weights", i32 10, i32 1}

test/CodeGen/ARM/subreg-remat.ll

	; RUN: llc < %s -relocation-model=pic -disable-fp-elim -mcpu=cortex-a8 -pre-RA-sched=source -no-integrated-as | FileCheck %s			; RUN: llc < %s -relocation-model=pic -disable-fp-elim -mcpu=cortex-a8 -pre-RA-sched=source -no-integrated-as | FileCheck %s
	target triple = "thumbv7-apple-ios"			target triple = "thumbv7-apple-ios"
	; <rdar://problem/10032939>			; <rdar://problem/10032939>
	;			;
	; The vector %v2 is built like this:			; The vector %v2 is built like this:
	;			;
	; %vreg6:ssub_1<def> = ...			; %vreg6:ssub_1<def> = ...
	; %vreg6:ssub_0<def> = VLDRS <cp#0>, 0, pred:14, pred:%noreg; mem:LD4[ConstantPool] DPR_VFP2:%vreg6			; %vreg6:ssub_0<def> = VLDRS <cp#0>, 0, pred:14, pred:%noreg; mem:LD4[ConstantPool] DPR_VFP2:%vreg6
	;			;
	; When %vreg6 spills, the VLDRS constant pool load cannot be rematerialized			; When %vreg6 spills, the VLDRS constant pool load cannot be rematerialized
	; since it implicitly reads the ssub_1 sub-register.			; since it implicitly reads the ssub_1 sub-register.
	;			;
	; CHECK: f1			; CHECK: f1
	; CHECK: vmov d0, r0, r0			; CHECK: vmov d1, r0, r0
	; CHECK: vldr s1, LCPI			; CHECK: vldr s3, LCPI
	; The vector must be spilled:			; The vector must be spilled:
	; CHECK: vstr d0,			; CHECK: vstr d1,
	; CHECK: asm clobber d0			; CHECK: asm clobber d0
	; And reloaded after the asm:			; And reloaded after the asm:
	; CHECK: vldr [[D16:d[0-9]+]],			; CHECK: vldr [[D16:d[0-9]+]],
	; CHECK: vstr [[D16]], [r1]			; CHECK: vstr [[D16]], [r1]
	define void @f1(float %x, <2 x float>* %p) {			define void @f1(float %x, <2 x float>* %p) {
	%v1 = insertelement <2 x float> undef, float %x, i32 0			%v1 = insertelement <2 x float> undef, float %x, i32 0
	%v2 = insertelement <2 x float> %v1, float 0x400921FB60000000, i32 1			%v2 = insertelement <2 x float> %v1, float 0x400921FB60000000, i32 1
	%y = call double asm sideeffect "asm clobber $0", "=w,0,~{d1},~{d2},~{d3},~{d4},~{d5},~{d6},~{d7},~{d8},~{d9},~{d10},~{d11},~{d12},~{d13},~{d14},~{d15},~{d16},~{d17},~{d18},~{d19},~{d20},~{d21},~{d22},~{d23},~{d24},~{d25},~{d26},~{d27},~{d28},~{d29},~{d30},~{d31}"(<2 x float> %v2) nounwind			%y = call double asm sideeffect "asm clobber $0", "=w,0,~{d1},~{d2},~{d3},~{d4},~{d5},~{d6},~{d7},~{d8},~{d9},~{d10},~{d11},~{d12},~{d13},~{d14},~{d15},~{d16},~{d17},~{d18},~{d19},~{d20},~{d21},~{d22},~{d23},~{d24},~{d25},~{d26},~{d27},~{d28},~{d29},~{d30},~{d31}"(<2 x float> %v2) nounwind

test/CodeGen/SPARC/spill.ll

	; RUN: llc -march=sparc < %s | FileCheck %s			; RUN: llc -march=sparc < %s | FileCheck %s

	;; Ensure that spills and reloads work for various types on			;; Ensure that spills and reloads work for various types on
	;; sparcv8.			;; sparcv8.

	;; For i32/i64 tests, use an asm statement which clobbers most			;; For i32/i64 tests, use an asm statement which clobbers most
	;; registers to ensure the spill will happen.			;; registers to ensure the spill will happen.

	; CHECK-LABEL: test_i32_spill:			; CHECK-LABEL: test_i32_spill:
	; CHECK: and %i0, %i1, %o0			; CHECK: and %i0, %i1, %i0
	; CHECK: st %o0, [%fp+{{.+}}]			; CHECK: mov %i0, %o0
				; CHECK: st %i0, [%fp+{{.+}}]
	; CHECK: add %o0, %o0, %g0			; CHECK: add %o0, %o0, %g0
	; CHECK: ld [%fp+{{.+}}, %i0			; CHECK: ld [%fp+{{.+}}, %i0
	define i32 @test_i32_spill(i32 %a, i32 %b) {			define i32 @test_i32_spill(i32 %a, i32 %b) {
	entry:			entry:
	%r0 = and i32 %a, %b			%r0 = and i32 %a, %b
	; The clobber list has all registers except g0/o0. (Only o0 is usable.)			; The clobber list has all registers except g0/o0. (Only o0 is usable.)
	%0 = call i32 asm sideeffect "add $0,$1,%g0", "=r,0,~{i0},~{i1},~{i2},~{i3},~{i4},~{i5},~{i6},~{i7},~{g1},~{g2},~{g3},~{g4},~{g5},~{g6},~{g7},~{l0},~{l1},~{l2},~{l3},~{l4},~{l5},~{l6},~{l7},~{o1},~{o2},~{o3},~{o4},~{o5},~{o6},~{o7}"(i32 %r0)			%0 = call i32 asm sideeffect "add $0,$1,%g0", "=r,0,~{i0},~{i1},~{i2},~{i3},~{i4},~{i5},~{i6},~{i7},~{g1},~{g2},~{g3},~{g4},~{g5},~{g6},~{g7},~{l0},~{l1},~{l2},~{l3},~{l4},~{l5},~{l6},~{l7},~{o1},~{o2},~{o3},~{o4},~{o5},~{o6},~{o7}"(i32 %r0)
	ret i32 %r0			ret i32 %r0
	}			}

	; CHECK-LABEL: test_i64_spill:			; CHECK-LABEL: test_i64_spill:
	; CHECK: and %i0, %i2, %o0			; CHECK: and %i0, %i2, %i4
	; CHECK: and %i1, %i3, %o1			; CHECK: and %i1, %i3, %i5
	; CHECK: std %o0, [%fp+{{.+}}]			; CHECK: mov %i4, %o0
				; CHECK: mov %i5, %o1
				; CHECK: std %i4, [%fp+{{.+}}]
	; CHECK: add %o0, %o0, %g0			; CHECK: add %o0, %o0, %g0
	; CHECK: ldd [%fp+{{.+}}, %i0			; CHECK: ldd [%fp+{{.+}}, %i0
	define i64 @test_i64_spill(i64 %a, i64 %b) {			define i64 @test_i64_spill(i64 %a, i64 %b) {
	entry:			entry:
	%r0 = and i64 %a, %b			%r0 = and i64 %a, %b
	; The clobber list has all registers except g0,g1,o0,o1. (Only o0/o1 are a usable pair)			; The clobber list has all registers except g0,g1,o0,o1. (Only o0/o1 are a usable pair)
	; So, o0/o1 must be used.			; So, o0/o1 must be used.
	%0 = call i64 asm sideeffect "add $0,$1,%g0", "=r,0,~{i0},~{i1},~{i2},~{i3},~{i4},~{i5},~{i6},~{i7},~{g2},~{g3},~{g4},~{g5},~{g6},~{g7},~{l0},~{l1},~{l2},~{l3},~{l4},~{l5},~{l6},~{l7},~{o2},~{o3},~{o4},~{o5},~{o7}"(i64 %r0)			%0 = call i64 asm sideeffect "add $0,$1,%g0", "=r,0,~{i0},~{i1},~{i2},~{i3},~{i4},~{i5},~{i6},~{i7},~{g2},~{g3},~{g4},~{g5},~{g6},~{g7},~{l0},~{l1},~{l2},~{l3},~{l4},~{l5},~{l6},~{l7},~{o2},~{o3},~{o4},~{o5},~{o7}"(i64 %r0)

test/CodeGen/X86/avx512-bugfix-25270.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=x86_64-apple-darwin -mcpu=knl | FileCheck %s			; RUN: llc < %s -mtriple=x86_64-apple-darwin -mcpu=knl | FileCheck %s

	declare void @Print__512(<16 x i32>) #0			declare void @Print__512(<16 x i32>) #0

	define void @bar__512(<16 x i32>* %var) #0 {			define void @bar__512(<16 x i32>* %var) #0 {
	; CHECK-LABEL: bar__512:			; CHECK-LABEL: bar__512:
	; CHECK: ## BB#0: ## %allocas			; CHECK: ## BB#0: ## %allocas
	; CHECK-NEXT: pushq %rbx			; CHECK-NEXT: pushq %rbx
	; CHECK-NEXT: subq $112, %rsp			; CHECK-NEXT: subq $112, %rsp
	; CHECK-NEXT: movq %rdi, %rbx			; CHECK-NEXT: movq %rdi, %rbx
	; CHECK-NEXT: vmovdqu32 (%rbx), %zmm0			; CHECK-NEXT: vmovdqu32 (%rbx), %zmm0
	; CHECK-NEXT: vmovups %zmm0, (%rsp) ## 64-byte Spill
	; CHECK-NEXT: vpbroadcastd {{.*}}(%rip), %zmm1			; CHECK-NEXT: vpbroadcastd {{.*}}(%rip), %zmm1
	; CHECK-NEXT: vmovdqa32 %zmm1, (%rbx)			; CHECK-NEXT: vmovdqa32 %zmm1, (%rbx)
				; CHECK-NEXT: vmovups %zmm0, (%rsp) ## 64-byte Spill
	; CHECK-NEXT: callq _Print__512			; CHECK-NEXT: callq _Print__512
	; CHECK-NEXT: vmovups (%rsp), %zmm0 ## 64-byte Reload			; CHECK-NEXT: vmovups (%rsp), %zmm0 ## 64-byte Reload
	; CHECK-NEXT: callq _Print__512			; CHECK-NEXT: callq _Print__512
	; CHECK-NEXT: vpbroadcastd {{.*}}(%rip), %zmm0			; CHECK-NEXT: vpbroadcastd {{.*}}(%rip), %zmm0
	; CHECK-NEXT: vmovdqa32 %zmm0, (%rbx)			; CHECK-NEXT: vmovdqa32 %zmm0, (%rbx)
	; CHECK-NEXT: addq $112, %rsp			; CHECK-NEXT: addq $112, %rsp
	; CHECK-NEXT: popq %rbx			; CHECK-NEXT: popq %rbx
	; CHECK-NEXT: retq			; CHECK-NEXT: retq

test/CodeGen/X86/fold-push.ll

	; RUN: llc < %s -mtriple=i686-windows | FileCheck %s -check-prefix=CHECK -check-prefix=NORMAL			; RUN: llc < %s -mtriple=i686-windows | FileCheck %s -check-prefix=CHECK -check-prefix=NORMAL
	; RUN: llc < %s -mtriple=i686-windows -mattr=call-reg-indirect | FileCheck %s -check-prefix=CHECK -check-prefix=SLM			; RUN: llc < %s -mtriple=i686-windows -mattr=call-reg-indirect | FileCheck %s -check-prefix=CHECK -check-prefix=SLM

	declare void @foo(i32 %r)			declare void @foo(i32 %r)

	define void @test(i32 %a, i32 %b) optsize nounwind {			define void @test(i32 %a, i32 %b) optsize nounwind {
	; CHECK-LABEL: test:			; CHECK-LABEL: test:
	; CHECK: movl [[EAX:%e..]], (%esp)			; CHECK: addl
	; CHECK-NEXT: pushl [[EAX]]			; CHECK-NEXT: pushl [[EAX:%e..]]
				; CHECK-NEXT: movl [[EAX]], 4(%esp)
	; CHECK-NEXT: calll			; CHECK-NEXT: calll
	; CHECK-NEXT: addl $4, %esp			; CHECK-NEXT: addl $4, %esp
	; CHECK: nop			; CHECK: nop
	; NORMAL: pushl (%esp)			; NORMAL: pushl (%esp)
	; SLM: movl (%esp), [[RELOAD:%e..]]			; SLM: movl (%esp), [[RELOAD:%e..]]
	; SLM-NEXT: pushl [[RELOAD]]			; SLM-NEXT: pushl [[RELOAD]]
	; CHECK: calll			; CHECK: calll
	; CHECK-NEXT: addl $4, %esp			; CHECK-NEXT: addl $4, %esp
	%c = add i32 %a, %b			%c = add i32 %a, %b
	call void @foo(i32 %c)			call void @foo(i32 %c)
	call void asm sideeffect "nop", "~{ax},~{bx},~{cx},~{dx},~{bp},~{si},~{di}"()			call void asm sideeffect "nop", "~{ax},~{bx},~{cx},~{dx},~{bp},~{si},~{di}"()
	call void @foo(i32 %c)			call void @foo(i32 %c)
	ret void			ret void
	}			}

	define void @test_min(i32 %a, i32 %b) minsize nounwind {			define void @test_min(i32 %a, i32 %b) minsize nounwind {
	; CHECK-LABEL: test_min:			; CHECK-LABEL: test_min:
	; CHECK: movl [[EAX:%e..]], (%esp)			; CHECK: addl
	; CHECK-NEXT: pushl [[EAX]]			; CHECK-NEXT: pushl [[EAX:%e..]]
				; CHECK-NEXT: movl [[EAX]], 4(%esp)
	; CHECK-NEXT: calll			; CHECK-NEXT: calll
	; CHECK-NEXT: popl			; CHECK-NEXT: popl
	; CHECK: nop			; CHECK: nop
	; CHECK: pushl (%esp)			; CHECK: pushl (%esp)
	; CHECK: calll			; CHECK: calll
	; CHECK-NEXT: popl			; CHECK-NEXT: popl
	%c = add i32 %a, %b			%c = add i32 %a, %b
	call void @foo(i32 %c)			call void @foo(i32 %c)
	call void asm sideeffect "nop", "~{ax},~{bx},~{cx},~{dx},~{bp},~{si},~{di}"()			call void asm sideeffect "nop", "~{ax},~{bx},~{cx},~{dx},~{bp},~{si},~{di}"()
	call void @foo(i32 %c)			call void @foo(i32 %c)
	ret void			ret void
	}			}

test/CodeGen/X86/hoist-spill.ll

				; RUN: llc < %s | grep 'Spill' |sed 's%.*\(-[0-9]\+(\%rsp)\).*%\1%g' |sort |uniq -d |awk '{if (/rsp/); exit -1}'
				; Check no spills to the same stack slot after hoisting.
				qcolombet: Make this a FileCheck test.
				wmi: I felt a FileCheck test was not as general as the test above, but FileCheck can still work, so I switched to FileCheck here.

				qcolombet: You could check where the spills actually are. But it already looks pretty good now :).
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				@a = external global i32*, align 8
				@b = external global i32, align 4
				@d = external global i32*, align 8

				; Function Attrs: norecurse noreturn nounwind uwtable
				define void @fn1(i32 %p1) #0 {
				entry:
				%tmp = load i32, i32* @d, align 8
				%tmp1 = load i32, i32* @a, align 8
				%tmp2 = sext i32 %p1 to i64
				br label %for.cond

				for.cond: ; preds = %for.inc14, %entry
				%indvar = phi i32 [ %indvar.next, %for.inc14 ], [ 0, %entry ]
				%indvars.iv30.in = phi i32 [ %indvars.iv30, %for.inc14 ], [ %p1, %entry ]
				%c.0 = phi i32 [ %inc15, %for.inc14 ], [ 1, %entry ]
				%k.0 = phi i32 [ %k.1.lcssa, %for.inc14 ], [ undef, %entry ]
				%tmp3 = icmp sgt i32 undef, 0
				%smax52 = select i1 %tmp3, i32 undef, i32 0
				%tmp4 = zext i32 %smax52 to i64
				%tmp5 = icmp sgt i64 undef, %tmp4
				%smax53 = select i1 %tmp5, i64 undef, i64 %tmp4
				%tmp6 = add nsw i64 %smax53, 1
				%tmp7 = sub nsw i64 %tmp6, %tmp4
				%tmp8 = add nsw i64 %tmp7, -8
				%tmp9 = sub i32 undef, %indvar
				%tmp10 = icmp sgt i64 %tmp2, 0
				%smax40 = select i1 %tmp10, i64 %tmp2, i64 0
				%scevgep41 = getelementptr i32, i32* %tmp1, i64 %smax40
				%indvars.iv30 = add i32 %indvars.iv30.in, -1
				%tmp11 = icmp sgt i32 %indvars.iv30, 0
				%smax = select i1 %tmp11, i32 %indvars.iv30, i32 0
				%tmp12 = zext i32 %smax to i64
				%sub = sub nsw i32 %p1, %c.0
				%cmp = icmp sgt i32 %sub, 0
				%sub. = select i1 %cmp, i32 %sub, i32 0
				%cmp326 = icmp sgt i32 %k.0, %p1
				br i1 %cmp326, label %for.cond4.preheader, label %for.body.preheader

				for.body.preheader: ; preds = %for.cond
				br label %for.body

				for.cond4.preheader: ; preds = %for.body, %for.cond
				%k.1.lcssa = phi i32 [ %k.0, %for.cond ], [ %add, %for.body ]
				%cmp528 = icmp sgt i32 %sub., %p1
				br i1 %cmp528, label %for.inc14, label %for.body6.preheader

				for.body6.preheader: ; preds = %for.cond4.preheader
				br i1 undef, label %for.body6, label %min.iters.checked

				min.iters.checked: ; preds = %for.body6.preheader
				br i1 undef, label %for.body6, label %vector.memcheck

				vector.memcheck: ; preds = %min.iters.checked
				%bound1 = icmp ule i32* undef, %scevgep41
				%memcheck.conflict = and i1 undef, %bound1
				br i1 %memcheck.conflict, label %for.body6, label %vector.body.preheader

				vector.body.preheader: ; preds = %vector.memcheck
				%lcmp.mod = icmp eq i64 undef, 0
				br i1 %lcmp.mod, label %vector.body.preheader.split, label %vector.body.prol

				vector.body.prol: ; preds = %vector.body.prol, %vector.body.preheader
				%prol.iter.cmp = icmp eq i64 undef, 0
				br i1 %prol.iter.cmp, label %vector.body.preheader.split, label %vector.body.prol

				vector.body.preheader.split: ; preds = %vector.body.prol, %vector.body.preheader
				%tmp13 = icmp ult i64 %tmp8, 24
				br i1 %tmp13, label %middle.block, label %vector.body

				vector.body: ; preds = %vector.body, %vector.body.preheader.split
				%index = phi i64 [ %index.next.3, %vector.body ], [ 0, %vector.body.preheader.split ]
				%index.next = add i64 %index, 8
				%offset.idx.1 = add i64 %tmp12, %index.next
				%tmp14 = getelementptr inbounds i32, i32* %tmp, i64 %offset.idx.1
				%tmp15 = bitcast i32* %tmp14 to <4 x i32>*
				%wide.load.1 = load <4 x i32>, <4 x i32>* %tmp15, align 4
				%tmp16 = getelementptr inbounds i32, i32* %tmp1, i64 %offset.idx.1
				%tmp17 = bitcast i32* %tmp16 to <4 x i32>*
				store <4 x i32> %wide.load.1, <4 x i32>* %tmp17, align 4
				%index.next.3 = add i64 %index, 32
				br i1 undef, label %middle.block, label %vector.body

				middle.block: ; preds = %vector.body, %vector.body.preheader.split
				br i1 undef, label %for.inc14, label %for.body6

				for.body: ; preds = %for.body, %for.body.preheader
				%k.127 = phi i32 [ %k.0, %for.body.preheader ], [ %add, %for.body ]
				%add = add nsw i32 %k.127, 1
				%tmp18 = load i32, i32* undef, align 4
				store i32 %tmp18, i32* @b, align 4
				br i1 undef, label %for.body, label %for.cond4.preheader

				for.body6: ; preds = %for.body6, %middle.block, %vector.memcheck, %min.iters.checked, %for.body6.preheader
				%indvars.iv32 = phi i64 [ undef, %for.body6 ], [ %tmp12, %vector.memcheck ], [ %tmp12, %min.iters.checked ], [ %tmp12, %for.body6.preheader ], [ undef, %middle.block ]
				%arrayidx8 = getelementptr inbounds i32, i32* %tmp, i64 %indvars.iv32
				%tmp19 = load i32, i32* %arrayidx8, align 4
				%arrayidx10 = getelementptr inbounds i32, i32* %tmp1, i64 %indvars.iv32
				store i32 %tmp19, i32* %arrayidx10, align 4
				%cmp5 = icmp slt i64 %indvars.iv32, undef
				br i1 %cmp5, label %for.body6, label %for.inc14

				for.inc14: ; preds = %for.body6, %middle.block, %for.cond4.preheader
				%inc15 = add nuw nsw i32 %c.0, 1
				%indvar.next = add i32 %indvar, 1
				br label %for.cond
				}

				attributes #0 = { norecurse noreturn nounwind uwtable "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2" "unsafe-fp-math"="false" "use-soft-float"="false" }
				qcolombet: Get rid of the attributes if they are not actually needed.
				wmi: Fixed.
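
For readers unfamiliar with the grep/sed/sort/uniq pipeline in the RUN line of hoist-spill.ll, the property it looks for (no two spills to the same stack slot) can be sketched in Python. This is only an illustration, not part of the patch: the function name `duplicate_spill_slots` and the regular expression are hypothetical, and the sketch assumes that spill stores carry a "Spill" comment and address their slot as an offset from %rsp, as in the CHECK lines shown elsewhere in this diff.

```python
import re
from collections import Counter

# Hypothetical pattern: the stack-slot operand of a spill, e.g. "-24(%rsp)".
# The optional minus sign is an assumption so that positive offsets match too.
SLOT_RE = re.compile(r"(-?[0-9]+\(%rsp\))")


def duplicate_spill_slots(asm_text):
    """Return the sorted stack slots that are the target of more than one spill."""
    slots = []
    for line in asm_text.splitlines():
        # Only lines annotated as spills are of interest (reloads are skipped).
        if "Spill" not in line:
            continue
        match = SLOT_RE.search(line)
        if match:
            slots.append(match.group(1))
    counts = Counter(slots)
    return sorted(slot for slot, n in counts.items() if n > 1)
```

An empty result corresponds to the case the test wants: after hoisting, each stack slot is spilled to at most once.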

test/CodeGen/X86/new-remat.ll

				; RUN: llc < %s | FileCheck %s
				; Check all spills are rematerialized.
				; CHECK-NOT: Spill

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				@b = common global double 0.000000e+00, align 8
				@a = common global i32 0, align 4

				; Function Attrs: nounwind uwtable
				define i32 @uniform_testdata(i32 %p1) #0 {
				entry:
				qcolombet: Use opt -instnamer to get rid of the %[0-9]+ variables.
				wmi: Fixed.
				%cmp3 = icmp sgt i32 %p1, 0
				br i1 %cmp3, label %for.body.preheader, label %for.end

				for.body.preheader: ; preds = %entry
				%0 = add i32 %p1, -1
				%xtraiter = and i32 %p1, 7
				%lcmp.mod = icmp eq i32 %xtraiter, 0
				br i1 %lcmp.mod, label %for.body.preheader.split, label %for.body.prol.preheader

				for.body.prol.preheader: ; preds = %for.body.preheader
				br label %for.body.prol

				for.body.prol: ; preds = %for.body.prol.preheader, %for.body.prol
				%i.04.prol = phi i32 [ %inc.prol, %for.body.prol ], [ 0, %for.body.prol.preheader ]
				%prol.iter = phi i32 [ %prol.iter.sub, %for.body.prol ], [ %xtraiter, %for.body.prol.preheader ]
				%1 = load double, double* @b, align 8
				%call.prol = tail call double @pow(double %1, double 2.500000e-01) #2
				%inc.prol = add nuw nsw i32 %i.04.prol, 1
				%prol.iter.sub = add i32 %prol.iter, -1
				%prol.iter.cmp = icmp eq i32 %prol.iter.sub, 0
				br i1 %prol.iter.cmp, label %for.body.preheader.split.loopexit, label %for.body.prol

				for.body.preheader.split.loopexit: ; preds = %for.body.prol
				%inc.prol.lcssa = phi i32 [ %inc.prol, %for.body.prol ]
				br label %for.body.preheader.split

				for.body.preheader.split: ; preds = %for.body.preheader.split.loopexit, %for.body.preheader
				%i.04.unr = phi i32 [ 0, %for.body.preheader ], [ %inc.prol.lcssa, %for.body.preheader.split.loopexit ]
				%2 = icmp ult i32 %0, 7
				br i1 %2, label %for.end.loopexit, label %for.body.preheader.split.split

				for.body.preheader.split.split: ; preds = %for.body.preheader.split
				br label %for.body

				for.body: ; preds = %for.body, %for.body.preheader.split.split
				%i.04 = phi i32 [ %i.04.unr, %for.body.preheader.split.split ], [ %inc.7, %for.body ]
				%3 = load double, double* @b, align 8
				%call = tail call double @pow(double %3, double 2.500000e-01) #2
				%4 = load double, double* @b, align 8
				%call.1 = tail call double @pow(double %4, double 2.500000e-01) #2
				%inc.7 = add nsw i32 %i.04, 8
				%exitcond.7 = icmp eq i32 %inc.7, %p1
				br i1 %exitcond.7, label %for.end.loopexit.unr-lcssa, label %for.body

				for.end.loopexit.unr-lcssa: ; preds = %for.body
				br label %for.end.loopexit

				for.end.loopexit: ; preds = %for.body.preheader.split, %for.end.loopexit.unr-lcssa
				br label %for.end

				for.end: ; preds = %for.end.loopexit, %entry
				%5 = load i32, i32* @a, align 4
				ret i32 %5
				}

				; Function Attrs: nounwind
				declare double @pow(double, double) #1

				attributes #0 = { nounwind uwtable "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2" "unsafe-fp-math"="false" "use-soft-float"="false" }
				attributes #1 = { nounwind "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2" "unsafe-fp-math"="false" "use-soft-float"="false" }
				attributes #2 = { nounwind }

test/CodeGen/X86/ragreedy-hoist-spill.ll

	; RUN: llc < %s -mtriple=x86_64-apple-macosx -regalloc=greedy | FileCheck %s			; RUN: llc < %s -mtriple=x86_64-apple-macosx -regalloc=greedy | FileCheck %s

	; This testing case is reduced from 254.gap SyFgets function.			; This testing case is reduced from 254.gap SyFgets function.
	; We make sure a spill is not hoisted to a hotter outer loop.			; We make sure a spill is not hoisted to a hotter outer loop.
				; We make sure a spill is hoisted to a cold BB inside the hotter outer loop.

	%struct.TMP.1 = type { %struct.TMP.2, %struct.TMP.2, [1024 x i8] }			%struct.TMP.1 = type { %struct.TMP.2, %struct.TMP.2, [1024 x i8] }
	%struct.TMP.2 = type { i8, i32, i32, i16, i16, %struct.TMP.3, i32, i8, i32 (i8), i32 (i8, i8, i32), i64 (i8, i64, i32), i32 (i8, i8, i32), %struct.TMP.3, %struct.TMP.4*, i32, [3 x i8], [1 x i8], %struct.TMP.3, i32, i64 }			%struct.TMP.2 = type { i8, i32, i32, i16, i16, %struct.TMP.3, i32, i8, i32 (i8), i32 (i8, i8, i32), i64 (i8, i64, i32), i32 (i8, i8, i32), %struct.TMP.3, %struct.TMP.4*, i32, [3 x i8], [1 x i8], %struct.TMP.3, i32, i64 }
	%struct.TMP.4 = type opaque			%struct.TMP.4 = type opaque
	%struct.TMP.3 = type { i8*, i32 }			%struct.TMP.3 = type { i8*, i32 }

	@syBuf = external global [16 x %struct.TMP.1], align 16			@syBuf = external global [16 x %struct.TMP.1], align 16
	@syHistory = external global [8192 x i8], align 16			@syHistory = external global [8192 x i8], align 16
	for.cond357:			for.cond357:
	br label %for.cond357			br label %for.cond357

	sw.bb474:			sw.bb474:
	%cmp476 = icmp eq i8 undef, 0			%cmp476 = icmp eq i8 undef, 0
	br i1 %cmp476, label %if.end517, label %do.body479.preheader			br i1 %cmp476, label %if.end517, label %do.body479.preheader

	do.body479.preheader:			do.body479.preheader:
				; CHECK: do.body479.preheader
				; spill is hoisted here. Although loop depth1 is even hotter than loop depth2, do.body479.preheader is cold.
				; CHECK: movq %r{{.*}}, {{[0-9]+}}(%rsp)
				; CHECK: land.rhs485
	%cmp4833314 = icmp eq i8 undef, 0			%cmp4833314 = icmp eq i8 undef, 0
	br i1 %cmp4833314, label %if.end517, label %land.rhs485			br i1 %cmp4833314, label %if.end517, label %land.rhs485

	land.rhs485:			land.rhs485:
	%incdec.ptr4803316 = phi i8* [ %incdec.ptr480, %do.body479.backedge.land.rhs485_crit_edge ], [ undef, %do.body479.preheader ]			%incdec.ptr4803316 = phi i8* [ %incdec.ptr480, %do.body479.backedge.land.rhs485_crit_edge ], [ undef, %do.body479.preheader ]
	%isascii.i.i27763151 = icmp sgt i8 undef, -1			%isascii.i.i27763151 = icmp sgt i8 undef, -1
	br i1 %isascii.i.i27763151, label %cond.true.i.i2780, label %cond.false.i.i2782			br i1 %isascii.i.i27763151, label %cond.true.i.i2780, label %cond.false.i.i2782

	cond.true.i.i2780:			cond.true.i.i2780:
	br i1 undef, label %land.lhs.true490, label %lor.rhs500			br i1 undef, label %land.lhs.true490, label %lor.rhs500

	cond.false.i.i2782:			cond.false.i.i2782:
	unreachable			unreachable

	land.lhs.true490:			land.lhs.true490:
	br i1 false, label %lor.rhs500, label %do.body479.backedge			br i1 false, label %lor.rhs500, label %do.body479.backedge

	lor.rhs500:			lor.rhs500:
	; CHECK: lor.rhs500			; CHECK: lor.rhs500
	; Make sure that we don't hoist the spill to outer loops.			; Make sure spill is hoisted to a cold preheader in outside loop.
	; CHECK: movq %r{{.*}}, {{[0-9]+}}(%rsp)			; CHECK-NOT: movq %r{{.*}}, {{[0-9]+}}(%rsp)
	; CHECK: callq {{.*}}maskrune			; CHECK: callq {{.*}}maskrune
	%call3.i.i2792 = call i32 @__maskrune(i32 undef, i64 256)			%call3.i.i2792 = call i32 @__maskrune(i32 undef, i64 256)
	br i1 undef, label %land.lhs.true504, label %do.body479.backedge			br i1 undef, label %land.lhs.true504, label %do.body479.backedge

	land.lhs.true504:			land.lhs.true504:
	br i1 undef, label %do.body479.backedge, label %if.end517			br i1 undef, label %do.body479.backedge, label %if.end517

	do.body479.backedge:			do.body479.backedge:
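
The updated CHECK lines in ragreedy-hoist-spill.ll expect the spill in do.body479.preheader, a cold block, rather than in lor.rhs500. One way to picture that kind of choice is to walk the dominator tree from the block that needs the spill up to the block that defines the value and keep the least frequently executed block. The Python sketch below is a simplified illustration under that assumption and is not the algorithm implemented in the patch: the function `pick_spill_block`, the `idom` map, and all block frequencies are hypothetical.

```python
def pick_spill_block(def_block, use_block, idom, freq):
    """Return the coldest block on the dominator path from use_block to def_block.

    idom maps each block to its immediate dominator; freq maps each block to an
    estimated execution frequency. Ties keep the block closest to the use.
    """
    best = use_block
    block = use_block
    while block != def_block:
        if block not in idom:
            # def_block does not dominate use_block, so there is no valid path.
            raise ValueError("def_block must dominate use_block")
        block = idom[block]
        if freq[block] < freq[best]:
            best = block
    return best


# Made-up dominator tree and frequencies, loosely named after the test's blocks.
idom = {
    "outer.header": "entry",
    "do.body479.preheader": "outer.header",
    "land.rhs485": "do.body479.preheader",
    "lor.rhs500": "land.rhs485",
}
freq = {
    "entry": 1,
    "outer.header": 100,
    "do.body479.preheader": 5,
    "land.rhs485": 50,
    "lor.rhs500": 40,
}
chosen = pick_spill_block("outer.header", "lor.rhs500", idom, freq)
```

With these invented numbers the preheader is chosen even though it sits inside the hotter outer loop, which mirrors the comment added to the test.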

test/CodeGen/X86/vselect-minmax.ll


entry:
%cmp = icmp slt <8 x i64> %a, %b		%cmp = icmp slt <8 x i64> %a, %b
%sel = select <8 x i1> %cmp, <8 x i64> %a, <8 x i64> %b		%sel = select <8 x i1> %cmp, <8 x i64> %a, <8 x i64> %b
ret <8 x i64> %sel		ret <8 x i64> %sel
}		}

define <8 x i64> @test122(<8 x i64> %a, <8 x i64> %b) {		define <8 x i64> @test122(<8 x i64> %a, <8 x i64> %b) {
; SSE2-LABEL: test122:		; SSE2-LABEL: test122:
; SSE2: # BB#0: # %entry		; SSE2: # BB#0: # %entry
; SSE2-NEXT: movdqa %xmm7, %xmm8		; SSE2-NEXT: movdqa %xmm7, %xmm11
; SSE2-NEXT: movdqa %xmm8, -{{[0-9]+}}(%rsp) # 16-byte Spill
; SSE2-NEXT: movdqa %xmm3, %xmm7		; SSE2-NEXT: movdqa %xmm3, %xmm7
; SSE2-NEXT: movdqa %xmm2, %xmm3		; SSE2-NEXT: movdqa %xmm2, %xmm3
; SSE2-NEXT: movdqa %xmm1, %xmm2		; SSE2-NEXT: movdqa %xmm1, %xmm2
; SSE2-NEXT: movdqa %xmm0, %xmm9		; SSE2-NEXT: movdqa %xmm0, %xmm9
; SSE2-NEXT: movdqa {{.*#+}} xmm10 = [2147483648,0,2147483648,0]		; SSE2-NEXT: movdqa {{.*#+}} xmm10 = [2147483648,0,2147483648,0]
		; SSE2-NEXT: movdqa %xmm11, %xmm8
		; SSE2-NEXT: movdqa %xmm11, -{{[0-9]+}}(%rsp) # 16-byte Spill
		qcolombet: Why is this happening?
; SSE2-NEXT: pxor %xmm10, %xmm8		; SSE2-NEXT: pxor %xmm10, %xmm8
; SSE2-NEXT: movdqa %xmm7, %xmm0		; SSE2-NEXT: movdqa %xmm7, %xmm0
; SSE2-NEXT: pxor %xmm10, %xmm0		; SSE2-NEXT: pxor %xmm10, %xmm0
; SSE2-NEXT: movdqa %xmm0, %xmm11		; SSE2-NEXT: movdqa %xmm0, %xmm11
; SSE2-NEXT: pcmpgtd %xmm8, %xmm11		; SSE2-NEXT: pcmpgtd %xmm8, %xmm11
; SSE2-NEXT: pshufd {{.*#+}} xmm12 = xmm11[0,0,2,2]		; SSE2-NEXT: pshufd {{.*#+}} xmm12 = xmm11[0,0,2,2]
; SSE2-NEXT: pcmpeqd %xmm8, %xmm0		; SSE2-NEXT: pcmpeqd %xmm8, %xmm0
; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,3,3]		; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,3,3]
entry:
%sel = select <8 x i1> %cmp, <8 x i64> %a, <8 x i64> %b		%sel = select <8 x i1> %cmp, <8 x i64> %a, <8 x i64> %b
ret <8 x i64> %sel		ret <8 x i64> %sel
}		}

define <8 x i64> @test124(<8 x i64> %a, <8 x i64> %b) {		define <8 x i64> @test124(<8 x i64> %a, <8 x i64> %b) {
; SSE2-LABEL: test124:		; SSE2-LABEL: test124:
; SSE2: # BB#0: # %entry		; SSE2: # BB#0: # %entry
; SSE2-NEXT: movdqa %xmm7, %xmm11		; SSE2-NEXT: movdqa %xmm7, %xmm11
; SSE2-NEXT: movdqa %xmm11, -{{[0-9]+}}(%rsp) # 16-byte Spill
; SSE2-NEXT: movdqa %xmm3, %xmm7		; SSE2-NEXT: movdqa %xmm3, %xmm7
; SSE2-NEXT: movdqa %xmm2, %xmm3		; SSE2-NEXT: movdqa %xmm2, %xmm3
; SSE2-NEXT: movdqa %xmm1, %xmm2		; SSE2-NEXT: movdqa %xmm1, %xmm2
; SSE2-NEXT: movdqa %xmm0, %xmm9		; SSE2-NEXT: movdqa %xmm0, %xmm9
; SSE2-NEXT: movdqa {{.*#+}} xmm10 = [2147483648,0,2147483648,0]		; SSE2-NEXT: movdqa {{.*#+}} xmm10 = [2147483648,0,2147483648,0]
; SSE2-NEXT: movdqa %xmm7, %xmm8		; SSE2-NEXT: movdqa %xmm7, %xmm8
; SSE2-NEXT: pxor %xmm10, %xmm8		; SSE2-NEXT: pxor %xmm10, %xmm8
; SSE2-NEXT: movdqa %xmm11, %xmm0		; SSE2-NEXT: movdqa %xmm11, %xmm0
		; SSE2-NEXT: movdqa %xmm11, -{{[0-9]+}}(%rsp) # 16-byte Spill
; SSE2-NEXT: pxor %xmm10, %xmm0		; SSE2-NEXT: pxor %xmm10, %xmm0
; SSE2-NEXT: movdqa %xmm0, %xmm11		; SSE2-NEXT: movdqa %xmm0, %xmm11
; SSE2-NEXT: pcmpgtd %xmm8, %xmm11		; SSE2-NEXT: pcmpgtd %xmm8, %xmm11
; SSE2-NEXT: pshufd {{.*#+}} xmm12 = xmm11[0,0,2,2]		; SSE2-NEXT: pshufd {{.*#+}} xmm12 = xmm11[0,0,2,2]
; SSE2-NEXT: pcmpeqd %xmm8, %xmm0		; SSE2-NEXT: pcmpeqd %xmm8, %xmm0
; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,3,3]		; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,3,3]
; SSE2-NEXT: pand %xmm12, %xmm0		; SSE2-NEXT: pand %xmm12, %xmm0
; SSE2-NEXT: pshufd {{.*#+}} xmm12 = xmm11[1,1,3,3]		; SSE2-NEXT: pshufd {{.*#+}} xmm12 = xmm11[1,1,3,3]
entry:
%cmp = icmp ult <8 x i64> %a, %b		%cmp = icmp ult <8 x i64> %a, %b
%sel = select <8 x i1> %cmp, <8 x i64> %a, <8 x i64> %b		%sel = select <8 x i1> %cmp, <8 x i64> %a, <8 x i64> %b
ret <8 x i64> %sel		ret <8 x i64> %sel
}		}

define <8 x i64> @test126(<8 x i64> %a, <8 x i64> %b) {		define <8 x i64> @test126(<8 x i64> %a, <8 x i64> %b) {
; SSE2-LABEL: test126:		; SSE2-LABEL: test126:
; SSE2: # BB#0: # %entry		; SSE2: # BB#0: # %entry
; SSE2-NEXT: movdqa %xmm7, %xmm8		; SSE2-NEXT: movdqa %xmm7, %xmm11
; SSE2-NEXT: movdqa %xmm8, -{{[0-9]+}}(%rsp) # 16-byte Spill
; SSE2-NEXT: movdqa %xmm3, %xmm7		; SSE2-NEXT: movdqa %xmm3, %xmm7
; SSE2-NEXT: movdqa %xmm2, %xmm3		; SSE2-NEXT: movdqa %xmm2, %xmm3
; SSE2-NEXT: movdqa %xmm1, %xmm2		; SSE2-NEXT: movdqa %xmm1, %xmm2
; SSE2-NEXT: movdqa %xmm0, %xmm9		; SSE2-NEXT: movdqa %xmm0, %xmm9
; SSE2-NEXT: movdqa {{.*#+}} xmm10 = [2147483648,2147483648,2147483648,2147483648]		; SSE2-NEXT: movdqa {{.*#+}} xmm10 = [2147483648,2147483648,2147483648,2147483648]
		; SSE2-NEXT: movdqa %xmm11, %xmm8
		; SSE2-NEXT: movdqa %xmm11, -{{[0-9]+}}(%rsp) # 16-byte Spill
; SSE2-NEXT: pxor %xmm10, %xmm8		; SSE2-NEXT: pxor %xmm10, %xmm8
; SSE2-NEXT: movdqa %xmm7, %xmm0		; SSE2-NEXT: movdqa %xmm7, %xmm0
; SSE2-NEXT: pxor %xmm10, %xmm0		; SSE2-NEXT: pxor %xmm10, %xmm0
; SSE2-NEXT: movdqa %xmm0, %xmm11		; SSE2-NEXT: movdqa %xmm0, %xmm11
; SSE2-NEXT: pcmpgtd %xmm8, %xmm11		; SSE2-NEXT: pcmpgtd %xmm8, %xmm11
; SSE2-NEXT: pshufd {{.*#+}} xmm12 = xmm11[0,0,2,2]		; SSE2-NEXT: pshufd {{.*#+}} xmm12 = xmm11[0,0,2,2]
; SSE2-NEXT: pcmpeqd %xmm8, %xmm0		; SSE2-NEXT: pcmpeqd %xmm8, %xmm0
; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,3,3]		; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,3,3]
entry:
%sel = select <8 x i1> %cmp, <8 x i64> %a, <8 x i64> %b		%sel = select <8 x i1> %cmp, <8 x i64> %a, <8 x i64> %b
ret <8 x i64> %sel		ret <8 x i64> %sel
}		}

define <8 x i64> @test128(<8 x i64> %a, <8 x i64> %b) {		define <8 x i64> @test128(<8 x i64> %a, <8 x i64> %b) {
; SSE2-LABEL: test128:		; SSE2-LABEL: test128:
; SSE2: # BB#0: # %entry		; SSE2: # BB#0: # %entry
; SSE2-NEXT: movdqa %xmm7, %xmm11		; SSE2-NEXT: movdqa %xmm7, %xmm11
; SSE2-NEXT: movdqa %xmm11, -{{[0-9]+}}(%rsp) # 16-byte Spill
; SSE2-NEXT: movdqa %xmm3, %xmm7		; SSE2-NEXT: movdqa %xmm3, %xmm7
; SSE2-NEXT: movdqa %xmm2, %xmm3		; SSE2-NEXT: movdqa %xmm2, %xmm3
; SSE2-NEXT: movdqa %xmm1, %xmm2		; SSE2-NEXT: movdqa %xmm1, %xmm2
; SSE2-NEXT: movdqa %xmm0, %xmm9		; SSE2-NEXT: movdqa %xmm0, %xmm9
; SSE2-NEXT: movdqa {{.*#+}} xmm10 = [2147483648,2147483648,2147483648,2147483648]		; SSE2-NEXT: movdqa {{.*#+}} xmm10 = [2147483648,2147483648,2147483648,2147483648]
; SSE2-NEXT: movdqa %xmm7, %xmm8		; SSE2-NEXT: movdqa %xmm7, %xmm8
; SSE2-NEXT: pxor %xmm10, %xmm8		; SSE2-NEXT: pxor %xmm10, %xmm8
; SSE2-NEXT: movdqa %xmm11, %xmm0		; SSE2-NEXT: movdqa %xmm11, %xmm0
		; SSE2-NEXT: movdqa %xmm11, -{{[0-9]+}}(%rsp) # 16-byte Spill
; SSE2-NEXT: pxor %xmm10, %xmm0		; SSE2-NEXT: pxor %xmm10, %xmm0
; SSE2-NEXT: movdqa %xmm0, %xmm11		; SSE2-NEXT: movdqa %xmm0, %xmm11
; SSE2-NEXT: pcmpgtd %xmm8, %xmm11		; SSE2-NEXT: pcmpgtd %xmm8, %xmm11
; SSE2-NEXT: pshufd {{.*#+}} xmm12 = xmm11[0,0,2,2]		; SSE2-NEXT: pshufd {{.*#+}} xmm12 = xmm11[0,0,2,2]
; SSE2-NEXT: pcmpeqd %xmm8, %xmm0		; SSE2-NEXT: pcmpeqd %xmm8, %xmm0
; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,3,3]		; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,3,3]
; SSE2-NEXT: pand %xmm12, %xmm0		; SSE2-NEXT: pand %xmm12, %xmm0
; SSE2-NEXT: pshufd {{.*#+}} xmm12 = xmm11[1,1,3,3]		; SSE2-NEXT: pshufd {{.*#+}} xmm12 = xmm11[1,1,3,3]
entry:
%cmp = icmp slt <8 x i64> %a, %b		%cmp = icmp slt <8 x i64> %a, %b
%sel = select <8 x i1> %cmp, <8 x i64> %b, <8 x i64> %a		%sel = select <8 x i1> %cmp, <8 x i64> %b, <8 x i64> %a
ret <8 x i64> %sel		ret <8 x i64> %sel
}		}

define <8 x i64> @test154(<8 x i64> %a, <8 x i64> %b) {		define <8 x i64> @test154(<8 x i64> %a, <8 x i64> %b) {
; SSE2-LABEL: test154:		; SSE2-LABEL: test154:
; SSE2: # BB#0: # %entry		; SSE2: # BB#0: # %entry
; SSE2-NEXT: movdqa %xmm7, %xmm8		; SSE2-NEXT: movdqa %xmm7, %xmm11
; SSE2-NEXT: movdqa %xmm8, -{{[0-9]+}}(%rsp) # 16-byte Spill
; SSE2-NEXT: movdqa %xmm3, %xmm7		; SSE2-NEXT: movdqa %xmm3, %xmm7
; SSE2-NEXT: movdqa %xmm2, %xmm3		; SSE2-NEXT: movdqa %xmm2, %xmm3
; SSE2-NEXT: movdqa %xmm1, %xmm2		; SSE2-NEXT: movdqa %xmm1, %xmm2
; SSE2-NEXT: movdqa %xmm0, %xmm9		; SSE2-NEXT: movdqa %xmm0, %xmm9
; SSE2-NEXT: movdqa {{.*#+}} xmm10 = [2147483648,0,2147483648,0]		; SSE2-NEXT: movdqa {{.*#+}} xmm10 = [2147483648,0,2147483648,0]
		; SSE2-NEXT: movdqa %xmm11, %xmm8
		; SSE2-NEXT: movdqa %xmm11, -{{[0-9]+}}(%rsp) # 16-byte Spill
		qcolombet: Ditto.
; SSE2-NEXT: pxor %xmm10, %xmm8		; SSE2-NEXT: pxor %xmm10, %xmm8
; SSE2-NEXT: movdqa %xmm7, %xmm0		; SSE2-NEXT: movdqa %xmm7, %xmm0
; SSE2-NEXT: pxor %xmm10, %xmm0		; SSE2-NEXT: pxor %xmm10, %xmm0
; SSE2-NEXT: movdqa %xmm0, %xmm11		; SSE2-NEXT: movdqa %xmm0, %xmm11
; SSE2-NEXT: pcmpgtd %xmm8, %xmm11		; SSE2-NEXT: pcmpgtd %xmm8, %xmm11
; SSE2-NEXT: pshufd {{.*#+}} xmm12 = xmm11[0,0,2,2]		; SSE2-NEXT: pshufd {{.*#+}} xmm12 = xmm11[0,0,2,2]
; SSE2-NEXT: pcmpeqd %xmm8, %xmm0		; SSE2-NEXT: pcmpeqd %xmm8, %xmm0
; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,3,3]		; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,3,3]
entry:
%sel = select <8 x i1> %cmp, <8 x i64> %b, <8 x i64> %a		%sel = select <8 x i1> %cmp, <8 x i64> %b, <8 x i64> %a
ret <8 x i64> %sel		ret <8 x i64> %sel
}		}

define <8 x i64> @test156(<8 x i64> %a, <8 x i64> %b) {		define <8 x i64> @test156(<8 x i64> %a, <8 x i64> %b) {
; SSE2-LABEL: test156:		; SSE2-LABEL: test156:
; SSE2: # BB#0: # %entry		; SSE2: # BB#0: # %entry
; SSE2-NEXT: movdqa %xmm7, %xmm11		; SSE2-NEXT: movdqa %xmm7, %xmm11
; SSE2-NEXT: movdqa %xmm11, -{{[0-9]+}}(%rsp) # 16-byte Spill
; SSE2-NEXT: movdqa %xmm3, %xmm7		; SSE2-NEXT: movdqa %xmm3, %xmm7
; SSE2-NEXT: movdqa %xmm2, %xmm3		; SSE2-NEXT: movdqa %xmm2, %xmm3
; SSE2-NEXT: movdqa %xmm1, %xmm2		; SSE2-NEXT: movdqa %xmm1, %xmm2
; SSE2-NEXT: movdqa %xmm0, %xmm9		; SSE2-NEXT: movdqa %xmm0, %xmm9
; SSE2-NEXT: movdqa {{.*#+}} xmm10 = [2147483648,0,2147483648,0]		; SSE2-NEXT: movdqa {{.*#+}} xmm10 = [2147483648,0,2147483648,0]
; SSE2-NEXT: movdqa %xmm7, %xmm8		; SSE2-NEXT: movdqa %xmm7, %xmm8
; SSE2-NEXT: pxor %xmm10, %xmm8		; SSE2-NEXT: pxor %xmm10, %xmm8
; SSE2-NEXT: movdqa %xmm11, %xmm0		; SSE2-NEXT: movdqa %xmm11, %xmm0
		; SSE2-NEXT: movdqa %xmm11, -{{[0-9]+}}(%rsp) # 16-byte Spill
; SSE2-NEXT: pxor %xmm10, %xmm0		; SSE2-NEXT: pxor %xmm10, %xmm0
; SSE2-NEXT: movdqa %xmm0, %xmm11		; SSE2-NEXT: movdqa %xmm0, %xmm11
; SSE2-NEXT: pcmpgtd %xmm8, %xmm11		; SSE2-NEXT: pcmpgtd %xmm8, %xmm11
; SSE2-NEXT: pshufd {{.*#+}} xmm12 = xmm11[0,0,2,2]		; SSE2-NEXT: pshufd {{.*#+}} xmm12 = xmm11[0,0,2,2]
; SSE2-NEXT: pcmpeqd %xmm8, %xmm0		; SSE2-NEXT: pcmpeqd %xmm8, %xmm0
; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,3,3]		; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,3,3]
; SSE2-NEXT: pand %xmm12, %xmm0		; SSE2-NEXT: pand %xmm12, %xmm0
; SSE2-NEXT: pshufd {{.*#+}} xmm12 = xmm11[1,1,3,3]		; SSE2-NEXT: pshufd {{.*#+}} xmm12 = xmm11[1,1,3,3]
entry:
%cmp = icmp ult <8 x i64> %a, %b		%cmp = icmp ult <8 x i64> %a, %b
%sel = select <8 x i1> %cmp, <8 x i64> %b, <8 x i64> %a		%sel = select <8 x i1> %cmp, <8 x i64> %b, <8 x i64> %a
ret <8 x i64> %sel		ret <8 x i64> %sel
}		}

define <8 x i64> @test158(<8 x i64> %a, <8 x i64> %b) {		define <8 x i64> @test158(<8 x i64> %a, <8 x i64> %b) {
; SSE2-LABEL: test158:		; SSE2-LABEL: test158:
; SSE2: # BB#0: # %entry		; SSE2: # BB#0: # %entry
; SSE2-NEXT: movdqa %xmm7, %xmm8		; SSE2-NEXT: movdqa %xmm7, %xmm11
; SSE2-NEXT: movdqa %xmm8, -{{[0-9]+}}(%rsp) # 16-byte Spill
; SSE2-NEXT: movdqa %xmm3, %xmm7		; SSE2-NEXT: movdqa %xmm3, %xmm7
; SSE2-NEXT: movdqa %xmm2, %xmm3		; SSE2-NEXT: movdqa %xmm2, %xmm3
; SSE2-NEXT: movdqa %xmm1, %xmm2		; SSE2-NEXT: movdqa %xmm1, %xmm2
; SSE2-NEXT: movdqa %xmm0, %xmm9		; SSE2-NEXT: movdqa %xmm0, %xmm9
; SSE2-NEXT: movdqa {{.*#+}} xmm10 = [2147483648,2147483648,2147483648,2147483648]		; SSE2-NEXT: movdqa {{.*#+}} xmm10 = [2147483648,2147483648,2147483648,2147483648]
		; SSE2-NEXT: movdqa %xmm11, %xmm8
		; SSE2-NEXT: movdqa %xmm11, -{{[0-9]+}}(%rsp) # 16-byte Spill
; SSE2-NEXT: pxor %xmm10, %xmm8		; SSE2-NEXT: pxor %xmm10, %xmm8
; SSE2-NEXT: movdqa %xmm7, %xmm0		; SSE2-NEXT: movdqa %xmm7, %xmm0
; SSE2-NEXT: pxor %xmm10, %xmm0		; SSE2-NEXT: pxor %xmm10, %xmm0
; SSE2-NEXT: movdqa %xmm0, %xmm11		; SSE2-NEXT: movdqa %xmm0, %xmm11
; SSE2-NEXT: pcmpgtd %xmm8, %xmm11		; SSE2-NEXT: pcmpgtd %xmm8, %xmm11
; SSE2-NEXT: pshufd {{.*#+}} xmm12 = xmm11[0,0,2,2]		; SSE2-NEXT: pshufd {{.*#+}} xmm12 = xmm11[0,0,2,2]
; SSE2-NEXT: pcmpeqd %xmm8, %xmm0		; SSE2-NEXT: pcmpeqd %xmm8, %xmm0
; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,3,3]		; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,3,3]
entry:
%sel = select <8 x i1> %cmp, <8 x i64> %b, <8 x i64> %a		%sel = select <8 x i1> %cmp, <8 x i64> %b, <8 x i64> %a
ret <8 x i64> %sel		ret <8 x i64> %sel
}		}

define <8 x i64> @test160(<8 x i64> %a, <8 x i64> %b) {		define <8 x i64> @test160(<8 x i64> %a, <8 x i64> %b) {
; SSE2-LABEL: test160:		; SSE2-LABEL: test160:
; SSE2: # BB#0: # %entry		; SSE2: # BB#0: # %entry
; SSE2-NEXT: movdqa %xmm7, %xmm11		; SSE2-NEXT: movdqa %xmm7, %xmm11
; SSE2-NEXT: movdqa %xmm11, -{{[0-9]+}}(%rsp) # 16-byte Spill
; SSE2-NEXT: movdqa %xmm3, %xmm7		; SSE2-NEXT: movdqa %xmm3, %xmm7
; SSE2-NEXT: movdqa %xmm2, %xmm3		; SSE2-NEXT: movdqa %xmm2, %xmm3
; SSE2-NEXT: movdqa %xmm1, %xmm2		; SSE2-NEXT: movdqa %xmm1, %xmm2
; SSE2-NEXT: movdqa %xmm0, %xmm9		; SSE2-NEXT: movdqa %xmm0, %xmm9
; SSE2-NEXT: movdqa {{.*#+}} xmm10 = [2147483648,2147483648,2147483648,2147483648]		; SSE2-NEXT: movdqa {{.*#+}} xmm10 = [2147483648,2147483648,2147483648,2147483648]
; SSE2-NEXT: movdqa %xmm7, %xmm8		; SSE2-NEXT: movdqa %xmm7, %xmm8
; SSE2-NEXT: pxor %xmm10, %xmm8		; SSE2-NEXT: pxor %xmm10, %xmm8
; SSE2-NEXT: movdqa %xmm11, %xmm0		; SSE2-NEXT: movdqa %xmm11, %xmm0
		; SSE2-NEXT: movdqa %xmm11, -{{[0-9]+}}(%rsp) # 16-byte Spill
; SSE2-NEXT: pxor %xmm10, %xmm0		; SSE2-NEXT: pxor %xmm10, %xmm0
; SSE2-NEXT: movdqa %xmm0, %xmm11		; SSE2-NEXT: movdqa %xmm0, %xmm11
; SSE2-NEXT: pcmpgtd %xmm8, %xmm11		; SSE2-NEXT: pcmpgtd %xmm8, %xmm11
; SSE2-NEXT: pshufd {{.*#+}} xmm12 = xmm11[0,0,2,2]		; SSE2-NEXT: pshufd {{.*#+}} xmm12 = xmm11[0,0,2,2]
; SSE2-NEXT: pcmpeqd %xmm8, %xmm0		; SSE2-NEXT: pcmpeqd %xmm8, %xmm0
; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,3,3]		; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,3,3]
; SSE2-NEXT: pand %xmm12, %xmm0		; SSE2-NEXT: pand %xmm12, %xmm0
; SSE2-NEXT: pshufd {{.*#+}} xmm12 = xmm11[1,1,3,3]		; SSE2-NEXT: pshufd {{.*#+}} xmm12 = xmm11[1,1,3,3]
