This is an archive of the discontinued LLVM Phabricator instance.

[Greedy regalloc] Replace analyzeSiblingValues with something new
ClosedPublic

Authored by wmi on Dec 7 2015, 1:51 PM.

Details

Summary

This change solves PR17409 (https://llvm.org/bugs/show_bug.cgi?id=17409) and its several duplicates. It is divided into three parts for easier review.

The major issue with analyzeSiblingValues is that when a virtreg is split into N siblings with the same original VNI, and some PHIDefs are generated during the splitting, the VNInfo of every sibling is added to the Dependents of all the other siblings, which creates an NxN network. traceSiblingValue propagates SibValue info through this NxN network, so it has NxN time complexity. In addition, selectOrSplit is called for all N siblings sequentially. When register pressure is high, a large percentage of the siblings will be spilled (let's suppose N/2 siblings are spilled), and traceSiblingValue will be called N/2 times indirectly from selectOrSplit, so the total time complexity is N^3.

analyzeSiblingValues has two major uses. One is to figure out SibValueInfo::SpillVNI of the virtReg to be spilled, so the spill can be hoisted to the place after SpillVNI->def and redundant spills can be eliminated at the same time. The other is to trace sibling copies back to the original value so that the computation of the original value can be used for rematerialization. We replace analyzeSiblingValues by reimplementing these functionalities in Part1 and Part2.

Part1:
Instead of figuring out the place to hoist the spill for each virtReg to be spilled, we do it all at once after allocatePhysRegs is done. With all spills in place, we group spills with the same value (having the same OrigVNI). For each group of equal-value spills, we first remove redundant spills dominated by another spill in the group, then traverse the dominator tree in post-order and hoist the spills to less frequently executed dominator tree nodes. Since a spill can be hoisted to a cold dominator tree node even if no sibling's VNI->def is in that node, this can be better than the original implementation.
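
As an illustration, here is a minimal standalone sketch of just the hoisting cost decision (the names and data structures are hypothetical, not the actual patch code; the real implementation also has to verify that the spilled value is available at the candidate hoist point):

  #include <utility>
  #include <vector>

  // One dominator tree node, seen from a single group of equal-value spills.
  struct DomNode {
    double Freq = 0;                  // block execution frequency
    bool HasSpill = false;            // a spill of this value sits here
    std::vector<DomNode *> Children;  // dominator tree children
  };

  // Post-order walk: return the surviving spill points for the subtree
  // rooted at N together with their total frequency cost.
  static double hoistSpills(DomNode *N, std::vector<DomNode *> &SpillPoints) {
    std::vector<DomNode *> SubtreeSpills;
    double SubtreeCost = 0;
    for (DomNode *C : N->Children) {
      std::vector<DomNode *> ChildSpills;
      SubtreeCost += hoistSpills(C, ChildSpills);
      SubtreeSpills.insert(SubtreeSpills.end(), ChildSpills.begin(),
                           ChildSpills.end());
    }
    if (N->HasSpill) {
      // A spill in N dominates every spill below it, so those are redundant.
      SubtreeSpills = {N};
      SubtreeCost = N->Freq;
    }
    if (SubtreeSpills.size() > 1 && N->Freq < SubtreeCost) {
      // One spill hoisted to N is cheaper than the spills scattered below.
      SubtreeSpills = {N};
      SubtreeCost = N->Freq;
    }
    SpillPoints = std::move(SubtreeSpills);
    return SubtreeCost;
  }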

I didn't follow Jakob's proposal in PR17409 to change hoistCopiesForSize, because only the redundant backcopies seen within one round of splitting can be handled there. Suppose Vreg1 is split into Vreg2, Vreg3 and Vreg4 in the first round of splitting, and Vreg4 is further split into Vreg5 and Vreg6 in the second round of splitting; the redundant backcopies between {Vreg2, Vreg3} and {Vreg5, Vreg6} cannot be found (I caught more than 100 such cases in the llvm testsuite that left redundant spills in the final asm code).

Part2:
To find the computation of the original value for rematerialization, we always query the instruction at OrigVNI->def. To handle the case where the instruction at OrigVNI->def has been removed during rematerialization, we change rematerialization to not delete the instruction at OrigVNI->def even if it is already dead. Instead, we change its dest vreg to a new vreg (the new vreg will not be register allocated, so it will not affect the allocation of other vregs), save the instruction in a set named DeadRemats, and shrink the original dest vreg in the same way as before. The instructions in DeadRemats are removed after allocatePhysRegs is done.
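
A rough standalone model of this bookkeeping (all names are hypothetical; this is not the actual LiveRangeEdit/InlineSpiller API):

  #include <algorithm>
  #include <memory>
  #include <set>
  #include <vector>

  struct Inst {
    unsigned DefReg;  // virtual register defined by this instruction
  };

  struct RematBookkeeping {
    unsigned NextDummyVReg = 1u << 20;        // fresh vregs, never allocated
    std::set<Inst *> DeadRemats;              // dead defs kept around for remat
    std::vector<std::unique_ptr<Inst>> Body;  // the "function"

    // The def at OrigVNI->def became dead after rematerialization, but later
    // siblings may still want to rematerialize from it, so keep it in place:
    // give it a dummy dest vreg and remember it instead of erasing it.
    void keepDeadOrigDef(Inst *OrigDef) {
      OrigDef->DefReg = NextDummyVReg++;
      DeadRemats.insert(OrigDef);
    }

    // Run once after allocatePhysRegs has finished.
    void eliminateDeadRemats() {
      Body.erase(std::remove_if(Body.begin(), Body.end(),
                                [&](const std::unique_ptr<Inst> &I) {
                                  return DeadRemats.count(I.get()) != 0;
                                }),
                 Body.end());
      DeadRemats.clear();
    }
  };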

Part3:
Part3 cleans up the code related to analyzeSiblingValues.

Test with all three parts combined on x86_64-linux-gnu:

  1. The compile time for 1.c in pr24618 dropped from 0.34s to 0.25s. The compile time for interpreter_goto_table_impl.ii in pr24618 dropped from 176.80s to 66.86s. I cannot verify the patch using the tests in pr17409 because most bugs related to asan/ubsan have been worked around on the sanitizer side.
  2. llvm testsuite. Perf: mostly neutral except one perf regression I haven't addressed: SingleSource/Benchmarks/Misc/mandel. The reason is understood: we don't do the local spill hoisting the original implementation did. The use of the local hoist is described in the comment in propagateSiblingValue (starting with "// This is an alternative def earlier in the same MBB."). The local hoist cannot be done after allocatePhysRegs, so I will address it in a separate patch. CC time: MultiSource/Applications/sqlite3/sqlite3 has a steady 1.5% improvement.
  3. I haven't cleaned up the llvm unit tests because I expect there will be many changes to the patches during the review. I will clean them up later.

Diff Detail

Repository
rL LLVM

Event Timeline

There are a very large number of changes, so older changes are hidden.
wmi updated this revision to Diff 42080.Dec 7 2015, 1:51 PM
wmi updated this revision to Diff 44027.Jan 5 2016, 11:01 AM
  1. Change std::set to SmallPtrSet and DenseSet.
  2. Improve comments.
qcolombet edited edge metadata.Jan 6 2016, 1:07 PM

Hi Wei,

Thanks for working on this.

I haven’t looked into it yet, but I wanted to let you know this is on my to-do list.

Thanks for your patience,
-Quentin

qcolombet requested changes to this revision.Jan 8 2016, 3:00 PM
qcolombet edited edge metadata.

Hi Wei,

I had a quick look at the patch and although I believe it gets the job done, I do not think this is the way to fix the problem.
My understanding is that you are trading an expensive value tracking mechanism for a fancy, but apparently less expensive, spill placement mechanism.

I think it is not the right approach because this creates a (bigger) gap between the cost model of the register allocator and the actual spill cost. Indeed, the cost model of the register allocator, w.r.t. spill cost, is basically reload before the uses, spill after the definitions.
In other words, it is better to keep the spiller simple but be smarter about the splitting of live-ranges, so that the register allocator makes the right spilling/splitting decisions. That way, sharing spills/reloads will come naturally without doing anything in the spiller, plus we may get better copy placement.

This was also, I believe, what Jakob had in mind when he described a solution in PR17409.

Concretely, what you should do:
0. Create a baseline for performance comparisons without any changes

  1. Add an option to disable the InlineSpiller::analyzeSiblingValues, say -disable-spill-analyze-sibvalue
  2. Benchmark with (-mllvm) -disable-spill-analyze-sibvalue (-mllvm) -split-spill-mode=size (we may want to use "speed" and improve that splitting mode instead of "size")
  3. Investigate the regressions and/or file PRs
  4. Fix the regressions

At this point, the new (or size) split mode should be better or equivalent to the baseline and we can just kill analyzeSiblingValue and make that new mode the default.

Hope that helps.

Cheers,
-Quentin

This revision now requires changes to proceed.Jan 8 2016, 3:00 PM
wmi added a comment.Jan 8 2016, 5:02 PM

Thank you for the review.

I think it is not the right approach because this creates a (bigger) gap between the cost model of the register allocator and the actual spill cost. Indeed, the cost model of the register allocator, w.r.t. spill cost, is basically reload before the uses, spill after the definitions.
In other words, it is better to keep the spiller simple but be smarter about the splitting of live-ranges, so that the register allocator makes the right spilling/splitting decisions. That way, sharing spills/reloads will come naturally without doing anything in the spiller, plus we may get better copy placement.

This was also, I believe, what Jakob had in mind when he described a solution in PR17409.

I have some different ideas here because of the following three things:

  1. I tried Jakob's solution when I first started working on this problem, but I quickly realized a fundamental problem with it compared with the existing InlineSpiller::analyzeSiblingValues approach. Jakob's solution can only see the spills generated from siblings split from the current VirtReg, i.e., in the current round of selectOrSplit. Say the current VirtReg R1 is split into siblings R2, R3, R4 (suppose R2 is the remainder interval). Suppose R4 is further split in the next round of selectOrSplit and a new remainder interval R5 is generated. If a spill for R2 dominates a spill for R5, Jakob's solution cannot remove the spill for R5, because they are generated in different rounds of selectOrSplit. InlineSpiller::analyzeSiblingValues doesn't have this issue, and that is exactly why it has to pay so much cost to track the siblings with equal values.

    To evaluate how serious the problem above is, I did some experiments using Jakob's solution. I wrote a sanity check to catch redundant spills left over in the final stage (to make sure no later phase cleans up the redundant spills) because of the issue above, and I caught such cases in about 100 files when building the llvm testsuite. When I had the solution in this patch ready, I also used the sanity check and ensured that all those redundant spills had been cleaned up.
  2. In Jakob's solution, the spill-sharing work is done mostly in SplitEditor::hoistCopiesForSize. hoistCopiesForSize is called inside SplitEditor::finish (the last step of splitting), i.e., it hoists spills and removes redundant spills after the splitting decision has been made for the current VirtReg. So Jakob's solution has the same cost-model issue you described here.
  3. For my solution, the major function is called after RegAllocBase::allocatePhysRegs. That is to say, it keeps the reloads/spills in their original places (reload before the uses, spill after the definitions) during RegAllocBase::allocatePhysRegs, and only tries to share/hoist spills after most of the regalloc work is done. So I think my solution is closer to your idea here. It is like a cleanup pass after regalloc, which has simpler logic and clearer impact, and will be easier to tune for performance. In comparison, Jakob mentioned in PR17409 that enabling either split spill mode is going to affect the live range splitting algorithm a lot, and somebody has to track down the regressions and fix them.

Concretely, what you should do:
0. Create a baseline for performance comparisons without any changes

  1. Add an option to disable the InlineSpiller::analyzeSiblingValues, say -disable-spill-analyze-sibvalue
  2. Benchmark with (-mllvm) -disable-spill-analyze-sibvalue (-mllvm) -split-spill-mode=size (we may want to use "speed" and improve that splitting mode instead of "size")
  3. Investigate the regressions and/or file PRs
  4. Fix the regressions

At this point, the new (or size) split mode should be better or equivalent to the baseline and we can just kill analyzeSiblingValue and make that new mode the default.

Actually, when I was working on the patch, I followed your steps here to improve it gradually by comparing with the existing implementation and fixing regressions.

Another thing: no matter which solution is adopted in the end, Part2 is needed, because there is no way to get the DefMI for rematerialization after removing InlineSpiller::analyzeSiblingValues.

Thanks,
Wei.

Hi Wei,

Couple of comments:

#1

  • Out of curiosity, do you have numbers of how many redundant spills we have for the current solution?
  • I am not saying the live-range splitting mechanism is perfect, but it fits nicely into the existing framework, in particular w.r.t. the way we model the spill cost. Which leads me to #2.

#2

  • hoistCopiesForSize is for Copies AFAIR, i.e., split points not spills. That means that, IIRC, we have the same cost model, since we do not care about split insertion in the cost.

#3

  • I agree the spill hoisting thing is more like a post reg alloc phase. Therefore, I would rather have it as a post-regalloc pass instead of embedded in the spiller. That being said, I understand it is easier to work directly with the virtual registers this way.

To summarize, it is fine to have the spill hoisting optimization where you put it. I believe, though, that making the splitting smarter would be the first logical step to mitigate the need for such an optimization, and I am still concerned that it may be bad for compile time.

If you’d like to pursue that direction anyway, it is okay; just a couple of inlined comments.

As for patch part 2, it does not do anything at the moment, does it? (I.e., we clear the set before walking through it.)

Thanks,
-Quentin

include/llvm/CodeGen/VirtRegMap.h
66 ↗(On Diff #44027)

Is there a way this could be computed from the split map?

lib/CodeGen/InlineSpiller.cpp
221

Can be set private, right?

222

Ditto.

224

Ditto.

228

Ditto.

lib/CodeGen/Spiller.h
34

Call this method postOptimization and make it a non-abstract method.
We do not want the spillers existing out of tree to have to add a default implementation when they do not need to do anything.

wmi added a comment.Jan 11 2016, 12:46 PM

#1

  • Out of curiosity, do you have numbers of how many redundant spills we have for the current solution?
  • I am not saying the live-range splitting mechanism is perfect, but it fits nicely into the existing framework, in particular w.r.t. the way we model the spill cost. Which leads me to #2.

I did that experiment -- trying to catch redundant spills with the current analyzeSiblingValues solution. I did catch some in the llvm testsuite (~20, I don't remember exactly), but most of them were left there because of the HoistCondition checking in propagateSiblingValue, i.e., because of HoistCondition, the current solution leaves some not very important but fully redundant spills in the code.

#2

  • hoistCopiesForSize is for Copies AFAIR, i.e., split points not spills. That means that, IIRC, we have the same cost model, since we do not care about split insertion in the cost.

Could you elaborate on what the same cost model means here?

#3

  • I agree the spill hoisting thing is more like a post reg alloc phase. Therefore, I would rather have it as a post-regalloc pass instead of embedded in the spiller. That being said, I understand it is easier to work directly with the virtual registers this way.

To summarize, it is fine to have the spill hoisting optimization where you put it. I believe, though, that making the splitting smarter would be the first logical step to mitigate the need for such an optimization, and I am still concerned that it may be bad for compile time.

I compared the compile time between Jakob's solution (-disable-spill-analyze-sibvalue + -split-spill-mode=size) and my solution for some motivating testcases. They are the same. I will do more careful tests on this side, like using spec.

If you’d like to pursue that direction anyway, it is okay; just a couple of inlined comments.

As for patch part 2, it does not do anything at the moment, does it? (I.e., we clear the set before walking through it.)

The set DeadRemats is used to record def instructions which would otherwise have been removed when they are found to be dead after rematerialization. However, such a def may still be useful for rematerialization of other siblings (note that without the DefMI setting in analyzeSiblingValues, for all the siblings with equal values, the original register's VNI->def is the best place to query the value expression; if the original def is deleted, we have no place to query the value expression for rematerialization of siblings in the following rounds of selectOrSplit). So we decided to keep the dead instructions in their original places during the whole lifetime of allocatePhysRegs and use the DeadRemats set to hold them (changing the dest reg to a new dummy reg which will never be added to NewVRegs, so the live range can be updated properly in the same way as before). The dead defs in DeadRemats are deleted after allocatePhysRegs is done.

wmi updated this revision to Diff 46011.Jan 26 2016, 10:20 AM
wmi edited edge metadata.

I noticed a weakness of my hoistSpill patch: when a redundant spill is deleted, the RHS register may become dead and its live range can be shrunk. However, hoistSpill is done after register assignment, so it cannot take advantage of the live range shrinking caused by deleting redundant spills. I also caught some testcases producing non-optimal code because of it.

To solve it, the best way I can think of now is to combine -split-spill-mode=size (Jakob's solution) and the hoistSpill patch here. Common cases of redundant spills can then be deleted by -split-spill-mode=size during register allocation, and redundant spills generated from different splitting rounds will be cleaned up by the hoistSpill patch here. This combined approach generated the best code in my analysis of the llvm testsuite.

About compile time, I used spec2006 C/C++ benchmarks to do the evaluation.
hoistSpill + -split-spill-mode=size compared with base: -0.70% compile time decrease on average.
hoistSpill + -split-spill-mode=size compared with -split-spill-mode=size only: +0.18% compile time increase on average.

I reevaluated performance for hoistSpill + -split-spill-mode=size on the llvm testsuite and google benchmarks. Generally they are neutral compared with trunk, except SingleSource/Benchmarks/Misc/mandel. The mandel degradation is caused by problem 1 below.

Other changes:
Addressed Quentin's comments. Code reorganized -- added a HoistSpiller class. Fixed some bugs when -regalloc=pbqp and -regalloc=basic are used. Fixed unit tests and added new unit tests.

Problems unaddressed:

  1. Spill hoisting inside a BB. propagateSiblingValue has a good description of its benefit in a comment, and I found testcases generating non-optimal code without it. I plan to address it separately.
  2. I deleted CodeGen/AArch64/aarch64-deferred-spilling.ll but I haven't got a good replacement for it. With the hoistSpill + -split-spill-mode=size patch, the pattern checked by the test doesn't appear anymore, and the test is relatively large, so it is not easy to check whether it has just been transformed into another form. I did see many cases where deferred spills get physregs in the end, and I got a few small testcases on x86. However, those are still somewhat fragile -- I found that when I changed regalloc a little bit, the pattern disappeared.

Thanks,
Wei.

wmi marked 6 inline comments as done.Jan 26 2016, 10:26 AM
wmi added inline comments.
include/llvm/CodeGen/VirtRegMap.h
66 ↗(On Diff #44027)

Yes, I removed Virt2SiblingsMap and computed it from the split map.

wmi updated this revision to Diff 46392.Jan 29 2016, 9:48 AM
wmi edited edge metadata.
wmi marked an inline comment as done.

Add an SM_Speed split mode and use it as the default. SM_Size sometimes hoists spills from a cold region in an inner loop to a hot region in an outer loop, which is bad for performance. SM_Speed only tries to hoist spills from hot regions to cold regions. If it fails to hoist all the spills to a cold place, it steps back and removes spills dominated by others.
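
Roughly, the difference in the hoisting decision looks like the sketch below (standalone and simplified; the names and the frequency/dominance queries are hypothetical, not the actual SplitKit code):

  #include <vector>

  struct Block {
    double Freq = 0;        // execution frequency
    Block *IDom = nullptr;  // immediate dominator
  };

  static bool dominates(const Block *A, const Block *B) {
    for (const Block *P = B; P; P = P->IDom)
      if (P == A)
        return true;
    return false;
  }

  // SM_Size accepts a common dominator of the back-copy blocks as the hoist
  // point even if it is hotter.  SM_Speed additionally requires the hoist
  // point to be colder than the copies it replaces; if no such block exists
  // it returns nullptr, and the caller steps back and only removes copies
  // dominated by other copies.
  static Block *findSpeedHoistPoint(Block *CommonDom,
                                    const std::vector<Block *> &CopyBlocks) {
    double TotalFreq = 0;
    for (const Block *B : CopyBlocks) {
      if (!dominates(CommonDom, B))  // caller must pass a common dominator
        return nullptr;
      TotalFreq += B->Freq;
    }
    Block *Best = nullptr;
    for (Block *Cand = CommonDom; Cand; Cand = Cand->IDom)
      if (Cand->Freq < TotalFreq && (!Best || Cand->Freq < Best->Freq))
        Best = Cand;
    return Best;
  }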

Compare "hoistSpill + split-spill-mode=speed" with "hoistSpill + split-spill-mode=size", an internal benchmark gets 1.5% improvement. llvm testsuite has no perf change.

Hi Mikhail,

Could you check how this patch impacts our performance?

Thanks,
-Quentin

wmi updated this revision to Diff 46399.Jan 29 2016, 10:48 AM

Hi Mikhail,
Could you check how this patch impacts our performance?

Thanks. This is the patch "hoistspill + split-mode-speed [Part1] + redo rematerialize [Part2] + remove analyzeSiblingValues [Part3]" merged together, which I used to do the performance testing.
(I divided the patches for easier review. Merging the separate patches together required resolving some conflicts.)

Hi Wei,

Could you rebase your patches? They do not apply cleanly for me.

Also, while I am here, a couple of inlined comments :).

Cheers,
-Quentin

include/llvm/CodeGen/LiveRangeEdit.h
148 ↗(On Diff #46399)

It seems strange that the API allows dropping some of the new registers.
At the very least, we should document (i.e., put explanatory comments) why this is useful and why it is okay to drop such references.
In general, it should not.

184 ↗(On Diff #46399)

Instead of having an additional field, which may be the same as ParentVNI in a lot of cases, this one could be computed.
Then, if this has some performance problem, we can think of a better caching mechanism.

235 ↗(On Diff #46399)

Replace 'it' by this live interval or something.
The context is now high up in the source file and repeating it wouldn't hurt IMO.

lib/CodeGen/InlineSpiller.cpp
58

Since this class does not inherit from Spiller, what about naming it HoistSpillHelper or something.

wmi added a comment.Mar 14 2016, 12:01 PM

I am rebasing the patch now.

Wei.

wmi updated this revision to Diff 50643.Mar 14 2016, 2:00 PM

Rebase the merged patch to r262808. Patch verified using llvm testsuite on x86-64 and qemu-aarch64.

qcolombet requested changes to this revision.Mar 15 2016, 5:37 PM
qcolombet edited edge metadata.

Hi Wei,

The benchmarks are still running, but so far, the numbers look good.

Anyhow, I finally made time for the review.

Generally speaking this looks almost good to me. The quadratic behavior of the first loop in runHoistAllSpills scares me and we need to look for a better alternative.
Moreover we need better comments for the APIs.
Finally, the test changes with more moves are worrisome. Could you explain why this happens and how we will fix that?
It seems to me we are choosing an insertion point for the store that happens too late.

I have highlighted all my concerns with the inline comments.

Cheers,
-Quentin

include/llvm/CodeGen/LiveRangeEdit.h
123 ↗(On Diff #50643)

Put that at the end of the list with nullptr as the default parameter.

221 ↗(On Diff #50643)

Maybe just say that DeadRemats is an optional field.
Mentioning Greedy here does not bring any value IMO.

lib/CodeGen/InlineSpiller.cpp
76

mergeable

81

Do not repeat the name of the field in the comment.

83

[…] as the source (instead of RHS) of the new ..

84

How big are the sets?
I would expect very few siblings on average and was wondering if a SmallSetVector or SmallSet would be more appropriate.

87

Please use reference for values that cannot be nullptr. I.e., OrigVNI and BB.

92

SpillsToKeep

267

Hide the instantiation of the hoist spiller helper in the inline spiller.
The positive side effect is that we won’t leak the memory!

277

Hide the call to the hoist spiller helper in the inline spiller.

1518

.i.e => i.e.

1518

Use doxygen style comment.

1530

DenseSet does not guarantee that the iteration order is stable from one run to another, does it?

Although we shouldn't have several siblings live at the same time, this is theoretically possible.
In other words, we should use a container that has a deterministic iteration order for reproducibility. See the earlier suggestion I made.

1544

This method would benefit from a more verbose comment.
Maybe something along the lines of:
Starting from \p Root find a top-down traversal order of the dominator tree to visit all basic blocks containing the elements of \p Spills.
Redundant spills will be found and put into \p SpillsToRm at the same time.
\p SpillBBToSpill will be populated as part of the process and maps a basic block to the first store occurring in the basic block.

\post SpillToRm.union(Spills@post) == Spills@pre

What is the usage of SpillsToKept? In particular the unsigned part?

Should we consider moving some of the arguments to fields of the current hoistspill instance?

1544

This method does a bunch of things!
Although I understand we want to share the logic that does the traversal and such, I found that it makes the code harder to read.
I’d say that as it is now, with more comments like I suggested, this is fine, but in general I would rather have a better separation of concerns and then try to optimize if it turns out to be problematic.
I am guessing you already went through that process, we are just lacking the history :).

1563

Any chance this could be updated when we insert the spill?
Like I said, it just feels like getVisitOrders does too many things.

1571

Please document what WorkSet is supposed to contain.

1573

I think we should describe what the expected root is, because it seems strange to me that we don’t just take the node for the Root.

1585

More comments, e.g.,
Node dominates Block and already stores the value.
This store is redundant.

1598

Ok, found the meaning of the unsigned elsewhere…
A comment here as well would be great.

1618

Assert Orders.size == WorkSet.size?

1640

We do not insert the original store, it is already there, right?

1655

I believe we usually use bottom-up instead of bottom-top.

1657

have

1694

If the subtrees get big, we will end up recomputing this cost a bunch of times.
Could it be something we keep alongside the subtree?

1701

We could add a mode for the hoist spiller, where code size is the priority. I.e., always hoist when SpillsInSubTree.size > 1
A follow-up patch is fine.

1728

typo.

1730

Variables must start with a capital letter.
Also, why use ent for the name?

1738

Explain the general algorithm here.

1754

This loop scares me.
Any chance this information could be built as we insert spills?

1762

empty

lib/CodeGen/RegAllocGreedy.cpp
402

I would have put this into the base class.

2564

spiller().postOptimization()

2589

Should be created within the inline spiller. See my comment on createInlineSpiller.

lib/CodeGen/RegAllocPBQP.cpp
130 ↗(On Diff #50643)

This should be a separate patch.

156 ↗(On Diff #50643)

Ditto.

731 ↗(On Diff #50643)

We shouldn’t have to touch this.

lib/CodeGen/RegisterCoalescer.cpp
463 ↗(On Diff #50643)

We shouldn’t have to touch this.

lib/CodeGen/Spiller.h
38

Add a bool here that defaults to false for using a post optimization.

46

Get rid of those.

lib/CodeGen/SplitKit.cpp
727 ↗(On Diff #50643)

Please don’t repeat the comment from the declaration.

742 ↗(On Diff #50643)

i.e.

749 ↗(On Diff #50643)

We should start iterating with the next iterator instead of starting over.
The next call to count will just early-continue the loop, but still!

1134 ↗(On Diff #50643)

Add a comment saying that hoistCopies will behave differently between size and speed, otherwise it feels like those modes are the same.

lib/CodeGen/SplitKit.h
333 ↗(On Diff #50643)

Don’t repeat the name of the method.

335 ↗(On Diff #50643)

Given how this is used, the actual name of this method should be computeRedundentBackCopies.

338 ↗(On Diff #50643)

Ditto.

test/CodeGen/X86/vselect-minmax.ll
4898

Why is this happening?

7620

Ditto.

This revision now requires changes to proceed.Mar 15 2016, 5:37 PM
wmi updated this revision to Diff 51175.Mar 21 2016, 9:45 AM
wmi edited edge metadata.

Quentin, I addressed most of your comments.

Major changes, or changes that need attention:

  1. Added the patch that hoists a spill inside a BB to an earlier place when the src of the spill is killed. It is done in InlineSpiller::hoistSpillInsideBB. With this part of the change, some test changes are removed.
  2. I am not sure I made exactly the change you expected about where to put postOptimization.
  3. I found there was a comment in my previous patch saying DeadRemat is non-null only when the regalloc is Greedy. It was wrong. All kinds of register allocators share the same InlineSpiller logic, so DeadRemat and the original eliminateDeadRemats (now put into RegAllocBase::postOptimization and RegAllocPBQP::postOptimization) are also necessary for PBQP and Basic.

I also noticed that hoistCopies can be improved further. I plan to address it in a follow-up patch.
With HoistSpillHelper, we still need hoistCopies when split-spill-mode=Speed, because after removing some redundant spills, the sources of those spills may be shrunk. But while addressing the review comments, I also found cases where removing redundant spills not only fails to shrink the source of the redundant spills, but also lengthens the live range of the dst of those spills. Since we don't depend on hoistCopies to remove redundant spills (HoistSpillHelper can do that work better), we can change hoistCopies to remove spills only when it can shrink the source of the redundant spills, or at least not lengthen the live range of the dst of the spills. I could possibly do that by removing hoistCopies and extending hoistSpillInsideBB to handle cases across BBs -- to hoist a spill only when its source is killed.

wmi added inline comments.Mar 21 2016, 9:46 AM
lib/CodeGen/SplitKit.cpp
742 ↗(On Diff #50643)

Fixed.

wmi added inline comments.Mar 21 2016, 9:46 AM
include/llvm/CodeGen/LiveRangeEdit.h
123 ↗(On Diff #50643)

Fixed.

148 ↗(On Diff #50643)

I added comments to explain it.
In short, we don't want to allocate a phys register for the dummy register used as the temporary dst register of the instructions in the DeadRemats set.

184 ↗(On Diff #50643)

You are right. I don't have to save OrigVNI in struct Remat. Instead, I added a parameter VNInfo *OrigVNI to LiveRangeEdit::canRematerializeAt.

221 ↗(On Diff #50643)

Fixed.

235–239 ↗(On Diff #50643)

Fixed.

lib/CodeGen/InlineSpiller.cpp
81

Fixed

83

Fixed

84

In most cases its size is less than 16, I guess, so I use SmallSetVector instead. I cannot use SmallSet because the set needs to be iterated.

87

Fixed.

92

Fixed.

267

Made HoistSpillHelper a field in the inline spiller.

277

I added postOptimization as a pure virtual function in class Spiller, and put the code of the hoist spiller helper inside InlineSpiller::postOptimization.

1518

Fixed.

1518

Fixed.

1530

Yes, with split-mode-size turned on, after hoistCopies it is possible for several siblings to be live at the same time. It is not just theoretically possible but realistic.

I used SmallSetVector instead here.

1544

Thanks for your example comment. It is good. I copied most of it into the code.

Should we consider moving some of the arguments to fields of the current hoistspill instance?

hoistSpillHelper now has the same lifetime as the InlineSpiller instance, so the lifetimes of those arguments are much shorter than that. That is why I chose to keep them as function-local objects.

1544

After separating part of the work into HoistSpillHelper::rmRedundentSpills and adding more comments in the function body, the code may be easier to read now.

1563

I separated the first part of the work into another function, HoistSpillHelper::rmRedundentSpills, so HoistSpiller::getVisitOrders is more focused on what its name describes.

1571

Done.

1573

Done.

1585

Done.

1598

Done.

1618

Done.

1640

Yes, that is right.

1655

Fixed.

1657

Fixed.

1694

I keep it alongside the subtree in SpillsInSubTreeMap.

1701

Ok, I will do it in a follow-up patch.

1728

Fixed.

1730

Fixed.

1738

Done.

1754

I simply removed the inner loop. The SlotToOrigReg map will become somewhat bigger, but not by a lot.

1762

Fixed.

lib/CodeGen/RegAllocGreedy.cpp
402

Done.

2564

Done.

2589

Done.

lib/CodeGen/RegAllocPBQP.cpp
130 ↗(On Diff #50643)

InlineSpiller is shared by all kinds of register allocators, so the DeadRemats logic is also needed by PBQP. If I separated this part out, I would need to fix the related unit tests.

731 ↗(On Diff #50643)

My original comment that DeadRemats is non-empty only when the regalloc is Greedy was wrong. Actually, InlineSpiller and the related Remat logic are shared by all register allocators.

And RegAllocPBQP is not a subclass of RegAllocBase, so the code is needed.

lib/CodeGen/RegisterCoalescer.cpp
463 ↗(On Diff #50643)

Fixed.

lib/CodeGen/Spiller.h
38

I don't get the intention of adding a bool here. Is it used to guard the post optimization? Why is it needed?

46

Done.

lib/CodeGen/SplitKit.cpp
727 ↗(On Diff #50643)

Comment removed.

749 ↗(On Diff #50643)

Fixed.

1134 ↗(On Diff #50643)

Added.

lib/CodeGen/SplitKit.h
333 ↗(On Diff #50643)

Fixed.

335 ↗(On Diff #50643)

Fixed.

338 ↗(On Diff #50643)

Fixed.

test/CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot.ll
24 ↗(On Diff #51175)

This change is confusing to me, because if we only use 254 VGPRs then there shouldn't be any spills, but there are still spill instructions being emitted. It seems like this is probably a bug, but I will need to look at it more closely to see whether it is an AMDGPU bug or a generic regalloc bug.

wmi added a comment.Mar 21 2016, 11:06 AM

I noticed that even without my change, although the compiler outputs "GCN: NumVgprs is 256", when I looked at the trace of -debug-only=regalloc I found there were some VGPRs unused.

Here is what I did:
~/workarea/llvm-r262808/dbuild/./bin/llc -march=amdgcn -mcpu=tahiti \
  -mattr=+vgpr-spilling -verify-machineinstrs \
  < ~/workarea/llvm-r262808/src/test/CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot.ll \
  -debug-only=regalloc >/dev/null 2>out1

Delete the trace from out1 before the section of "REGISTER MAP", then
execute the command below:
for ((i=0; i<256; i++)); do
  grep "VGPR$i[^0-9]" out1 &>/dev/null
  if [[ "$?" != "0" ]]; then
    echo VGPR$i
  fi
done

The output is:
VGPR40
VGPR189
VGPR190

So even if the compiler says GCN: NumVgprs is 256, there are three
VGPRs never used.

Thanks,
Wei.

Hi Wei,

I believe we are almost done. Thanks for your work and patience on this.

There are mainly three items to address:

  • There are typos widely spread in the file; mergable -> mergeable, redundent -> redundant
  • Do not repeat method names in comments.
  • Fix the test cases. See the inline comment.

As for the benchmarking, almost all the diffs came back as improvements of up to 7%! This is impressive.
The regressions seem like a side effect, i.e., we generate fewer load pairs in a few cases, because the related spill slots are not next to each other anymore. That was luck previously.

Anyhow, looking forward to the final fix-ups.

Cheers,
-Quentin

include/llvm/CodeGen/LiveRangeEdit.h
77 ↗(On Diff #51175)

Switch to SmallPtrSetImpl; this way the small size of the type is not relevant.

152 ↗(On Diff #51175)

Don’t repeat the method name in the comment.

lib/CodeGen/InlineSpiller.cpp
78

Mergeable.
Do a search; the typo is widespread :).

426

Don’t repeat the method name.

1494

Don’t repeat the name of the method.

1505

More mergeable typos...

lib/CodeGen/LiveRangeEdit.cpp
382 ↗(On Diff #51175)

Some update problem I believe.

lib/CodeGen/RegAllocBase.cpp
159 ↗(On Diff #51175)

Capital letter for the first letter of the variable name.

lib/CodeGen/RegAllocBase.h
88 ↗(On Diff #51175)

Add virtual keyword.
Subclasses may want to do additional things.

lib/CodeGen/RegAllocPBQP.cpp
727 ↗(On Diff #51175)

Variables start with a capital letter.

lib/CodeGen/Spiller.h
31–32

Other spillers out of tree may exist, and there is little interest in having them implement a post optimization method if they do not need it.
In other words, instead of a pure virtual method, do nothing for the default implementation.

38–46

I was thinking of the case where we want to test without the post-optimization.
But I am fine if it is always enabled.

lib/CodeGen/SplitKit.h
335 ↗(On Diff #51175)

Typo: redundant

test/CodeGen/X86/hoist-spill.ll
2

Make this a file check test.

116

Get rid of the attributes if they are not actually needed.

test/CodeGen/X86/new-remat.ll
13

Use opt -instnamer to get rid of the %[0-9]+ variables.

In D15302#379497, @wmi wrote:

I noticed that even without my change, although the compiler outputs "GCN: NumVgprs is 256", when I looked at the trace of -debug-only=regalloc I found there were some VGPRs unused.

Here is what I did:
~/workarea/llvm-r262808/dbuild/./bin/llc -march=amdgcn -mcpu=tahiti \
  -mattr=+vgpr-spilling -verify-machineinstrs \
  < ~/workarea/llvm-r262808/src/test/CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot.ll \
  -debug-only=regalloc >/dev/null 2>out1

Delete the trace from out1 before the section of "REGISTER MAP", then
execute the command below:
for ((i=0; i<256; i++)); do
  grep "VGPR$i[^0-9]" out1 &>/dev/null
  if [[ "$?" != "0" ]]; then
    echo VGPR$i
  fi
done

The output is:
VGPR40
VGPR189
VGPR190

So even if the compiler says GCN: NumVgprs is 256, there are three
VGPRs never used.

NumVgprs is the number of VGPRs that need to be allocated for the program, so the fact that there are gaps doesn't matter (though this is strange). If you use only register v255, you still need to allocate all 256 registers.

wmi updated this revision to Diff 51245.Mar 21 2016, 4:52 PM
wmi edited edge metadata.

Fix all the comments Quentin suggested. Thanks for the careful review.

lib/CodeGen/InlineSpiller.cpp
78

Fixed.

426

Fixed.

1494

All similar comments fixed.

1505

All such typos Fixed.

lib/CodeGen/LiveRangeEdit.cpp
382 ↗(On Diff #51175)

Fixed.

lib/CodeGen/RegAllocBase.cpp
159 ↗(On Diff #51175)

Fixed.

lib/CodeGen/RegAllocBase.h
88 ↗(On Diff #51175)

Fixed.

lib/CodeGen/RegAllocPBQP.cpp
727 ↗(On Diff #51175)

Fixed.

lib/CodeGen/Spiller.h
31–32

Makes sense. Fixed.

38–46

Ok, I leave it there for now.

lib/CodeGen/SplitKit.h
335 ↗(On Diff #51175)

Fixed here and many other places.

test/CodeGen/X86/hoist-spill.ll
2

I felt the FileCheck test was not as general as the test above, but FileCheck can still work, so I switched to FileCheck here.

116

Fixed.

test/CodeGen/X86/new-remat.ll
13

Fixed.

Hi Wei,

I think we will need to wait for Tom to double check what happened for AMDGPU.

One question though, this revision ended up being the combination of the 3 parts, right?

Cheers,
-Quentin

test/CodeGen/X86/hoist-spill.ll
3

You could check where the spills actually are. But it already looks pretty good now :).

wmi added a comment.Mar 29 2016, 9:47 AM

So even if the compiler says GCN: NumVgprs is 256, there are three
VGPRs never used.

NumVgprs is the number of VGPRs that need to be allocated for the program, so the fact that there are gaps doesn't matter (though this is strange). If you use only register v255, you still need to allocate all 256 registers.

Hi Tom,

I found that with my patch here, the Spill num for the testcase increases from 68 to 152, and the Reload num increases from 72 to 188. I haven't thoroughly understood what is wrong here, but I can roughly describe how the problem happens and say that it may be a problem of local splitting, instead of my patch.

In the testcase, there are roughly 64 VReg_128 vars overlapping with each other, consuming all 256 VGPRs, plus some other scattered VGPR uses. Each VReg_128 var occupies 4 consecutive VGPRs, so VGPR registers are allocated in this way: vreg1: VGPR0_VGPR1_VGPR2_VGPR3; vreg2: VGPR4_VGPR5_VGPR6_VGPR7; ......

Because we have some other scattered VGPR uses, we cannot allocate all 64 VReg_128 vars in registers, so splitting is needed. Region splitting will not bring trouble because it only tries to fill holes, i.e., vregs after the splitting usually will not evict other vregs. Local splitting can bring a lot of mess to the allocation here. Suppose it tries to find a local gap inside a BB to split vreg3 (VReg_128 type). After the local split is done, vreg3 will be split into vreg3-1 and vreg3-2. vreg3-1 and vreg3-2 have short live ranges, so both of them have relatively larger weights. vreg3-1 may find a hole and be allocated to VGPR2_VGPR3_VGPR4_VGPR5; then vreg3-2 will get a hint of VGPR2_VGPR3_VGPR4_VGPR5 and will evict vreg1 (VGPR0_VGPR1_VGPR2_VGPR3) and vreg2 (VGPR4_VGPR5_VGPR6_VGPR7) above. To find consecutive VGPRs for vreg1 and vreg2, regalloc will do more region splitting/local splitting and more evictions, which makes it harder and harder for vregs to find consecutive VGPRs.

With my patch, it will add one more VReg_128 interval during splitting because of hoisting (this is a separate problem I described in a TODO about improving hoistCopies in a previous reply). To allocate that VReg_128 var, it triggers more region splitting and local splitting, and causes more vars to be spilled.

To show the problem, I experimentally turned off local splitting for trunk without my patch: the Spill num for the testcase drops from 68 to 56, and the Reload num drops from 72 to 36. When local splitting is turned off for trunk with my patch, the Spill num for the testcase drops from 152 to 24, and the Reload num drops from 188 to 24.

So this is probably a separate issue for architectures that use consecutive combined registers for large data types.

Thanks,
Wei.

Hi Tom,

Do you think the issue is a blocker for this patch or a separate one? I want to get your confirmation so I can decide how to push the work forward.

As for using 254 VGPRs instead of 256 VGPRs, I think it just cannot find 4 consecutive VGPRs for the VReg_128 data. The holes at the end (v254, v255) are no different from holes in the middle. Is that correct?

Thanks,
Wei.

I think your analysis is correct about why it doesn't use all 256 registers. I actually hit this same thing in another patch I'm working on. I have to objections to this patch being pushed.

It turns out this was ready just in time: we just noticed that r263460 essentially undermines all of the work to avoid PR17409, and we now have widespread superlinear compile times with sanitizers (and possibly other code).

Just wanted to confirm with you Tom that this LGTM, and encourage Wei to go ahead and land it as soon as Tom acks. =D We have a *bunch* of stuff blocked on the compile time issues here.

One minor nit below.

lib/CodeGen/InlineSpiller.cpp
59

I get a warning saying this is unused when building with this patched in...

wmi added a comment.Apr 2 2016, 6:19 PM

Thanks for the support of this patch. Looks like Tom's "to objection" is a typo of "no objection". I will prepare to commit the patch.

Wei.

wmi updated this revision to Diff 52569.Apr 4 2016, 9:46 AM

Fix my mistake introduced when I was addressing the review comments:

I accidentally removed the virtual keyword from postOptimization in lib/CodeGen/Spiller.h. It should not be a pure virtual function, but it should still be virtual.

This will fix the warning Chandler saw.

This revision was automatically updated to reflect the committed changes.
wmi retitled this revision from [Greedy regalloc] Replace analyzeSiblingValues with something new [Part1] to [Greedy regalloc] Replace analyzeSiblingValues with something new.Apr 4 2016, 9:49 AM
wmi added a comment.Apr 9 2016, 11:03 AM

Hi Quentin,

Recently, I committed some bug fixes for the patch here without getting approval first because I think they are relatively trivial. Please give them a post-commit review: http://reviews.llvm.org/D18934

There are two other fixes which are somewhat substantial, and which I think need to be reviewed before being committed.
http://reviews.llvm.org/D18935
http://reviews.llvm.org/D18936

Thanks,
Wei.

lib/CodeGen/Spiller.h