This is an archive of the discontinued LLVM Phabricator instance.

Add LiveRangeShrink pass to shrink live range within BB.
ClosedPublic

Authored by danielcdh on Apr 26 2017, 2:46 PM.

Details

Summary

The LiveRangeShrink pass moves an instruction to right after the definition within the same BB if the instruction and its operands all have more than one use. This pass is inexpensive and guarantees optimal live ranges within a BB.
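As a rough illustration of the idea, here is a toy model (not the patch's actual code): treat each instruction as a defined name plus the names it uses, and hoist it to just after the last definition it depends on. The use-count condition from the summary is omitted for brevity, and the moved-over instructions are assumed to have no dependencies of their own.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model of an instruction: a defined name plus the names it uses.
struct Inst {
  std::string Def;
  std::vector<std::string> Uses;
};

// Hoist each instruction to just after the latest definition among its
// operands, scanning top-down. This mirrors the intuition behind the
// proposed pass, not its real implementation.
std::vector<Inst> shrinkLiveRanges(std::vector<Inst> Block) {
  for (size_t I = 0; I < Block.size(); ++I) {
    // Find the position of the last operand definition before Block[I].
    size_t InsertAfter = 0;
    bool Found = false;
    for (size_t J = 0; J < I; ++J)
      for (const std::string &U : Block[I].Uses)
        if (Block[J].Def == U) {
          InsertAfter = J;
          Found = true;
        }
    if (!Found || InsertAfter + 1 >= I)
      continue; // Already as early as it can be.
    // Move Block[I] to the slot right after its last operand definition.
    Inst Moved = Block[I];
    Block.erase(Block.begin() + I);
    Block.insert(Block.begin() + InsertAfter + 1, Moved);
  }
  return Block;
}
```

Running this on the reassociate-style order `d c b a t1 t2 t3` (where the adds `t1`, `t2`, `t3` all sit at the bottom) interleaves the adds with the calls that define their operands, which is exactly the shorter-live-range shape discussed below.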

Event Timeline

There are a very large number of changes, so older changes are hidden.

Update the patch to have a separate pass to handle live range shrinking within BB.

danielcdh retitled this revision from Improve code placement algorithm in Reassociate pass. to Add LiveRangeShrink pass to shrink live range within BB..Apr 28 2017, 10:17 AM
danielcdh edited the summary of this revision. (Show Details)
davidxl edited edge metadata.Apr 28 2017, 10:23 AM

A couple of comments:

  1. Moving it out of the reassociation pass is probably the right direction to go.
  2. Having this pass allows future enhancements.
  3. This actually allows you to do more aggressive expression operand re-ordering (to enable more shrinking) -- but to avoid offsetting reassociation's effects (enabling CSE, constant folding, loop-invariant code motion, etc.), the shrink pass should be pushed very late in the pipeline.
danielcdh updated this revision to Diff 97125.Apr 28 2017, 10:51 AM

Move the pass to the end of the optimization pipeline.

spatel added a subscriber: spatel.Apr 29 2017, 1:26 PM

d = foo();
c = foo();
t = d + c;
b = foo();
t = t + b;
a = foo();
t = t + a;

The MachineCombiner pass will try to reverse this transform to increase throughput. See comments above TargetInstrInfo::getMachineCombinerPatterns(). If it's not doing it currently, it's probably only because some intervening instructions are causing the pattern to not match.

Why would a transform that is concerned with register pressure and live ranges not be better accomplished at the machine instruction level?

In all the new test cases, the problem is indirectly caused by the presence of function calls in the reassociated expression.
Your patch is essentially trying to solve a register pressure problem at LLVM IR level by pre-scheduling instructions. However, as Sanjay pointed out, this design can be a bit fragile since later passes may undo your canonicalization.

My understanding is that the pre-RA-scheduler pass should be responsible for finding a good schedule that minimizes register pressure in preparation for regalloc.
Do you know why pre-RA-scheduler is unable to find a good schedule for your particular case?

If I pass ‘test2()’ to llc (with flag -pre-RA-sched=list-ilp -mtriple=x86_64-unknown-unknown) then I get the expected schedule of call/add instructions.

test2:
        pushq   %r14
        pushq   %rbx
        pushq   %rax
        callq   foo
        movl    %eax, %r14d
        callq   foo
        movl    %eax, %ebx
        addq    %r14, %rbx
        callq   foo
        movl    %eax, %eax
        addq    %rbx, %rax
        addq    $8, %rsp
        popq    %rbx
        popq    %r14
        retq

Yes, adding -pre-RA-sched=list-ilp can produce the expected result without this patch for my test case. I'm not familiar with -pre-RA-sched; any quick insight into why this option is not on by default?

Thanks,
Dehao

wmi edited edge metadata.May 1 2017, 4:43 PM
wmi added a subscriber: atrick.

+ Andy, for the history on pre-RA-sched and misched.

lib/Transforms/Scalar/LiveRangeShrink.cpp
54–56

Maybe use BasicBlock::getFirstInsertionPt instead. Set Insert to be BasicBlock::getFirstInsertionPt if Insert is before BasicBlock::getFirstInsertionPt.

58–60

Can we remove the update loop of InstOrderMap? It leads to O(N^2) complexity.

One way is just to let InstOrderMap[I] = InstOrderMap[Insert] and keep the rest the same.

66

Can we use a function pass? That would be easier for future extension.

MatzeB added a comment.EditedMay 1 2017, 4:51 PM
In D32563#742848, @wmi wrote:

+ Andy, for the history on pre-RA-sched and misched.

Selection-DAG scheduling (-pre-ra-sched=XXX) should only be concerned with bringing the SelectionDAG nodes into a linear order; it is not supposed to perform register pressure minimization, latency modeling, etc.
The main scheduling is left to the MachineScheduler nowadays (which runs between copy coalescing and register allocation).

Is the proposed pass necessary because the machine scheduler doesn't schedule instructions across calls? Then https://reviews.llvm.org/D15667 may be related...

wmi added inline comments.May 1 2017, 4:58 PM
lib/Transforms/Scalar/LiveRangeShrink.cpp
58–60

Please ignore this for now. Sorry. It is an incomplete comment sent out accidentally.

atrick added a comment.May 2 2017, 1:46 PM

Why are the adds "sunk down" in the first place? Is this reassociation at work?

Machine global code motion, using MachineTraceMetrics and RegisterPressure would be awesome. Nobody really followed through with implementing it so unfortunately there's no switch to turn on to try it.

wmi added a comment.May 2 2017, 2:31 PM

Why are the adds "sunk down" in the first place? Is this reassociation at work?

Yes, reassociation for an expression tree inserts the intermediate results directly before the final root. That is how the adds get sunk down.

Machine global code motion, using MachineTraceMetrics and RegisterPressure would be awesome. Nobody really followed through with implementing it so unfortunately there's no switch to turn on to try it.

Is the machine global code motion you mention here misched or a separate pass? We found that existing machine scheduling can do some live range shrinking in some cases, but we don't know whether relying on the existing register-pressure-aware scheduling and improving it is a good way to solve the live-range shrinking problem in general. Could you shed some light on how the existing misched chooses between maximizing ILP and minimizing register pressure?

Thanks,
Wei.

atrick added a comment.May 2 2017, 3:50 PM

Ideally there should be a separate pass that runs on the SSA machine code (before register coalescing) to minimize register pressure and hide latency for chains of loads or FP ops. It should work across calls and loop boundaries. You could even do a kind of poor-man's cyclic scheduling this way.

MISched works at the lower level of instruction groups and CPU pipeline hazards. It would be nice if MISched worked at the level of extended basic blocks (it would be easy to implement and has been done out of tree). I don't think it makes as much sense for it to work across call sites though. That is not hard to implement, but it seems it would generate large DAGs and be a bad compile-time tradeoff.

MISched is not a scheduling algorithm, it's a scheduling framework. The generic scheduler is a pile of heuristics that exercise most of the functionality and seems to be working ok for several popular targets. The strategy that it takes is:

  • Make a single scheduling pass handling all heuristics at once. Don't reorder the instructions at all unless the heuristics identify a register pressure or latency problem.
  • Try to determine, before scheduling a block, whether register pressure or latency is likely to become a problem. This avoids the scheduler backing itself into a corner (we don't want the scheduler to backtrack).

You'll notice that this is very conservative with respect to managing compile time and preserving the decisions made by earlier passes.

You could follow that basic strategy and simply adjust the priority of heuristics for your target. You can add better up-front analysis to detect code patterns for each block before prioritizing heuristics. Or you could implement a completely different strategy. For example, schedule for register pressure first, then reschedule for ILP.

I probably won't be able to help you much more than that. Matthias has taken over maintenance of MISched. I think it would help if you give more background on your situation (sorry if I haven't paid attention or have forgotten). Is this PowerPC? In-order/out-of-order?

wmi added a comment.May 2 2017, 5:12 PM


Thank you so much for introducing the design philosophy of MISched. It really helps.

About the background on our situation: the problem we are trying to solve is that the reassociation pass unnecessarily introduces a lot of live-range interference because of the way it inserts the intermediate results. The test case exposing the problem is quite special in that the instructions we care about are all in the same BB. Most of the defs have only one use and most of the uses have only one def. So Dehao proposed a limited live-range shrinking pass at the LLVM IR phase to solve it. We may extend the live-range shrinking in the future because we have seen other cases before that would benefit from more general live-range shrinking.

Sanjay and Andrea made us notice that scheduling may already achieve the same result as the shrinking pass we proposed, at least for the motivational test case, so we looked into MISched. Dehao has found that even if we enable the existing generic scheduling to cross calls, the more complex DAG leads to a better but still suboptimal solution, and the suboptimality is not easy to fix without a global picture of the whole basic block. This makes us inclined to implement it as a separate pass.

The problem was found on x86, which is the target we care the most right now, but I think the problem is platform independent.

atrick added a comment.May 2 2017, 5:49 PM

I think the algorithm needs to be explained. It seems reasonable to argue that single def/use live ranges should be kept short. But is there any reason at all to run this before other IR passes? It seems natural to run this after machine LICM and Sinking.

atrick added a comment.May 2 2017, 6:26 PM

The time to shrink live ranges is after all the global machine code
motion is done, if the goal is to handle as many cases as possible.

Doing code motion in machine code is tricky because you have to worry
about physical registers, flags and such. But those are exactly the
things you need to worry about to avoid generating terrible code. For
example, you don't want to hoist something over a call that defines a
physreg or is used by a physreg copy!

That said, if reassociate is making silly decisions about where to
place operations, I wouldn't see a problem fixing that directly.


We initially tried to fix it in reassociate, but decided to fix it separately because we do not want reassociate to become too complicated. This should also be able to fix live-range issues introduced by the user. I'll try to move this pass to the machine level.

danielcdh updated this revision to Diff 97891.May 4 2017, 4:21 PM

Change to do live-range shrinking at the machine level.

I've updated the patch to implement this at the machine level. The downside is that it needs to update a lot of unit tests, especially the autogenerated llc tests.

At this point, the patch is well tested with internal Google workloads and seems to be valid. But I may have missed other subtle cases that are not covered by my testing. I'd appreciate it if someone could help point them out.

Thanks,
Dehao

gberry added a subscriber: gberry.May 5 2017, 8:27 AM

I had a quick look at the code, and I noticed that you never check for the presence of DBG_VALUE instructions in a basic block.
I think that your pass would not work as expected with debug info.

More generally:

  • When you check the number of uses of a register, you should skip debug values.
  • When you move an instruction, you should also move its associated debug values.

A lot of tests have changed as a result of this patch. However, I think there is a value in adding your original test cases involving function calls.
Could you please add a new MIR test with code from (at least one of) your original small test cases?

lib/CodeGen/LiveRangeShrink.cpp
112 ↗(On Diff #97891)

You should probably skip debug values.

115 ↗(On Diff #97891)

s/instrctuion/instruction

126 ↗(On Diff #97891)

s/coalesed/coalesced

142 ↗(On Diff #97891)

We want to skip uses that are debug instructions.

147 ↗(On Diff #97891)

Same. We don't want to count debug uses.

danielcdh updated this revision to Diff 97979.May 5 2017, 10:05 AM
danielcdh marked 5 inline comments as done.

update

I had a quick look at the code, and I noticed that you never check for the presence of DBG_VALUE instructions in a basic block.
I think that your pass would not work as expected with debug info.

More generally:

  • When you check the number of uses of a register, you should skip debug values.
  • When you move an instruction, you should also move its associated debug values.

A lot of tests have changed as a result of this patch. However, I think there is a value in adding your original test cases involving function calls.
Could you please add a new MIR test with code from (at least one of) your original small test cases?

Done

When you move an instruction, you should also move its associated debug values.

See for example how this is done by MachineSink in method MachineSinking::SinkInstruction() (see file MachineSink.cpp).
For every instruction INSN that is a candidate for sinking, MachineSink collects all the DBG_VALUE instructions which are associated to INSN.
When INSN is moved to a different location, all the associated debug values (if any) are moved too.
I think you should do something similar.
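A minimal sketch of that approach (toy types and names for illustration — the real code works on MachineInstr lists, not vectors): gather the instruction together with the DBG_VALUEs that trail it, then move the whole group as one unit so the debug values stay glued to their instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy instruction stream entry: a DBG_VALUE is modeled as an entry
// whose IsDebugValue flag is set; it describes the instruction
// immediately before it.
struct MI {
  std::string Text;
  bool IsDebugValue;
};

// Move the instruction at index 'From' -- plus any DBG_VALUEs that
// immediately follow it -- to just before index 'To'. This mirrors the
// idea MachineSinking::SinkInstruction uses: collect the associated
// debug values and relocate them together with their instruction.
void moveWithDebugValues(std::vector<MI> &Block, size_t From, size_t To) {
  // Collect the instruction and its trailing debug values.
  size_t End = From + 1;
  while (End < Block.size() && Block[End].IsDebugValue)
    ++End;
  std::vector<MI> Group(Block.begin() + From, Block.begin() + End);
  Block.erase(Block.begin() + From, Block.begin() + End);
  if (To > From)
    To -= Group.size(); // Account for the erased range.
  Block.insert(Block.begin() + To, Group.begin(), Group.end());
}
```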

About your new test case. I think a MIR test would be even better. That said, I don't have a strong opinion (I am okay with your new test).
In that case, could you please regenerate the CHECK lines using `update_llc_test_checks.py`? The test should be small, so I don't expect too many CHECK lines inserted by the script.

gberry added inline comments.May 5 2017, 11:01 AM
lib/CodeGen/LiveRangeShrink.cpp
112 ↗(On Diff #97979)

Why are you skipping compares?

When you move an instruction, you should also move its associated debug values.

See for example how this is done by MachineSink in method MachineSinking::SinkInstruction() (see file MachineSink.cpp).
For every instruction INSN that is a candidate for sinking, MachineSink collects all the DBG_VALUE instructions which are associated to INSN.
When INSN is moved to a different location, all the associated debug values (if any) are moved too.
I think you should do something similar.

Sorry, did not address this in the last patch. Done now. Note that I do not want to touch unnecessary files in this change. So I simply copied the code from MachineSink.cpp, and will refactor it to a utility function in MachineInstr.h/.cpp in a follow-up patch.

About your new test case. I think a MIR test would be even better. That said, I don't have a strong opinion (I am okay with your new test).
In that case, could you please regenerate the CHECK lines using `update_llc_test_checks.py`? The test should be small, so I don't expect too many CHECK lines inserted by the script.

Done, but I'm not sure if that's the right thing to do. update_llc_test_checks makes the test too fragile: any subtle change in codegen would break it. And it hides what the author really wants to test, so follow-up patch writers may simply run the script and regenerate the test (without understanding what it's testing, or they need to spend much time figuring out what the original purpose was). BTW, what's a MIR test? Could you point me to an example? Thanks.

lib/CodeGen/LiveRangeShrink.cpp
112 ↗(On Diff #97979)

In many architectures, cmp should stay together with the jmp instruction for performance reasons. We do not want to break this in this pass.

Sorry, did not address this in the last patch. Done now. Note that I do not want to touch unnecessary files in this change. So I simply copied the code from MachineSink.cpp, and will refactor it to a utility function in MachineInstr.h/.cpp in a follow-up patch.

No problem. Thanks!

About your new test case. I think a MIR test would be even better. That said, I don't have a strong opinion (I am okay with your new test).
In that case, could you please regenerate the CHECK lines using `update_llc_test_checks.py`? The test should be small, so I don't expect too many CHECK lines inserted by the script.

Done, but I'm not sure if that's the right thing to do. update_llc_test_checks makes the test too fragile: any subtle change in codegen would break it. And it hides what the author really wants to test, so follow-up patch writers may simply run the script and regenerate the test (without understanding what it's testing, or they need to spend much time figuring out what the original purpose was). BTW, what's a MIR test? Could you point me to an example? Thanks.

Right, I think your test is okay.

About MIR tests, you can have a look at this page: http://llvm.org/docs/MIRLangRef.html
In particular, section "MIR Testing Guide" (and the next two sub-sections) are quite useful imho.

For example, you could do something like:

llc my-new-test.ll -stop-after=machine-sink -o test.mir

Then, use test.mir with a RUN line that only tests your new pass.
For example:

llc -o - %s -mtriple=x86_64-- -run-pass=lrshrink | FileCheck %s

The advantage is that you test your new pass in isolation.
But, as I wrote, I think your test is okay.

gberry added inline comments.May 5 2017, 12:02 PM
lib/CodeGen/LiveRangeShrink.cpp
153 ↗(On Diff #97991)

Typo: 'physicall' -> 'physical'

112 ↗(On Diff #97979)

For targets that don't have allocatable flag registers, your later check for physical regs should prevent moving these compares (and any other non-compare flag-setting instructions). For targets that keep the results of compares in general-purpose registers, this seems like a missed optimization. Also, if it is important that the compare be just before the branch, the scheduler should be taking care of that.

If you don't agree that this check should be removed can you put a comment explaining why this check for compares is being done?

danielcdh marked an inline comment as done.May 5 2017, 4:17 PM


Thanks for the explanation. I've reverted the test to its original form. After checking the MIR test output, I think it's better to keep the test simple in its original form and make sure that the later scheduler does not undo the change made by this patch.

lib/CodeGen/LiveRangeShrink.cpp
112 ↗(On Diff #97979)

That makes sense. I've removed the "compare" check.

atrick added a comment.May 5 2017, 9:24 PM

Can you collect spill code stats and benchmark scores from the test suite?
Can you get someone to run benchmarks on arm64 as well?

lib/CodeGen/LiveRangeShrink.cpp
162 ↗(On Diff #98035)

Why do you care how many instructions are using the register being defined here?

sanjoy added a subscriber: skatkov.May 7 2017, 12:56 PM

The reordering code here reminds me of the reordering code in ImplicitNullChecks. Is there an opportunity to share code here?

+CC @skatkov

lib/CodeGen/LiveRangeShrink.cpp
55 ↗(On Diff #98035)

s/Othwise/Otherwise/

85 ↗(On Diff #98035)

Range for?

sanjoy added a subscriber: sanjoy.May 7 2017, 1:02 PM
danielcdh marked an inline comment as done.May 8 2017, 6:09 PM

Can you collect spill code stats and benchmark scores from the test suite?
Can you get someone to run benchmarks on arm64 as well?

The patch did not change test-suite benchmark scores.

The patch reduced the total static spill number for the entire test suite from 110283 to 110232

Unfortunately we do not have an arm64 machine to run the performance test. But based on its impact on x86, the llvm test-suite does not seem to suffer from register pressure, so it should have minimal perf impact.

lib/CodeGen/LiveRangeShrink.cpp
162 ↗(On Diff #98035)

Right, I should not. Removed the check.

danielcdh updated this revision to Diff 98240.May 8 2017, 6:09 PM
danielcdh marked an inline comment as done.

rebase and update

danielcdh marked 2 inline comments as done.May 8 2017, 6:10 PM
andreadb added inline comments.May 9 2017, 5:55 AM
lib/CodeGen/InlineSpiller.cpp
43–50 ↗(On Diff #98240)

I think that you have uploaded a wrong patch.
This code shouldn't be there :-).

1031–1032 ↗(On Diff #98240)

Same.

danielcdh marked 2 inline comments as done.May 9 2017, 8:15 AM
danielcdh added inline comments.
lib/CodeGen/InlineSpiller.cpp
43–50 ↗(On Diff #98240)

Oops, forgot to revert it... Thanks for catching this :)

MatzeB requested changes to this revision.May 9 2017, 10:58 AM
MatzeB added inline comments.
lib/CodeGen/LiveRangeShrink.cpp
10–12 ↗(On Diff #98292)

Use /// \file, see http://llvm.org/docs/CodingStandards.html#file-headers

It would also be a good idea to add a sentence describing the rules/heuristics that decide which instructions get moved.

34 ↗(On Diff #98292)

This seems like a bold statement for a pass that moves instructions around. I would only expect to see AU.setPreservesCFG() here.

52 ↗(On Diff #98292)

We tend to use doxygen /// to document functions.

57–58 ↗(On Diff #98292)

Use references for things that cannot be nullptr. That also saves you assert(New != nullptr).

79 ↗(On Diff #98292)

You probably want MachineBasicBlock::iterator here and in several other places as you do not plan to break bundles apart.

93–94 ↗(On Diff #98292)

Then why not do so in this patch.

101 ↗(On Diff #98292)

I'm surprised this compiles. In each case you should use MI.getIterator() for clarity. Could also use DI = std::next(...)

102–103 ↗(On Diff #98292)

how about for(MachineInstr &DMI : make_range(DI, DE)) { ... }

111 ↗(On Diff #98292)

// end anonymous namespace

114–115 ↗(On Diff #98292)

This seems unnecessary; the for loop below will abort soon enough if the function is empty. Completely empty functions are also extremely rare, so this shortcut probably costs more time than it saves.

133–134 ↗(On Diff #98292)

Use MachineBasicBlock::iterator

135 ↗(On Diff #98292)

Use a reference for things that cannot be nullptr.

142 ↗(On Diff #98292)

No need for auto here just write MachineOperand

156–157 ↗(On Diff #98292)

You could move this check into the !isSafeToMove() path.

166 ↗(On Diff #98292)

unsigned for counters.

168 ↗(On Diff #98292)

Avoid auto.

173–174 ↗(On Diff #98292)

Maybe assign MO.getReg() to a variable it is used 5 times here.

MatzeB added inline comments.May 9 2017, 10:58 AM
lib/CodeGen/LiveRangeShrink.cpp
50 ↗(On Diff #98292)

You never seem to be using negative numbers, consider using unsigned instead of int.

64 ↗(On Diff #98292)

M[Old]?

65 ↗(On Diff #98292)

M[New] or better avoid the double lookup with M.count(New) above and use an M.find(New) for both.

84 ↗(On Diff #98292)

Why not pass a MachineBasicBlock &MBB as parameter to the function instead of the iterator and doing the Start->getParent()->instr_end() dance? You could have just used for (MachineInstr &I : MBB) {} in that case.

124 ↗(On Diff #98292)

I would have expected this outside the loop so it can be reused (considering that you also went through the trouble of clear()ing it inside BuildInstOrderMap.

125 ↗(On Diff #98292)

Collecting the SawStore value is unnecessary. The only user appears to be MachineInstr::isSafeToMove() below which writes(!) the variable.

150 ↗(On Diff #98292)

Barrier.

211–215 ↗(On Diff #98292)

There's a variant of splice that can move a sequence of instructions which should work nicely here.

It is also not clear to me if moving the debug instructions is legal here. They are not considered when checking how early an instruction can move, so you are possibly moving the debug instruction above the point where its operand's value is defined.
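For reference, MachineBasicBlock::splice has essentially the same semantics as std::list::splice: it moves a [first, last) range of instructions to just before a destination position without copying them. A toy std::list illustration of moving a contiguous sequence in one splice:

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>

// Model a basic block as a list of opcode names and move a contiguous
// range of "instructions" with a single splice, the way the review
// comment suggests doing for the instruction-plus-debug-values group.
std::list<std::string> demoSplice() {
  std::list<std::string> BB = {"call", "add1", "add2", "mul", "ret"};
  // Move the two adds (a contiguous [first, last) range) to just
  // before "ret".
  auto First = std::next(BB.begin());    // points at "add1"
  auto Last = std::next(BB.begin(), 3);  // one past "add2" (at "mul")
  auto Dest = std::prev(BB.end());       // points at "ret"
  BB.splice(Dest, BB, First, Last);      // BB: call mul add1 add2 ret
  return BB;
}
```

splice relinks list nodes rather than copying elements, so iterators to the moved elements stay valid — the same property that makes it cheap on machine instruction lists.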

This revision now requires changes to proceed.May 9 2017, 10:58 AM
danielcdh marked 21 inline comments as done.May 9 2017, 2:39 PM

Thanks for the review.

lib/CodeGen/LiveRangeShrink.cpp
64 ↗(On Diff #98292)

M is const map, thus cannot use M[Old].

84 ↗(On Diff #98292)

Because Start may not be MBB.begin(); it could also be an instruction that has side effects.

The reason I used Start->getParent()->instr_end() instead of passing in MBB and using MBB.instr_end() is that I wanted to pass one less parameter to make the function clearer.

93–94 ↗(On Diff #98292)

Code updated to inline the logic into the caller to make it clearer.

125 ↗(On Diff #98292)

SawStore is both an input and an output of MachineInstr::isSafeToMove().

211–215 ↗(On Diff #98292)

Done

For the debug values, it should be legal, as the same logic is used by MachineLICM.

danielcdh updated this revision to Diff 98351.May 9 2017, 2:40 PM
danielcdh edited edge metadata.

update

skatkov added inline comments.May 9 2017, 10:27 PM
lib/CodeGen/LiveRangeShrink.cpp
135 ↗(On Diff #98351)

Why don't you refresh SawStore here?

andreadb added inline comments.May 10 2017, 12:21 PM
lib/CodeGen/LiveRangeShrink.cpp
94–95 ↗(On Diff #98351)

You should early exit if this function must be skipped.

if (skipFunction(*MF.getFunction()))
  return false;

We want to be able to 'opt-bisect' this pass.
Also, this pass shouldn't be run on optnone functions.

178 ↗(On Diff #98351)

I think that you should also skip debug value instructions.

The loop at line 192 works under the assumption that an MI's DBG_VALUEs always follow directly after it. This is also suggested by your comment at line 189.

If you don't skip debug values, then you potentially break the above-mentioned assumption.

danielcdh added inline comments.May 10 2017, 1:56 PM
lib/CodeGen/LiveRangeShrink.cpp
94–95 ↗(On Diff #98351)

skipFunction added. Thanks for the comment.

This will not be run in optnone as in:

if (getOptLevel() != CodeGenOpt::None)
  addPass(&LiveRangeShrinkID);
135 ↗(On Diff #98351)

Not sure I understand the comment. SawStore is updated in MI.isSafeToMove. Do you mean that I can remove the previous "SawStore = BuildInstOrderMap" and set SawStore's initial value to "false"?

178 ↗(On Diff #98351)

Done. Great catch!

skatkov added inline comments.May 10 2017, 8:11 PM
lib/CodeGen/LiveRangeShrink.cpp
135 ↗(On Diff #98351)

My understanding is that if you encounter an MI that has side effects, it is a barrier and you will not move later instructions through it. SawStore is also a kind of barrier: you cannot move an instruction that intersects with a store instruction. After you see a new barrier, you can reset SawStore. Say you saw a store before the new barrier and there are no stores after the barrier; then you are free to move instructions that do not intersect with stores. However, once SawStore has been set to true, I do not see the place where it is reset to false. Am I missing anything?

However, the overall logic around SawStore is not clear to me. SawStore is set in BuildInstOrderMap, so it detects whether there is a store in the tail. That would prevent us from reordering an instruction (we always move instructions up) even if the store is located later. Is that expected?

Currently, we set 'SawStore' if there is at least one 'mayStore' instruction in the scheduling region.

However, we could provide much more accurate information about where 'mayStore' instructions are in the code.
For example, 'BuildInstOrderMap' could populate an (ordered) map '<order_number, MI*>', where MI is a 'mayStore' instruction and 'order_number' is the instruction's number in the sequence.

Later on, you could query that map to check whether it is safe to move a load from loc_B to loc_A. If the map contains a store with an order_number in the range [loc_A, loc_B], then you know that it is not safe to move it.

This is just an idea for a future improvement.
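A minimal, self-contained sketch of this future-improvement idea, with plain integers standing in for instruction order numbers (the names and the `const void *` placeholder for `MachineInstr *` are illustrative, not LLVM API):

```cpp
#include <map>

// Order numbers of all mayStore instructions in the region, kept sorted.
using StoreOrderMap = std::map<unsigned, const void *>; // order -> MI*

// A load can be hoisted from position LocB up to position LocA only if
// no mayStore instruction lies in the range [LocA, LocB].
bool isSafeToHoistLoad(const StoreOrderMap &Stores, unsigned LocA,
                       unsigned LocB) {
  // First store at or after the insertion point.
  auto It = Stores.lower_bound(LocA);
  // Unsafe if that store sits inside [LocA, LocB].
  return It == Stores.end() || It->first > LocB;
}
```

An ordered map makes each query a single `lower_bound`, instead of one coarse SawStore flag for the whole region.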

lib/CodeGen/LiveRangeShrink.cpp
81–85 ↗(On Diff #98529)

We can early exit from the loop in BuildInstOrderMap as soon as we encounter an instruction with unmodeled side effects.

An instruction with unmodeled side effects is essentially a scheduling barrier.
We don't need to keep numbering instructions after a scheduling barrier because the algorithm would force a recomputation of OrderMap (see line 138).

192–200 ↗(On Diff #98529)

It would be nice to have a STATISTIC that counts the number of instructions reordered by this pass.

135 ↗(On Diff #98351)

You are right. When BuildInstOrderMap() is called, SawStore should be updated too. Otherwise we risk preventing the reordering of loads in the new scheduling region.

danielcdh updated this revision to Diff 98635.May 11 2017, 8:39 AM
danielcdh marked an inline comment as done.

update

Thanks for the reviews!

lib/CodeGen/LiveRangeShrink.cpp
81–85 ↗(On Diff #98529)

That would actually make compile time slower by adding an additional check on the frequent path (BuildInstOrderMap), as instructions with unmodeled side effects are extremely rare in real programs.

135 ↗(On Diff #98351)

I've updated the logic to update SawStore on the fly and reset it to false when an instruction with unmodeled side effects is met.

MatzeB accepted this revision.May 11 2017, 10:52 AM

Accepting this now as it looks reasonably safe and fast to me (nitpicks below).

It is a conservative/simple heuristic but given the fact that we lack other means of global code motion (in the sense of inter-basic block) in CodeGen today this is fine.

Please wait for some of the other reviewers which showed interest here to accept before committing.

lib/CodeGen/LiveRangeShrink.cpp
112 ↗(On Diff #98635)

Maybe move UseMap outside the loop and clear() it each iteration instead so there is a chance that memory allocations can be reused.
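The suggested pattern in generic, self-contained form (this is not the pass's code; the function and its inputs are made up to demonstrate the allocation-reuse idea):

```cpp
#include <unordered_map>
#include <vector>

// Hoisting the map out of the loop lets clear() drop the elements while
// implementations typically keep the bucket array alive, instead of
// reallocating a fresh table for every basic block.
unsigned countRepeatedUses(const std::vector<std::vector<int>> &Blocks) {
  std::unordered_map<int, unsigned> UseMap; // reused across iterations
  unsigned Repeats = 0;
  for (const auto &Block : Blocks) {
    UseMap.clear(); // cheap: elements go, capacity usually stays
    for (int Reg : Block)
      if (++UseMap[Reg] == 2) // count each repeated value exactly once
        ++Repeats;
  }
  return Repeats;
}
```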

10–12 ↗(On Diff #98292)

Needs three slashes to be recognized as doxygen. (I know this is wrong in a lot of existing code, but we should get it right in new code)

211–215 ↗(On Diff #98292)

I'm not sure why this would work. Maybe you should double check by compiling the llvm-testsuite (or some other set of programs) with a debug compiler and -g...

This revision is now accepted and ready to land.May 11 2017, 10:52 AM
andreadb accepted this revision.May 11 2017, 11:09 AM


LGTM too.

Thanks!

danielcdh updated this revision to Diff 98677.May 11 2017, 2:25 PM
danielcdh marked 2 inline comments as done.

update

danielcdh added inline comments.May 11 2017, 2:25 PM
lib/CodeGen/LiveRangeShrink.cpp
211–215 ↗(On Diff #98292)

glad that I ran this with -g and actually caught a bug: when I move instructions together with debug info, I should increment the Next pointer; otherwise the iterator will revisit instructions that have already been processed. Updated the implementation and added test coverage.
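The iterator bug described here can be modeled with a plain std::list (a toy stand-in for the instruction list; values > 100 play the role of "instructions to hoist" and negative values play the role of trailing DBG_VALUEs — none of this is the pass's actual code):

```cpp
#include <iterator>
#include <list>

// Walk the list and hoist each marked element to the front, carrying its
// trailing annotations along. The key detail from the review: advance Next
// past the *entire* group being moved before splicing, otherwise the
// traversal would land on elements it has already processed.
void hoistMarked(std::list<int> &L) {
  for (auto It = L.begin(); It != L.end();) {
    auto Next = std::next(It);
    if (*It > 100 && It != L.begin()) {
      // Extend the moved range over trailing "debug values", advancing
      // Next with it so the walk resumes after the whole group.
      while (Next != L.end() && *Next < 0)
        ++Next;
      L.splice(L.begin(), L, It, Next); // hoist [It, Next) to the front
    }
    It = Next;
  }
}
```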

Thanks all for the reviews. Any other comments/concerns? If not, I'll commit the patch tomorrow.

andreadb accepted this revision.May 12 2017, 3:54 AM

Thanks for fixing the issue with debug values.

The patch still looks good to me.
A couple of comments can be slightly improved (see below).

lib/CodeGen/LiveRangeShrink.cpp
166 ↗(On Diff #98677)

Please remember to add a period at the end of each sentence.

182 ↗(On Diff #98677)

s/shrinked/shrunk
Also, please add a period at the end of this sentence.

185 ↗(On Diff #98677)

You should update this comment since we also skip debug values now.

danielcdh updated this revision to Diff 98774.May 12 2017, 8:35 AM
danielcdh marked 3 inline comments as done.

update

I'll submit the patch now to allow rest of today to catch potential buildbot breakage. Thanks again for all the reviews.

Dehao

danielcdh closed this revision.May 12 2017, 12:42 PM

Sorry about the regression. As Quentin pointed out, moving instructions close to a phi should not be a problem, as the algorithm checks that the moved instruction uses more than one single-use def. Looking at the regression case, the transformation looks reasonable if we only consider shrinking live ranges. Could you provide an IR file that I can use to reproduce the issue and see why register allocation gets worse?

In the meantime, I can prepare a patch to make this pass x86-only, or revert the patch if necessary.

patch to move the pass to x86 is proposed: https://reviews.llvm.org/D33294

Will have a separate patch to fix the REG_SEQUENCE handling once I get a test that I can include in unit testing. I suppose the fix is: ignore instructions whose def register size is no less than the sum of the use register sizes?

Thanks,
Dehao


REG_SEQUENCE is typically used to construct tuple register classes. So you get a set of values in consecutively numbered registers (you may watch my llvm dev meeting talk to see some introduction).

In any case, I think the safest solution here would be to NOT trigger on REG_SEQUENCE but on register classes. For the REG_SEQUENCEs in question, a number of small 1-register values are shrunk while a larger multi-register value is extended, so the initial assumption that everything is fine when you shrink more ranges than you extend does not hold. A conservative answer here would be to simply bail out if the register classes of the vregs you are shrinking/extending don't match up.
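The conservative bail-out can be sketched as follows. This is a toy model under stated assumptions: each virtual register carries a plain class ID (real code would compare `MRI.getRegClass(Reg)` results), and the struct and function names are invented for illustration.

```cpp
#include <vector>

struct VReg {
  unsigned ID;
  unsigned RegClass; // stand-in for the TargetRegisterClass
};

// Bail out unless every range being shrunk has the same register class as
// the range being extended. With mixed classes (e.g. several 1-register
// values vs. one multi-register tuple), "shrink more ranges than you
// extend" is no longer a valid proxy for register pressure.
bool classesMatchForHoist(const VReg &Def, const std::vector<VReg> &Uses) {
  for (const VReg &U : Uses)
    if (U.RegClass != Def.RegClass)
      return false;
  return true;
}
```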

Hi, escha,

Could you send me an IR file or snippet that could trigger this performance issue? I can include it in the patch as unittest.

Thanks,
Dehao

danielcdh reopened this revision.May 18 2017, 6:26 PM

The patch was reverted because it breaks tests at runtime.

This revision is now accepted and ready to land.May 18 2017, 6:26 PM
danielcdh updated this revision to Diff 99641.May 19 2017, 4:00 PM

Updated the implementation and tests:

  1. Do not hoist an instruction if its def MO does not share the same register class with the use MO.
  2. Include both the instruction order and the MI in the use map. Using the instruction order alone is not good enough, because when an instruction gets hoisted, its order becomes the same as the next instruction's. In this case, we need to ensure the insertion point is after the last barrier, i.e. after the last use of the instruction MO (implicitly defined by MI).

PTAL, Thanks!

danielcdh updated this revision to Diff 99642.May 19 2017, 4:01 PM

clang-format...

MatzeB accepted this revision.May 26 2017, 1:39 PM

Register class logic looks good to me with the nitpicks below addressed.

lib/CodeGen/LiveRangeShrink.cpp
164–165 ↗(On Diff #99642)

This should rather be !TargetRegisterInfo::isVirtualRegister(Reg), which is not the same as isPhysicalRegister(), because a register operand can also be NoReg (=0), which shouldn't stop movement.

177–178 ↗(On Diff #99642)
  • As we got surprised by it in the first try, it would be good to document the necessity for the regClass() check. Something along the lines of:
/// The heuristic does not handle different register classes yet (registers of different sizes, looser/tighter constraints).
  • getRegClass() only works for virtual registers; as far as I understand, this needs a check, as we could get here with a constant physreg.
danielcdh updated this revision to Diff 100804.May 30 2017, 5:23 PM
danielcdh marked 2 inline comments as done.

update

lib/CodeGen/LiveRangeShrink.cpp
177–178 ↗(On Diff #99642)

Updated the logic to make sure that DefMO and MO will always be virtual register.

MatzeB accepted this revision.May 30 2017, 5:31 PM

LGTM, with nitpicks below.

lib/CodeGen/LiveRangeShrink.cpp
165–166 ↗(On Diff #100804)

You can probably use this so instructions are still moved even if they use %noreg.

if (!Reg || MRI.isConstantPhysReg(Reg))
  continue;
167–168 ↗(On Diff #100804)
  • Remove commented code.
danielcdh updated this revision to Diff 100809.May 30 2017, 5:48 PM
danielcdh marked an inline comment as done.

update

danielcdh marked an inline comment as done.May 30 2017, 5:48 PM
danielcdh updated this revision to Diff 100917.May 31 2017, 1:44 PM

update test

danielcdh closed this revision.May 31 2017, 4:25 PM

For the record: This landed in r304371