This is an archive of the discontinued LLVM Phabricator instance.

Speed up creation of live ranges for physical registers by using a segment set
ClosedPublic

Authored by gasiunas on Oct 28 2014, 7:12 AM.

Details

Summary

The patch addresses a compile-time performance regression in the LiveIntervals analysis pass (see http://llvm.org/bugs/show_bug.cgi?id=18580). This regression is especially critical when compiling long functions. Our analysis had shown that the most of time is taken for generation of live intervals for physical registers. Insertions in the middle of the array of live ranges cause quadratic algorithmic complexity, which is apparently the main reason for the slow-down.

Overview of changes:

  • The patch introduces an additional std::set<Segment>* member in LiveRange for storing segments in the phase of initial creation. The set is used if this member is not NULL, otherwise everything works the old way.
  • The set of operations on LiveRange used during initial creation (i.e. used by createDeadDefs and extendToUses) have been reimplemented to use the segment set if it is available.
  • After a live range is created the contents of the set are flushed to the segment vector, because the set is not as efficient as the vector for the later uses of the live range. After the flushing, the set is deleted and cannot be used again.
  • The set is only for live ranges computed in LiveIntervalAnalysis::computeLiveInRegUnits() and getRegUnit() but not in computeVirtRegs(), because I did not bring any performance benefits to computeVirtRegs() and for some examples even brought a slow down.

Diff Detail

Repository
rL LLVM

Event Timeline

gasiunas updated this revision to Diff 15522.Oct 28 2014, 7:12 AM
gasiunas retitled this revision from to Speed up creation of live ranges for physical registers by using a segment set.
gasiunas updated this object.
gasiunas edited the test plan for this revision. (Show Details)
gasiunas added reviewers: qcolombet, atrick, Gerolf.
gasiunas added a subscriber: Unknown Object (MLST).

Couple of questions:

a) Compile time results?
b) Why didn't it help/produce a slowdown for computeVirtRegs?
c) What's the slowdown look like for using it all the time? Is it because we're using find instead of set::upper_bound? Where are we running into issues?
d) It seems like we're duplicating a lot of code to do this, is there any way you can think of to remove that duplication here?

Couple of clean up comments:

a) Comments are a full sentence including captialization.
b) Pull the asserts out of the else blocks.
c) Single statement blocks shouldn't have braces.

-eric

qcolombet edited edge metadata.Nov 3 2014, 4:13 PM

Hi Vaidas,

I share Eric’s concerns/questions.
In particular, I think we should put more effort to reduce the amount of code we duplicate here. I gave some thoughts in the inlined comments.

Also, I think for performance evaluation it would be nice to be able to use or not use it every with a command line option. That would help to figure out what is the problem with computeVirtRegs. Moreover that would allow to completely disable it if we believe it has some problem.

After a live range is created the contents of the set are flushed to the segment vector, because the set is not as efficient as the vector for the later uses of the live range. After the flushing, the set is deleted and cannot be used again.

If we really want to use the set internally like this, we shouldn’t expose the createSegmentSet method in the public API. Therefore, I think we should try to completely expose this interface. I have thrown some ideas in the inlined comments.

Thanks for working on this!
-Quentin

include/llvm/CodeGen/LiveInterval.h
194 ↗(On Diff #15522)

LLVM policy is to use variable names that start with a capital letter.
That said, the surrounded code already use such a naming convention, so I’d say it is fine.
Anyway, the formatting is not right, because ‘*’ should be with the variable name.

Use clang-format on your patch, please.

196 ↗(On Diff #15522)

Use nullptr.

200 ↗(On Diff #15522)

You do not need to check for NULL before calling delete.

419 ↗(On Diff #15522)

Use nullptr (or nothing).

553 ↗(On Diff #15522)

Typo: of/ => of

lib/CodeGen/LiveInterval.cpp
342 ↗(On Diff #15522)

I believe this shouldn’t be necessary with some abstractions.
See LiveRange::segsetExtendInBlock.

610 ↗(On Diff #15522)
622 ↗(On Diff #15522)

Please try to share this code between the vector and the set versions.
You could stick to the existing implementation (with some abstract on the iterator type) and use an abstraction for the final insertion for instance.

656 ↗(On Diff #15522)

Same thing as segsetCreateDeadDef, you should share more logic between the implementations.
For instance, you could abstract begin(), upper_bound(), empty() and you should be able to reuse all the code.

677 ↗(On Diff #15522)

Ditto.
Abstract end(), erase().

948 ↗(On Diff #15522)

Is this code path used during createDeadDefs and extendToUses?

lib/CodeGen/LiveIntervalAnalysis.cpp
297 ↗(On Diff #15522)

I would prefer this to be an internal detail of the constructor of LiveRange.

315 ↗(On Diff #15522)

What about pushing the flush into computeRegUnitRange?

Even better, could we automatically decide when to do the flush internally in the LiveRange class? E.g., the first time we actually need the vector capabilities.
That would remove this kind of explicit calls to flush that are easy to forget or misplace.

gasiunas added a comment.EditedNov 4 2014, 6:49 AM

Hi Eric and Quentin,

Thanks for the comments! At first, I will give some info on performance.

To measure compliation time I used an example generated by the script from http://llvm.org/bugs/show_bug.cgi?id=18580 with the parameter 10000. This generates a C program with 10000 string objects and 10000 small if statements using these variables. The C program is compiled to IR by clang. This IR I then use to measure the compilation time of llc. I run llc with -01 to skip the optimization passes.

So with this example, the original llc runs about 19s and the llc with the patch runs about 10s.

Here is the result of profiling the llc without the patch (exported as CSV from vtune):
"Function Stack","CPU Time: Total by Utilization","CPU Time: Self by Utilization","Overhead and Spin Time: Total","Overhead and Spin Time: Self","Module","Function (Full)","Source File","Start Address"
" llvm::FPPassManager::runOnFunction","18.55","0","0","0","llc-no","llvm::FPPassManager::runOnFunction(llvm::Function&)","","0x1366d50"
" llvm::LiveIntervals::runOnMachineFunction","9.07199","0","0","0","llc-no","llvm::LiveIntervals::runOnMachineFunction(llvm::MachineFunction&)","","0xe8cb00"
" llvm::LiveIntervals::computeLiveInRegUnits","8.95999","0","0","0","llc-no","llvm::LiveIntervals::computeLiveInRegUnits(void)","","0xe8a1a0"
" llvm::LiveIntervals::computeRegUnitRange","8.95999","0","0","0","llc-no","llvm::LiveIntervals::computeRegUnitRange(llvm::LiveRange&, unsigned int)","","0xe89fd0"
" llvm::LiveRangeCalc::extendToUses","4.77199","0.0180017","0","0","llc-no","llvm::LiveRangeCalc::extendToUses(llvm::LiveRange&, unsigned int)","","0xe94640"
" llvm::LiveRangeCalc::extend","4.75399","0.00799807","0","0","llc-no","llvm::LiveRangeCalc::extend(llvm::LiveRange&, llvm::SlotIndex, unsigned int)","","0xe94420"
" llvm::LiveRange::extendInBlock","4.74599","0.0199952","0","0","llc-no","llvm::LiveRange::extendInBlock(llvm::SlotIndex, llvm::SlotIndex)","","0xe85f60"
" llvm::LiveRange::extendSegmentEndTo","4.726","4.726","0","0","llc-no","llvm::LiveRange::extendSegmentEndTo(llvm::LiveRange::Segment*, llvm::SlotIndex)","","0xe84a30"
" llvm::LiveRangeCalc::createDeadDefs","4.18799","0.0279962","0","0","llc-no","llvm::LiveRangeCalc::createDeadDefs(llvm::LiveRange&, unsigned int)","","0xe91650"
" llvm::LiveRange::createDeadDef","4.16","4.128","0","0","llc-no","llvm::LiveRange::createDeadDef(llvm::SlotIndex, llvm::BumpPtrAllocatorImpl<llvm::MallocAllocator, (unsigned long)4096, (unsigned long)4096>&)","","0xe880a0"
" llvm::LiveRange::find","0.0319956","0.0319956","0","0","llc-no","llvm::LiveRange::find(llvm::SlotIndex)","","0xe84530"
" llvm::LiveIntervals::computeVirtRegs","0.0920017","0","0","0","llc-no","llvm::LiveIntervals::computeVirtRegs(void)","","0xe8c9e0"
" llvm::LiveIntervals::computeVirtRegInterval","0.0840081","0","0","0","llc-no","llvm::LiveIntervals::computeVirtRegInterval(llvm::LiveInterval&)","","0xe8a6a0"
" llvm::LiveRangeCalc::reset","0.0440009","0.0440009","0","0","llc-no","llvm::LiveRangeCalc::reset(llvm::MachineFunction const*, llvm::SlotIndexes*, llvm::MachineDominatorTree*, llvm::BumpPtrAllocatorImpl<llvm::MallocAllocator, (unsigned long)4096, (unsigned long)4096>*)","","0xe94ba0"
" llvm::LiveRangeCalc::extendToUses","0.0320052","0.0320052","0","0","llc-no","llvm::LiveRangeCalc::extendToUses(llvm::LiveRange&, unsigned int)","","0xe94640"
" llvm::LiveRangeCalc::createDeadDefs","0.00800203","0","0","0","llc-no","llvm::LiveRangeCalc::createDeadDefs(llvm::LiveRange&, unsigned int)","","0xe91650"
" llvm::LiveIntervals::createInterval","0.00799364","0.00799364","0","0","llc-no","llvm::LiveIntervals::createInterval(unsigned int)","","0xe89f40"
" llvm::LiveIntervals::computeRegMasks","0.02","0.02","0","0","llc-no","llvm::LiveIntervals::computeRegMasks(void)","","0xe8b720"
" llvm::SelectionDAGISel::runOnMachineFunction","4.08","0.0160085","0","0","llc-no","llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&)","","0xd60d50"
" (anonymous namespace)::MachineBlockPlacement::runOnMachineFunction","1.77799","0","0","0","llc-no","(anonymous namespace)::MachineBlockPlacement::runOnMachineFunction(llvm::MachineFunction&)","","0xebb050"
" llvm::LiveVariables::runOnMachineFunction","0.370002","0","0","0","llc-no","llvm::LiveVariables::runOnMachineFunction(llvm::MachineFunction&)","","0xea1610"

Here is the profiler result of the llc with the patch:
"Function Stack","CPU Time: Total by Utilization","CPU Time: Self by Utilization","Overhead and Spin Time: Total","Overhead and Spin Time: Self","Module","Function (Full)","Source File","Start Address"
" llvm::FPPassManager::runOnFunction","9.72","0","0","0","llc-pr","llvm::FPPassManager::runOnFunction(llvm::Function&)","","0x1367190"
" llvm::SelectionDAGISel::runOnMachineFunction","4.08","0.0320187","0","0","llc-pr","llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&)","","0xd60e80"
" (anonymous namespace)::MachineBlockPlacement::runOnMachineFunction","1.57","0","0","0","llc-pr","(anonymous namespace)::MachineBlockPlacement::runOnMachineFunction(llvm::MachineFunction&)","","0xebb1e0"
" llvm::LiveVariables::runOnMachineFunction","0.378","0.0360253","0","0","llc-pr","llvm::LiveVariables::runOnMachineFunction(llvm::MachineFunction&)","","0xea17a0"
" (anonymous namespace)::VerifierLegacyPass::runOnFunction","0.29999","0.0160001","0","0","llc-pr","(anonymous namespace)::VerifierLegacyPass::runOnFunction(llvm::Function&)","","0x1392df0"
" llvm::LiveIntervals::runOnMachineFunction","0.267978","0","0","0","llc-pr","llvm::LiveIntervals::runOnMachineFunction(llvm::MachineFunction&)","","0xe8cc70"
" llvm::LiveIntervals::computeLiveInRegUnits","0.159983","0","0","0","llc-pr","llvm::LiveIntervals::computeLiveInRegUnits(void)","","0xe8a2e0"
" llvm::LiveIntervals::computeVirtRegs","0.0879867","0.0120113","0","0","llc-pr","llvm::LiveIntervals::computeVirtRegs(void)","","0xe8cb50"
" llvm::LiveIntervals::computeRegMasks","0.0200084","0.0200084","0","0","llc-pr","llvm::LiveIntervals::computeRegMasks(void)","","0xe8b890"
" (anonymous namespace)::MachineScheduler::runOnMachineFunction","0.25997","0","0","0","llc-pr","(anonymous namespace)::MachineScheduler::runOnMachineFunction(llvm::MachineFunction&)","","0xf074f0"
" llvm::MachineDominatorTree::runOnMachineFunction","0.255892","0","0","0","llc-pr","llvm::MachineDominatorTree::runOnMachineFunction(llvm::MachineFunction&)","","0xed3ca0"
" llvm::X86AsmPrinter::runOnMachineFunction","0.239961","0","0","0","llc-pr","llvm::X86AsmPrinter::runOnMachineFunction(llvm::MachineFunction&)","","0xb43390"
" (anonymous namespace)::MachineCSE::PerformCSE","0.219981","0.0440057","0","0","llc-pr","(anonymous namespace)::MachineCSE::PerformCSE(llvm::DomTreeNodeBase<llvm::MachineBasicBlock>*)","","0xec0620"
" (anonymous namespace)::RAGreedy::runOnMachineFunction","0.181984","0","0","0","llc-pr","(anonymous namespace)::RAGreedy::runOnMachineFunction(llvm::MachineFunction&)","","0xf4c440"
" llvm::SlotIndexes::runOnMachineFunction","0.144008","0.144008","0","0","llc-pr","llvm::SlotIndexes::runOnMachineFunction(llvm::MachineFunction&)","","0xf7f3f0"
" (anonymous namespace)::RegisterCoalescer::runOnMachineFunction","0.130013","0.0520054","0","0","llc-pr","(anonymous namespace)::RegisterCoalescer::runOnMachineFunction(llvm::MachineFunction&)","","0xf60390"
" (anonymous namespace)::StackColoring::runOnMachineFunction","0.128002","0.00800791","0","0","llc-pr","(anonymous namespace)::StackColoring::runOnMachineFunction(llvm::MachineFunction&)","","0xf914b0"

As you see, the optimization reduces the time of the LiveIntervals pass from 9s to 0.27s.

If I also activate the segment set in computeVirtRegInterval, then the time of the LiveIntervals pass goes slightly up to 0.32s. This difference is of course insignificant compared to the total time, but since this optimization is optional anyway, I don't see a reason why should we activate in computeVirtRegInterval.

Here is the profiler result for llc with the patch + segment set activated computeVirtRegInterval:
"Function Stack","CPU Time: Total by Utilization","CPU Time: Self by Utilization","Overhead and Spin Time: Total","Overhead and Spin Time: Self","Module","Function (Full)","Source File","Start Address"
" llvm::LiveIntervals::runOnMachineFunction","0.319991","0","0","0","llc-vr","llvm::LiveIntervals::runOnMachineFunction(llvm::MachineFunction&)","","0xe8cc80"
" llvm::LiveIntervals::computeLiveInRegUnits","0.167986","0","0","0","llc-vr","llvm::LiveIntervals::computeLiveInRegUnits(void)","","0xe8a2e0"
" llvm::LiveIntervals::computeRegUnitRange","0.147984","0","0","0","llc-vr","llvm::LiveIntervals::computeRegUnitRange(llvm::LiveRange&, unsigned int)","","0xe8a110"
" llvm::LiveRange::flushSegmentSet","0.0120047","0.0120047","0","0","llc-vr","llvm::LiveRange::flushSegmentSet(void)","","0xe86810"
" llvm::LiveRange::createDeadDef","0.00799748","0","0","0","llc-vr","llvm::LiveRange::createDeadDef(llvm::SlotIndex, llvm::BumpPtrAllocatorImpl<llvm::MallocAllocator, (unsigned long)4096, (unsigned long)4096>&)","","0xe881e0"
" llvm::LiveIntervals::computeVirtRegs","0.131992","0.0119954","0","0","llc-vr","llvm::LiveIntervals::computeVirtRegs(void)","","0xe8cb60"
" llvm::LiveIntervals::computeVirtRegInterval","0.0920111","0","0","0","llc-vr","llvm::LiveIntervals::computeVirtRegInterval(llvm::LiveInterval&)","","0xe8a810"
" llvm::LiveRangeCalc::createDeadDefs","0.0560275","0.0560275","0","0","llc-vr","llvm::LiveRangeCalc::createDeadDefs(llvm::LiveRange&, unsigned int)","","0xe917f0"
" llvm::LiveRangeCalc::extendToUses","0.0199875","0.00799175","0","0","llc-vr","llvm::LiveRangeCalc::extendToUses(llvm::LiveRange&, unsigned int)","","0xe947e0"
" llvm::LiveRangeCalc::reset","0.015996","0.015996","0","0","llc-vr","llvm::LiveRangeCalc::reset(llvm::MachineFunction const*, llvm::SlotIndexes*, llvm::MachineDominatorTree*, llvm::BumpPtrAllocatorImpl<llvm::MallocAllocator, (unsigned long)4096, (unsigned long)4096>*)","","0xe94d40"
" llvm::LiveRange::flushSegmentSet","0.0199879","0.0199879","0","0","llc-vr","llvm::LiveRange::flushSegmentSet(void)","","0xe86810"
" llvm::LiveIntervals::createInterval","0.00799796","0.00799796","0","0","llc-vr","llvm::LiveIntervals::createInterval(unsigned int)","","0xe8a080"
" llvm::LiveIntervals::computeRegMasks","0.0200124","0.0200124","0","0","llc-vr","llvm::LiveIntervals::computeRegMasks(void)","","0xe8b8a0"

My explanation of the performance differences is that the optimization is beneficial for very large live ranges with thousands of segments, because then insertion of new segments into the middle of the array is very expensive. For small live ranges using array is slightly more efficient that using set. However, in these cases LiveIntervals pass does not take much time anyway.

Vaidas

c) What's the slowdown look like for using it all the time? Is it because we're using find instead of set::upper_bound? Where are we running into issues?

I was experimenting with it almost a year ago, so I don't remember precisely which operations were slower. I think it was not lookup, but rather some range operations like splitting or merging of ranges, that work much faster with the array. Unfortunately, I cannot try out that again with the current patch, because now only the operations that are used in the initial computation of the live range are implemented for set.

d) It seems like we're duplicating a lot of code to do this, is there any way you can think of to remove that duplication here?

Yes I also don't like the code duplication. This was actually the reason why I hesitated to contribute that patch. I am open to suggestions.

b) Pull the asserts out of the else blocks.

Which ones do you mean? Note that most of the new code is a copy of analogous operations with vector.

Also, I think for performance evaluation it would be nice to be able to use or not use it every with a command line option. That would help to figure out what is the problem with computeVirtRegs. Moreover that would allow to completely disable it if we believe it has some problem.

Note that computeVirtRegs is not really a problem, it simply does not really need that optimization, but yes it makes sense to have an switch for the optimization as whole.

Vaidas

lib/CodeGen/LiveInterval.cpp
622 ↗(On Diff #15522)

I don't see how it would work. Could you give some more details? Do you suggest having some adapter iterator wrapping one of the two iterators?

656 ↗(On Diff #15522)

How can I abstract them, if the two iterator types are not compatible with each other?

948 ↗(On Diff #15522)

It is used in LiveRangeCalc::extendToUses through LiveRangeCalc::findReachingDefs

lib/CodeGen/LiveIntervalAnalysis.cpp
315 ↗(On Diff #15522)

At first I put it to computeRegUnitRange, but I thought that is a more clear design if we have calls to createSegmentSet() and flushSegmentSet() at the same level.

Flushing automatically sounds good, I will try it out.

qcolombet added inline comments.Nov 4 2014, 10:08 AM
lib/CodeGen/LiveInterval.cpp
622 ↗(On Diff #15522)

Yes something like that, or you can use a template argument for the iterator type. Though that can be messy quite quickly.

656 ↗(On Diff #15522)

A new type on top of them, like you suggested ;).

948 ↗(On Diff #15522)

Ah right! Thanks.

Gerolf added inline comments.Nov 12 2014, 5:46 PM
include/llvm/CodeGen/LiveInterval.h
427 ↗(On Diff #15522)

We don't repeat the function names in comments anymore.

include/llvm/CodeGen/LiveIntervalAnalysis.h
386 ↗(On Diff #15522)

Since the focus is on speeding up computeRegUnitRange() is seems cleaner to handle both the segment set allocation and flushing locally within that function. This will also remove the asymmetry in your code where some calls are enclosed by createSegment() + flush(), while for others (ok, I only see one more) creation and flush are separate.

lib/CodeGen/LiveInterval.cpp
55 ↗(On Diff #15522)

This is for the createDeadDef() call in computeRegUnitRange() only, right? Why not replace the call there directly with the call to the segmentSet function and remove this check?

340 ↗(On Diff #15522)

Same as for createDeadDef().

606 ↗(On Diff #15522)

I guess you could have templates for some of the segment set/segment utility routines if you wanted to push your design a bit. That would take care of some of the static code bloat.

945 ↗(On Diff #15522)

Hm. This looks like more changes than just adding a new data structure to handle live ranges calcs more efficiently. The existing updater checks for coalescing and spilling, while addSegment() does not. Please add a more detailed comment about the differences. Up until this point I was optimistic that the implementation could be cleaned up quite a bit, but now I'm not so sure anymore.

gasiunas added inline comments.Nov 13 2014, 12:58 AM
include/llvm/CodeGen/LiveIntervalAnalysis.h
386 ↗(On Diff #15522)

The problem is that LiveIntervals::computeLiveInRegUnits calls createDeadDef before calling computeRegUnitRange.

lib/CodeGen/LiveInterval.cpp
55 ↗(On Diff #15522)

Yes, I think it should work. That would simplify a bit.

606 ↗(On Diff #15522)

Yes, templates is the only thing that comes to my mind for reduction of code repetition. Other techniques like object-oriented abstraction would be probably too heavyweight, because it is performance critical code. The problem with templates is however that they would introduce quite some additional complexity. I think I will need some helper classes for template instantiation. I will think about it.

945 ↗(On Diff #15522)

Are these coalescing and spilling checks also relevant for the LiveIntervals phase? Because in other cases, the segmentSet is not used.

gasiunas updated this revision to Diff 16601.Nov 25 2014, 2:28 AM
gasiunas edited edge metadata.
  • Implementations of createDeadDef, extendInBlock, addSegment are templatized to make them reusable for the segment set and the segment vector.
  • The segment set is now created in LiveRange constructor, and flushSegmentSet is called in computeRegUnitRange
  • Comments regarding coding style are addressed

Hi Vaidas,

Thanks for the refactoring, this is looking good!

I just have a couple of nitpicks (see inline comments) and one request:

  • Could you add a cl::opt to override the use of segment set?

Thanks again,
-Quentin

include/llvm/CodeGen/LiveIntervalAnalysis.h
385 ↗(On Diff #16601)

For debugging purposes, I would like we have a command line option to override this behavior.
This means we will have to update flushSegmentSet accordingly or not calling it when the option is false.

lib/CodeGen/LiveInterval.cpp
63 ↗(On Diff #16601)

I believe this comment is out of date.

92 ↗(On Diff #16601)

Ditto.

125 ↗(On Diff #16601)

Using ’S' instead of ‘I' may be clearer here also it does not matter for correctness.
On a second thought, do we actually need S?

140 ↗(On Diff #16601)

Ditto.

158 ↗(On Diff #16601)

Do we really need to use segmentAt?
Couldn’t we use MergeTo directly?

162 ↗(On Diff #16601)

Ditto.

232 ↗(On Diff #16601)

Couldn’t this be a typedef?

259 ↗(On Diff #16601)

Ditto.

320 ↗(On Diff #16601)
469 ↗(On Diff #16601)
480 ↗(On Diff #16601)
874 ↗(On Diff #16601)

Capital letter and period to make a sentence.

lib/CodeGen/LiveIntervalAnalysis.cpp
299 ↗(On Diff #16601)

For debugging purposes, I would like we have a command line option to override this behavior.
This means we will have to update flushSegmentSet accordingly or not calling it when the option is false.

gasiunas updated this revision to Diff 17038.Dec 8 2014, 6:39 AM
  • An option to turn-off use of the segment set for the computation of the live ranges of physregs.
  • Various small changes addressing the comments of Quentin.

Hi Quentin,

Thanks a lot for the comments. I added the option to turn-off the optimization and fixed formatting and comments.

Best regards,
Vaidas

lib/CodeGen/LiveInterval.cpp
63 ↗(On Diff #16601)

Indeed

125 ↗(On Diff #16601)

The set iterator supports only const access to elements, therefore I need S = segmentAt(I) for const_cast.

Set iterators are read-only, because changing a set element may potentially affect is order in the set. Thus, we must be careful to change the segments in a way that preserves their order. Here it is no problem, because these algorithms were designed to take care of correct ordering.

232 ↗(On Diff #16601)

Yes, it works. I just need to forward declare CalcLiveRangeUtilVector .

320 ↗(On Diff #16601)

Done.

874 ↗(On Diff #16601)

Done

lib/CodeGen/LiveIntervalAnalysis.cpp
299 ↗(On Diff #16601)

Done

qcolombet accepted this revision.Dec 11 2014, 11:13 AM
qcolombet edited edge metadata.

Hi Vaidas,

LGTM.

Thanks,
-Quentin

This revision is now accepted and ready to land.Dec 11 2014, 11:13 AM

Does anyone have further comments, or is the patch now ready to be merged?

Regards,
Vaidas

atrick accepted this revision.Dec 16 2014, 8:53 AM
atrick edited edge metadata.

RubberStamp: I'm happy with the reviews and the code looks reasonable. Thanks reviewers.

MatzeB added a subscriber: MatzeB.Dec 17 2014, 3:56 PM

Sorry for coming so late to the review, no need to address these issues before commit.

  • Why is the SegmentSet part of the LiveRange? My understanding is that you only use it during initial live calculation anyway, and these calculations are done one another for each live range. So I'd assume it is enough to have the set as a member of the LiveRangeCalc class and not having an extra pointer in each LiveRange.
  • for the createDeadDefs() part it may be interesting to simply append the new segments to the vector in any order and later run a sorting algorithm which also eliminates duplicates on the vector. This should reduce the O(n**2) complexity to O(n log n) and I'd imagine it to perform well in the case where only a few segments are used.
  • It would be nice if the decision on whether to use the set or sorted vector based techniques could be based on the actual number of Defs/Uses of a register (deciding on whether it is a physreg seems a bit arbitrary to me although it's probably a good heuristic in practice). Unfortunately I don't see an easy way to determine the actual number of defs/uses for a vreg as that is only maintained as a double linked list without a counter at the moment, so it is probably okay as is.

Hi Matthias,

Thanks for your comments!

Why is the SegmentSet part of the LiveRange? My understanding is that you only use it during initial live calculation anyway, and these calculations are done one another for each live range. So I'd assume it is enough to have the set as a member of the LiveRangeCalc class and not having an extra pointer in each LiveRange.

At first I also thought that I could move the optimizations to LiveRangeCalc, but it is more difficult than it may appear. The methods addSegment and extendInBlock are used also outside the LiveRangeCalc, so I cannot move them completely out of LiveRange. Besides, the LiveRangeCalc is also used in the register allocation phase for splitting of live ranges. So we would need to maintain two modes of the algorithm in LiveRangeCalc in addition to a direct usage over LiveRange. Another difficulty is that the interface LiveRangeCalc is designed to work with multiple LiveRanges, so I cannot keep the segment set as a member of LiveRangeCalc, but rather pass it as an additional parameter from method to method. In some places the LiveRangeCalc uses the LiveRange indirectly over LiveRangeUpdater, so I would need to propagate the segment set to LiveRangeUpdater too or completely rewrite the code using the updater. It is all possible, but would require even more rewriting, and I don't really have a deep knowledge of the algorithm so that I would dare to rewrite it.

for the createDeadDefs() part it may be interesting to simply append the new segments to the vector in any order and later run a sorting algorithm which also eliminates duplicates on the vector. This should reduce the O(n**2) complexity to O(n log n) and I'd imagine it to perform well in the case where only a few segments are used.

Note that createDeadDef is also used outside the LiveIntervals phase, so we would still a version of createDeadDef that preserves the invariant of LiveRange.

It would be nice if the decision on whether to use the set or sorted vector based techniques could be based on the actual number of Defs/Uses of a register (deciding on whether it is a physreg seems a bit arbitrary to me although it's probably a good heuristic in practice). Unfortunately I don't see an easy way to determine the actual number of defs/uses for a vreg as that is only maintained as a double linked list without a counter at the moment, so it is probably okay as is.

Yet another approach would also be to rely always on the segment set for calculation of *all* live ranges in the LiveIntervals pass. Calculation of small live ranges is somewhat faster with vector, but it is only slightly faster and in that case the LiveIntervals pass runs very quickly anyway. I did not found examples with really large ranges for virtual registers, so I made a conservative decision to keep the computation of virtual registers as it is, because it is hard to choose a heuristic without a possibility to test it. But if anyone have some interesting examples I would be glad to evaluate them. Note that keeping the vector version of the algorithms is more important for later processing of the live ranges during register allocation, because insert operations are then not dominant anymore, and other operations work faster on the vector.

Best regards,
Vaidas

Hi all,

I agree with Vaidas that moving the set outside of the LiveRange may not be that easy.

Regarding the complexity improvements:

for the createDeadDefs() part it may be interesting to simply append the new segments to the vector in any order and later run a sorting algorithm which also eliminates duplicates on the vector. This should reduce the O(n**2) complexity to O(n log n) and I'd imagine it to perform well in the case where only a few segments are used.

I am not so sure that would be beneficial, indeed, we would have to do this sorting for each live range, i.e., O(m n log n) and I suspect that the (log n) factor will be bigger than the cost of the set for the cases we try to optimize.

Thanks,
-Quentin

As I said: Don't let my comments stop you from committing, fixing the performance regressions right now is important, cleaning the code up/avoiding the extra pointer can be dealt with later I guess.

Happy New Year to everyone!

Is there still an interest to merge this patch?

Best regards,
Vaidas

PCB added a subscriber: PCB.Jan 30 2015, 2:25 AM

Is this change about to be merged or is some input from us still required?

Since I am leaving SAP by the end of this next week, please contact Philipp Becker (reviewer PCB) if you have any further questions about that change.

Kind regards,
Vaidas

gasiunas updated this revision to Diff 19481.Feb 6 2015, 6:04 AM
gasiunas edited edge metadata.

Rebased on the latest llvm version and retested. It can be now commited.

Regards,
Vaidas

This revision was automatically updated to reflect the committed changes.