This is an archive of the discontinued LLVM Phabricator instance.

[LiveVariables] Switch Kill/Defs sets to be `DenseSet`
ClosedPublic

Authored by davide on May 10 2017, 8:20 PM.

Details

Summary

The testcase in https://bugs.llvm.org/show_bug.cgi?id=32984 went from 3 minutes to 10 minutes after a change that made the LoopUnroll pass more aggressive (increasing the threshold).

-time-passes reveals large part of the time is spent in PHI Elimination (in the backend).
My profiling shows all the time of PHI elimination goes to llvm::LiveVariables::addNewBlock. This is because we keep Defs/Kills registers in a SmallSet, and lvm/ADT/SmallSet.h -> VIterator vfind(const T &V); is O(N).

Switching to a DenseSet reduces the time spent in the pass from 297 seconds to 97 seconds. Profiling still shows a lot of time is spent iterating the data structure, so I guess there's room for improvement.

I tried a SparseBitVector and it's slightly slower than the DenseMap solution. I wanted to try a BitVector but I'm not sure I have an upper bound on the number of elements.

Diff Detail

Repository
rL LLVM

Event Timeline

davide created this revision.May 10 2017, 8:20 PM

+ @mzolotukhin/ @Gerolf (who showed interest in compile time regressions)

MatzeB edited edge metadata.May 11 2017, 10:06 AM

Not much to say about the patch itself which looks fine.

I wanted to try a BitVector but I'm not sure I have an upper bound on the number of elements.

An upper bound for virtual register numbers would be MachineRegisterInfo::getNumVirtRegs() (and TargetRegisterInfo::getNumReg() for physical registers though I think this set is only about vregs).

If this turns out to be more common and we would want to address this on a more fundamental level, this is what we could do:

  • Finally finish the TwoAddressInstructionPass rewrite; LiveVariables has been deprecated for a long time, we just cannot throw it out because of this one last pass.
  • Easier fix: The critical edge splitting appears to be a part of PHIElimination that happens before anything else. If we would pull that out into a separate pass and perform it before computing LiveVariables, then we wouldn't need to go through the trouble of updating LiveVariables.
MatzeB accepted this revision.May 11 2017, 10:11 AM

And I should mark this green. LGTM

This revision is now accepted and ready to land.May 11 2017, 10:11 AM
  • Easier fix: The critical edge splitting appears to be a part of PHIElimination that happens before anything else. If we would pull that out into a separate pass and perform it before computing LiveVariables, then we wouldn't need to go through the trouble of updating LiveVariables.

Reading the code a bit more it seems that the critical edge splitting logic is already depending on liveness information so it's not that easy to fix.

And also slightly related: For the sort of queries that LiveVariables supports "Fast liveness checking for ssa-form programs" / https://doi.org/10.1145/1356058.1356064 would be a perfect fit. However given that the long term goal would be to replace all of this with LiveIntervalAnalysis, it's probably not worth pursuing that here.

And also slightly related: For the sort of queries that LiveVariables supports "Fast liveness checking for ssa-form programs" / https://doi.org/10.1145/1356058.1356064 would be a perfect fit. However given that the long term goal would be to replace all of this with LiveIntervalAnalysis, it's probably not worth pursuing that here.

Coincidence, I was right now reading the same paper https://hal.inria.fr/inria-00192219/document (Google helped me finding it).
If this still turns out to be a problem for my org (the original bugreport is reduced from a case we hit in the wild) I'll take a look at it, but it's nowhere near the top of my list, unfortunately.

This revision was automatically updated to reflect the committed changes.
sanjoy added a subscriber: sanjoy.May 11 2017, 5:10 PM

How big does the set get? Did you try using SmallDenseSet instead?

How big does the set get? Did you try using SmallDenseSet instead?

I don't have numbers off-hand, but I'll run my experiments and follow up here.

How big does the set get? Did you try using SmallDenseSet instead?

Hi Sanjoy, I tried to switch to SmallDenseSet and that slows down things a little
2min38sec -> 2min55sec`

My set doesn't grow too much, it's < 100 elements, roughly.

Feel free to follow-up if you have any questions, and thanks for your comment!

Davide

In particular, the time spent in EliminatePHI goes from

92.7229 ( 59.4%)   0.0100 (  1.1%)  92.7329 ( 59.0%)  92.7333 ( 59.0%)  Eliminate PHI nodes for register allocation

to

107.5286 ( 62.0%)   0.0090 (  1.1%)  107.5375 ( 61.7%)  107.5381 ( 61.7%)  Eliminate PHI nodes for register allocation