This is an archive of the discontinued LLVM Phabricator instance.

[dsymutil] Convert recursion in lookForDIEsToKeep into worklist.
ClosedPublic

Authored by JDevlieghere on Jul 3 2018, 2:55 PM.

Details

Summary

The functions lookForDIEsToKeep and keepDIEAndDependencies can have some very deep recursion. This tackles part of this problem by removing the recursion from lookForDIEsToKeep by turning it into a worklist. The difficulty in doing so is the computation of incompleteness, which depends on the incompleteness of its children. To compute this, we insert "continuation markers" into the worklist. This informs the work loop to (re)compute the Incompleteness of the DIE associated with it (i.e. the parent of the last processed DIE).

Diff Detail

Repository
rL LLVM

Event Timeline

JDevlieghere created this revision.Jul 3 2018, 2:55 PM
JDevlieghere retitled this revision from [WIP][dsymutil] Convert recursion in lookForDIEsToKeep into worklist. to [dsymutil] Convert recursion in lookForDIEsToKeep into worklist. .Jul 5 2018, 8:01 AM
friss added a comment.Jul 5 2018, 8:32 AM

This looks like it should work and keep the current semantics. Did you find that this gives identical output as the current algorithm in your testing?

llvm/tools/dsymutil/DwarfLinker.cpp
877 ↗(On Diff #153986)

I would add a comment here explaining that the worklist is LIFO. It took me a moment to realize why you would enqueue the continuation first.

878–880 ↗(On Diff #153986)

Can you use something like make_reverse_iterator to avoid copying the list?

JDevlieghere marked an inline comment as done.
  • Add FIFO comment.
JDevlieghere marked an inline comment as done.Jul 5 2018, 9:20 AM

This looks like it should work and keep the current semantics. Did you find that this gives identical output as the current algorithm in your testing?

The output is not always binary identical. For example for clang we emit more than we did before (i.e. we consider more things incomplete). So far I don't have a good explanation for this...

llvm/tools/dsymutil/DwarfLinker.cpp
878–880 ↗(On Diff #153986)

The iterator is a forward iterator (it uses getSibling to advance) so unfortunately not.

JDevlieghere marked an inline comment as done.
  • Improve comments.
  • Remove redundant continuation at beginning. Wwe always add the continuation if we add the DIE's children to the worklist.
  • Use SmallVectors to improve performance. Still this path causes a ~5% performance regression (avg over 5 runs).
dblaikie added inline comments.
llvm/tools/dsymutil/DwarfLinker.cpp
883–884 ↗(On Diff #154325)
for (... : reverse(Children))

perhaps?

878–880 ↗(On Diff #153986)

I think, maybe, getPreviousSibling could be implemented without great cost & this could be made into a bidirectional iterator? (looks like getSibling works by scanning forward in the list of DIEs until it finds another DIE with the same depth (or bailing out if it finds a DIE with a shallower depth) - the reverse search coudl be done the same way, I think?)

JDevlieghere marked an inline comment as done.Jul 11 2018, 3:36 AM
JDevlieghere added inline comments.
llvm/tools/dsymutil/DwarfLinker.cpp
878–880 ↗(On Diff #153986)

Makes sense! Implemented your suggestion in D49173

JDevlieghere marked an inline comment as done.
  • Feedback Dave

With P8092 applied to ToT dsymutil I'm getting binary identical results between the recursive and worklist implementation. I'm not sure why updating the incompleteness after every child matters, especially since we end up being more conservative.

friss added inline comments.Jul 11 2018, 6:33 PM
llvm/tools/dsymutil/DwarfLinker.cpp
882 ↗(On Diff #154965)

I think this is broken. A reverse_iterator operator* is implemented as such:

reference operator*() const {_Iter __tmp = current; return *--__tmp;}

(_Iter is the base iterator type).
Unfortunately, the DWARFDie iterators contain a DWARFDie (rather than holding a reference), so this returns a reference inside a temporary.
DWARFDies are just temporary helper structures to make sure we always keep a DIE and its compile unit in sync, the iterators can't point to a permanent copy. I think you should just abandon the range-based for-loop and use getPreviousSibling() directly.

JDevlieghere added inline comments.Jul 12 2018, 7:41 AM
llvm/tools/dsymutil/DwarfLinker.cpp
882 ↗(On Diff #154965)

I created D49234 as this can also occur with the forward iterator.

  • Update incompleteness after *every* child.
  • Don't iterate over all the children but update incompleteness based on new information.
JDevlieghere added a comment.EditedJul 12 2018, 11:06 AM

I ran the latest version of this patch on clang. Performance-wise this is the best I could do, which is a 4.32% regression with byte-identical output to what is currently generated by trunk. (i.e. without P8092)

Worklist

md5 hash: f6679a4f5fa759a1d3f1a4595b9945ef

runrealusersys
180.64129.788.11
280.83129.818.12
380.86129.588.18
480.70129.608.05
581.20129.968.24
AVG80.85129.758.14

Recursion

md5 hash: f6679a4f5fa759a1d3f1a4595b9945ef

runrealusersys
177.69126.407.24
277.56126.127.20
377.53126.247.11
477.28125.907.16
577.42126.057.19
AVG77.50126.147.18
friss accepted this revision.Jul 23 2018, 9:58 AM

Once the iterator story is sorted out, this is good to go.

This revision is now accepted and ready to land.Jul 23 2018, 9:58 AM
This revision was automatically updated to reflect the committed changes.
vsk added a subscriber: vsk.Aug 1 2018, 1:49 PM
vsk added inline comments.
llvm/trunk/tools/dsymutil/DwarfLinker.cpp
760

nit, its

772

Could reducing WorklistItem's size improve the performance at all? If so, would packing Flags and IsContinuation into a bitfield and storing the child's DIEIndex be a way of accomplishing that?