The functions lookForDIEsToKeep and keepDIEAndDependencies can have some very deep recursion. This tackles part of this problem by removing the recursion from lookForDIEsToKeep by turning it into a worklist. The difficulty in doing so is the computation of incompleteness, which depends on the incompleteness of its children. To compute this, we insert "continuation markers" into the worklist. This informs the work loop to (re)compute the Incompleteness of the DIE associated with it (i.e. the parent of the last processed DIE).
Details
Diff Detail
Event Timeline
This looks like it should work and keep the current semantics. Did you find that this gives identical output as the current algorithm in your testing?
llvm/tools/dsymutil/DwarfLinker.cpp | ||
---|---|---|
877 | I would add a comment here explaining that the worklist is LIFO. It took me a moment to realize why you would enqueue the continuation first. | |
878–880 | Can you use something like make_reverse_iterator to avoid copying the list? |
The output is not always binary identical. For example for clang we emit more than we did before (i.e. we consider more things incomplete). So far I don't have a good explanation for this...
llvm/tools/dsymutil/DwarfLinker.cpp | ||
---|---|---|
878–880 | The iterator is a forward iterator (it uses getSibling to advance) so unfortunately not. |
- Improve comments.
- Remove redundant continuation at beginning. Wwe always add the continuation if we add the DIE's children to the worklist.
- Use SmallVectors to improve performance. Still this path causes a ~5% performance regression (avg over 5 runs).
llvm/tools/dsymutil/DwarfLinker.cpp | ||
---|---|---|
878–880 | I think, maybe, getPreviousSibling could be implemented without great cost & this could be made into a bidirectional iterator? (looks like getSibling works by scanning forward in the list of DIEs until it finds another DIE with the same depth (or bailing out if it finds a DIE with a shallower depth) - the reverse search coudl be done the same way, I think?) | |
883–884 | for (... : reverse(Children)) perhaps? |
With P8092 applied to ToT dsymutil I'm getting binary identical results between the recursive and worklist implementation. I'm not sure why updating the incompleteness after every child matters, especially since we end up being more conservative.
llvm/tools/dsymutil/DwarfLinker.cpp | ||
---|---|---|
882 | I think this is broken. A reverse_iterator operator* is implemented as such: reference operator*() const {_Iter __tmp = current; return *--__tmp;} (_Iter is the base iterator type). |
- Update incompleteness after *every* child.
- Don't iterate over all the children but update incompleteness based on new information.
I ran the latest version of this patch on clang. Performance-wise this is the best I could do, which is a 4.32% regression with byte-identical output to what is currently generated by trunk. (i.e. without P8092)
Worklist
md5 hash: f6679a4f5fa759a1d3f1a4595b9945ef
run | real | user | sys |
---|---|---|---|
1 | 80.64 | 129.78 | 8.11 |
2 | 80.83 | 129.81 | 8.12 |
3 | 80.86 | 129.58 | 8.18 |
4 | 80.70 | 129.60 | 8.05 |
5 | 81.20 | 129.96 | 8.24 |
AVG | 80.85 | 129.75 | 8.14 |
Recursion
md5 hash: f6679a4f5fa759a1d3f1a4595b9945ef
run | real | user | sys |
---|---|---|---|
1 | 77.69 | 126.40 | 7.24 |
2 | 77.56 | 126.12 | 7.20 |
3 | 77.53 | 126.24 | 7.11 |
4 | 77.28 | 125.90 | 7.16 |
5 | 77.42 | 126.05 | 7.19 |
AVG | 77.50 | 126.14 | 7.18 |
I would add a comment here explaining that the worklist is LIFO. It took me a moment to realize why you would enqueue the continuation first.