This is an archive of the discontinued LLVM Phabricator instance.

[ELF] Use union-find set and doubly linked list in Call-Chain Clustering (C³) heuristic
ClosedPublic

Authored by MaskRay on Apr 28 2018, 2:50 PM.

Details

Summary

Before, SecToClusters[*] was used to track the belonged cluster.
During a merge (From -> Into), every element of From has to be updated.
Use a union-find set to speed up this use case.

Also, replace std::vector<int> Sections; with a doubly-linked
pointers: int Next, Prev;

Diff Detail

Repository
rL LLVM

Event Timeline

MaskRay created this revision.Apr 28 2018, 2:50 PM

Hi Raphael, sorry for hearing that... But may I beg your last review at this revision, considering it was created before your announcement?

Sorry for the delay. What change in performance does this give?

MaskRay added a comment.EditedJun 16 2018, 11:02 AM

Glad to know you are still around :) (your IRC nick seems to be immune to PRIVMSG.. usermode +g or +j? nevermind)

This revision mostly optimizes the cluster number reassignment loop (expected O(n log n), but can be O(n^2) at worst):

// line 210
    for (int SI : C.Sections)
      SecToCluster[SI] = PredC;

to union-find set (fast in practice, O(n log n) at worst (not union-by-rank) but expected O(n α(n)) (I believe)) + list splice (linear)

grimar added a subscriber: grimar.Jun 18 2018, 6:41 AM

I mean performance of actual cases. I'm interested in if the cache effects of list vs vector matter.

std::vector<int> Sections; This also calls malloc when the first element is pushed.

MaskRay added a comment.EditedJun 19 2018, 9:32 PM

I want to use a data structure supporting O(1) splice and can be pre-allocated (1 malloc call instead of N calls) but I cannot find one (and I don't think hand-rolled next/prev doubly linked list justifies the amount of code) :( vector<int> isn't good either (ok you can argue you could replace it with a SmallVector but note the all intermediate vectors are counted in memory usage)

MaskRay updated this revision to Diff 168925.Oct 9 2018, 5:43 PM
MaskRay edited the summary of this revision. (Show Details)
MaskRay removed a subscriber: grimar.

Update description and also serves as a ping

MaskRay updated this revision to Diff 168926.Oct 9 2018, 5:45 PM
MaskRay edited the summary of this revision. (Show Details)
MaskRay removed a reviewer: dblaikie.

clear() -> not destructed

ruiu added inline comments.Oct 12 2018, 2:39 PM
ELF/CallGraphSort.cpp
161 ↗(On Diff #168926)

I perhaps didn't spent enough effort to understand this function, but it feels something is wrong with this function. The function name implies this function "finds" something, but looks like it actually mutates a given vector.

MaskRay updated this revision to Diff 169497.Oct 12 2018, 2:48 PM

Add a comment about findLeader which uses path-halving technique

ruiu added inline comments.Oct 12 2018, 2:52 PM
ELF/CallGraphSort.cpp
161 ↗(On Diff #168926)

I don't think any function that has a side effect (other than caching a result) have a function name that starts with "find". "Find" implies a pure function. Please give it a different name.

MaskRay updated this revision to Diff 169502.Oct 12 2018, 3:08 PM

Rename findLeader to getLeader (not sure it conveys the meaning better that it is mutable. I'm not good at naming things)

ruiu added a comment.Oct 12 2018, 3:13 PM

Honestly I do not understand the code. What is this code doing? Can you explain it for me?

Honestly I do not understand the code. What is this code doing? Can you explain it for me?

Sure. A cluster is a list of sections (C.Sections). This loop iterates over sections by descending density and tries grouping sections together (grouped sections are linearly ordered for locality).

The loop body needs a way to find the cluster a section belongs. What it used to do is the lookup table SecToCluster which is maintained after each merge operation. The merge operation is slow (thus the old comment "NOTE: Consider using a disjoint-set to track section -> cluster mapping"). Since a section can only belong to a cluster and clusters are disjoined, union-find algorithm naturally fits here.

I use the path-halving technique (one-liner Leaders[V] = Leaders[Leaders[V]];) to decrease complexity and it is extremely simple to implement. Other techniques can achieve the same goal but they would have more code (https://www.cs.princeton.edu/courses/archive/spring13/cos423/lectures/UnionFind.pdf).

MaskRay updated this revision to Diff 169507.Oct 12 2018, 3:42 PM

Rename old SortedSec as the name was confusing (it referrs to cluster but a name about section was chosen)
Change B < A to A > B to be consistent with the next stable_sort condition

Greeting from a dev meeting attendee:)

Do you have real world performance numbers (both time and memory) for this? Computational complexity isn't always the best predictor of actual performance because of e.g. cache characteristics (as @Bigcheese mentioned).

Do you have real world performance numbers (both time and memory) for this? Computational complexity isn't always the best predictor of actual performance because of e.g. cache characteristics (as @Bigcheese mentioned).

I think I've made it clear in the description that std::vector does not do a better job here. (the vectors are not even destroyed to free intermediate memory usage)

Pretty sure I understand the algorithm now, but I still want to see what performance impact it has to see if it's worth the (minor) complexity. I'll benchmark it on some large links and see what happens.

MaskRay updated this revision to Diff 191058.Mar 18 2019, 1:53 AM
MaskRay retitled this revision from [ELF] Use union-find set in Call-Chain Clustering (C³) heuristic to improve worst-case time complexity. to [ELF] Use union-find set and doubly linked list in Call-Chain Clustering (C³) heuristic.
MaskRay edited the summary of this revision. (Show Details)

std::list<int> Sections; -> int Next, Prev;

Herald added a project: Restricted Project. · View Herald TranscriptMar 18 2019, 1:53 AM
Herald added a subscriber: jdoerfert. · View Herald Transcript
MaskRay updated this revision to Diff 221897.Sep 26 2019, 1:25 AM

Rebase after one year since there is not interest in maintaining Call-Chain Clustering.

This file has not been touched much.

  • Adapt variable renaming
  • Use DenseMap::try_emplace
MaskRay updated this revision to Diff 222091.Sep 27 2019, 12:44 AM

mergeclusters -> mergeClusters

This revision is now accepted and ready to land.Sep 27 2019, 10:38 AM
This revision was automatically updated to reflect the committed changes.