This is an archive of the discontinued LLVM Phabricator instance.

[CSSPGO] Sorting nodes in a cycle of profiled call graph.
ClosedPublic

Authored by hoy on Nov 18 2021, 4:14 PM.

Details

Summary

For nodes that are in a cycle of a profiled call graph, the current order the underlying scc_iter computes purely depends on how those nodes are reached from outside the SCC and inside the SCC, based on the Tarjan algorithm. This does not honor profile edge hotness, thus does not gurantee hot callsites to be inlined prior to cold callsites. To mitigate that, I'm adding an extra sorter on top of scc_iter to sort scc functions in the order of callsite hotness, instead of changing the internal of scc_iter.

Sorting on callsite hotness can be optimally based on detecting cycles on a directed call graph, i.e, to remove the coldest edge until a cycle is broken. However, detecting cycles isn't cheap. I'm using an MST-based approach which is faster and appear to deliver some performance wins.

Diff Detail

Event Timeline

hoy created this revision.Nov 18 2021, 4:14 PM
hoy requested review of this revision.Nov 18 2021, 4:14 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 18 2021, 4:14 PM
hoy updated this revision to Diff 388993.Nov 22 2021, 12:11 PM

Do the sorting for non-CS AutoFDO as well.

wenlei added inline comments.Nov 24 2021, 1:27 PM
llvm/include/llvm/Transforms/IPO/ProfiledCallGraph.h
48

Can target still be the same, in which case do we want to check source name?

181

This is only for sorting SCC nodes in an SCC and it doesn't order all nodes in CG, so perhaps it's better to call it SCCNodeSorter. This also feels like a generic functionality that doesn't tie to ProfiledCallGraph, so we can perhaps make it a generic template similar to scc_iterator, e.g. scc_member_iterator.

llvm/lib/Transforms/IPO/ProfiledCallGraph.cpp
127 ↗(On Diff #388993)

This is not trivial, some comment explain 1) prerequisite and scope, i.e. within scc for scc members; 2) the high level objective of the sorting, i.e. visit hot edges first 3) the implementation and how that is guaranteed would be helpful.

Either comment on the sort function or the class definition.

138 ↗(On Diff #388993)

nit: make this an explicit insert for readability

141–146 ↗(On Diff #388993)

There's another ProfiledCallGraphEdgeComparer defined inside ProfiledCallGraphNode and that one also has tie breaker for equal weights. Perhaps we need a single global one?

157 ↗(On Diff #388993)

compute the Minimum Weight Spanning Tree

I think you meant *Maximum* weight spanning tree.

using union-find algorithm.

on a high level this is Kruskal's algorithm, union-find is just one way to detect cycles.

172 ↗(On Diff #388993)

Comment on how this makes sure we visit hot edges first

hoy added inline comments.Nov 29 2021, 3:58 PM
llvm/include/llvm/Transforms/IPO/ProfiledCallGraph.h
48

Edges to be compared are from same caller, so the source should be the same. Call targets can also be the same since a caller may have different callsites to the same callee. The current model allows such edges with same callee but different weights to coexist. I think we should optimize it to just keep the edge with largest weight.

181

A generic template class sounds better. That will require the GraphTraits definition for nodes to come with a weighted edge list. Nodes with that can use this sorter.

llvm/lib/Transforms/IPO/ProfiledCallGraph.cpp
127 ↗(On Diff #388993)

Comment added.

138 ↗(On Diff #388993)

This is specifically used to construct a NodeInfo object in place. An insert operation will involve a copy construction which invalidate the initial value of the Group field which should be this. An emplace operation may be able to construct the object in place but I haven't figured out how to pass a none parameter into to.

141–146 ↗(On Diff #388993)

This one is used to sort edges. That one is used to deduplicate edges, ie. two callsites to same target with same weight are deduplicated, thus is more expansive.

157 ↗(On Diff #388993)

Maximum Weight Spanning tree is more accurate.

172 ↗(On Diff #388993)

Comment added.

hoy updated this revision to Diff 390511.Nov 29 2021, 3:58 PM

Updating D114204: [CSSPGO] Sorting nodes in a cycle of profiled call graph.

hoy updated this revision to Diff 390539.Nov 29 2021, 6:06 PM

Updating D114204: [CSSPGO] Sorting nodes in a cycle of profiled call graph.

hoy updated this revision to Diff 390540.Nov 29 2021, 6:10 PM

Updating D114204: [CSSPGO] Sorting nodes in a cycle of profiled call graph.

wenlei added inline comments.Nov 29 2021, 6:34 PM
llvm/include/llvm/Transforms/IPO/ProfiledCallGraph.h
48

Ok, then add a comment please so others can understand why only weight is used and tier breaker is not needed.

llvm/lib/Transforms/IPO/ProfiledCallGraph.cpp
138 ↗(On Diff #388993)

I thought emplace() without a parameter should just work, no? but yeah, I meant emplace.

llvm/lib/Transforms/IPO/SampleProfile.cpp
177

nit: sort-profiled-scc-member?

hoy added inline comments.Nov 29 2021, 7:59 PM
llvm/include/llvm/Transforms/IPO/ProfiledCallGraph.h
48

comment added.

llvm/lib/Transforms/IPO/ProfiledCallGraph.cpp
138 ↗(On Diff #388993)

No, neither NodeInfoMap.emplace(Node); nor NodeInfoMap.emplace(Node, {}); works.

llvm/lib/Transforms/IPO/SampleProfile.cpp
177

fixed.

hoy updated this revision to Diff 390556.Nov 29 2021, 7:59 PM

Addressing Wenlei's comment.

wenlei added inline comments.Nov 29 2021, 8:08 PM
llvm/lib/Transforms/IPO/ProfiledCallGraph.cpp
138 ↗(On Diff #388993)

I think what you need is NodeInfoMap.emplace(Node, NodeInfo()). See https://godbolt.org/z/Wrx61rYcY.

hoy added inline comments.Nov 29 2021, 8:16 PM
llvm/lib/Transforms/IPO/ProfiledCallGraph.cpp
138 ↗(On Diff #388993)

That’ll construct a temp object whose Groups points to itself. The temp object will be used to copy construct the final object whose Groups will point to the temp object which isn’t what we want. Also this still cannot avoid a copy construction.

wenlei accepted this revision.Nov 29 2021, 8:35 PM

lgtm, thanks.

llvm/include/llvm/Transforms/IPO/ProfiledCallGraph.h
152–153

Do you actually need to erase/insert, instead of updating the weight?

llvm/lib/Transforms/IPO/ProfiledCallGraph.cpp
138 ↗(On Diff #388993)

Ok, now I get what you're trying to do.. copy-constructor would be a move-constructor but the this pointer won't be updated still which is the trick you need. I guess it's fine with a comment. Thanks for clarification.

This revision is now accepted and ready to land.Nov 29 2021, 8:35 PM
hoy added inline comments.Nov 29 2021, 8:53 PM
llvm/include/llvm/Transforms/IPO/ProfiledCallGraph.h
152–153

Weight is a field of the key so it is immutable.

This revision was automatically updated to reflect the committed changes.