This is an archive of the discontinued LLVM Phabricator instance.

[LCG] Update and expand comments to properly document the design motivation, tradeoffs, and constraints.
Needs Review · Public

Authored by chandlerc on Jul 13 2016, 1:52 AM.

Details

Reviewers

davidxl
dberlin
hfinkel
silvas
sanjoy

Summary

This is essentially attempting to embed the living document parts of
a design document into the doxygen comments for the analysis. I can
separate these docs into a restructured text file, but personally
I prefer keeping it as close to the code as possible.

The first section here outlines the high-level motivation, constraints,
and resulting tradeoffs of the design. These are expressed in the
file-level comment as they don't *directly* pertain to the API itself.

The second section is the class comment that tries to give a more
comprehensive but still high-level description of the implementation
strategy for the design.

This doesn't (yet) update the mutation API comments. I'd like to do that
too, but I think it's a bit lower priority and I want to try to draw some
ASCII-art diagrams to go with it which will take a while. I didn't want
the higher level stuff to wait on that.

The only really big section of a traditional design document that isn't
really covered here are detailed discussions of the alternatives
considered. I'm open to suggestions about whether that's really useful,
and where within this it would be useful. My personal inclination is to
discuss status, process, and alternatives in commit logs rather than
code, but I'm happy to talk about other places where such discussion can
live.

Also, thanks to Daniel Jasper who provided a ton of informal review for
me as someone who had no idea what any of this did to make sure I wasn't
assuming too much.

Diff Detail

Event Timeline

chandlerc updated this revision to Diff 63783. Jul 13 2016, 1:52 AM

chandlerc retitled this revision to "[LCG] Update and expand comments to properly document the design motivation, tradeoffs, and constraints."

chandlerc updated this object.

chandlerc added reviewers: sanjoy, hfinkel, dberlin, davidxl, silvas.

chandlerc added a subscriber: llvm-commits.

Herald added a subscriber: mcrosier. Jul 13 2016, 1:52 AM

Some initial comments.

include/llvm/Analysis/LazyCallGraph.h
35	Something like outlining will also add new nodes. You may want to mention that somewhere.
117	You may want to clarify the exact sense in which "potential" is meant here. The types of edges in a traditional call graph could also be considered as "potential". Maybe one wording could be "current direct calls" and "references to other functions that might be turned into 'current direct calls' during static optimization"
138–146	You may want to spell out what you mean by "pruned" here (or is this common terminology I'm unaware of?). Since you define the exact circumstances in which an edge exists, I'm not sure that there's much benefit to saying "pruned" here though.
140	Is the "call" graph not a subgraph of the "reference" graph? If it isn't, you probably want to show a concrete example of a situation where it isn't.
148–149	small nit: this should probably be "escaped into external global variables"
153	Does this sentence mean it uses two implementations? That it uses one which is both straight-forward and lazy? I don't see what "straight-forward" adds to this sentence nor how an implementation of Tarjan's algorithm that is lazy can be considered "straight-forward". Is there a specific thing you're trying to communicate?
172	Tarjan's algorithm per se doesn't deal with updates and so saying that the updates are handled with "using straightforward versions of Tarjan's SCC finding algorithm" doesn't provide much information. Could you maybe give a concrete explanation? E.g. that you perform a DFS from ... reusing the LowLink's that have already been computed in such and such a way. Then you can draw parallels to Tarjan's algorithm.
174	Could you elaborate here about what "impacted" / "potentially impacted" means for the various kinds of updates?
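
To make the questions on lines 140 and 172 concrete, here is a minimal sketch on a toy graph. It is not the code from LazyCallGraph.h: node IDs are plain integers, and the names `sccsInPostOrder`, `exampleCallGraph`, and `exampleRefGraph` are hypothetical. It runs an unmodified recursive Tarjan once over the call edges and once over the reference edges, assuming (as the class comment states) that every call edge is also a reference edge.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical toy graph: node IDs are indices, G[N] lists N's successors.
using Graph = std::vector<std::vector<int>>;
using SCCList = std::vector<std::vector<int>>;

// Plain Tarjan. SCCs are appended in post-order, so an SCC appears only after
// every SCC it can reach (callees before callers).
SCCList sccsInPostOrder(const Graph &G) {
  const int N = static_cast<int>(G.size());
  std::vector<int> Index(N, -1), LowLink(N, 0);
  std::vector<bool> OnStack(N, false);
  std::vector<int> Stack;
  SCCList Result;
  int NextIndex = 0;

  std::function<void(int)> Visit = [&](int V) {
    Index[V] = LowLink[V] = NextIndex++;
    Stack.push_back(V);
    OnStack[V] = true;
    for (int W : G[V]) {
      assert(W >= 0 && W < N && "edge target out of range");
      if (Index[W] == -1) {
        Visit(W);
        LowLink[V] = std::min(LowLink[V], LowLink[W]);
      } else if (OnStack[W]) {
        LowLink[V] = std::min(LowLink[V], Index[W]);
      }
    }
    if (LowLink[V] != Index[V])
      return; // V is not the root of its SCC.
    std::vector<int> SCC;
    int W;
    do {
      W = Stack.back();
      Stack.pop_back();
      OnStack[W] = false;
      SCC.push_back(W);
    } while (W != V);
    std::sort(SCC.begin(), SCC.end());
    Result.push_back(SCC);
  };

  for (int V = 0; V < N; ++V)
    if (Index[V] == -1)
      Visit(V);
  return Result;
}

// 0 = main, 1 = f, 2 = g, 3 = h. Direct calls: main->f, f->g, g->f, g->h.
Graph exampleCallGraph() { return {{1}, {2}, {1, 3}, {}}; }

// Same edges plus h->f: h only takes f's address, it does not call it.
Graph exampleRefGraph() { return {{1}, {2}, {1, 3}, {1}}; }
```

On this example the call graph yields the SCCs {h}, {f, g}, {main} and the reference graph yields {f, g, h}, {main}. Each call SCC sits inside exactly one reference SCC, which is the nesting the class comment describes; the extra h->f reference merges h into the same reference SCC without changing the call SCCs.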

Update with various improvements including based on code review.

Fix another typo.

davidxl added inline comments. Jul 15 2016, 10:31 AM

include/llvm/Analysis/LazyCallGraph.h
49	Perhaps describing how this is handled (with an example?)
60	Perhaps add an example here to show 'this is accomplished by ..'
72	This depends on how secondary order constraints are formed (i.e., reference edges to potential targets are originated from callers of indirect calls or from constructor methods that reference vtables etc). Perhaps an example to demonstrate this?
136	This does not seem precise. The SCC nodes in RefGraph do not necessarily form a DAG (only RefSCCs), there can be cycles.
142	Is Laziness essential to the algorithm? If not, we don't need to emphasize it here -- just a footnote at the end should be enough.
150	Perhaps just define 'roots' as those callgraph nodes that can potentially be called by external functions without body of IR that are available to the compilation.
167	Probably needs more description of updates : various scenarios (small example) and description on how each scenario is handled.
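
One update scenario of the kind requested on lines 49 and 167 can be sketched as follows. This is an illustration under stated assumptions, not the mutation API of the patch: the helper `removeEdgeAndResplit` and the integer-node graph are hypothetical. It removes one edge inside an existing SCC and re-runs Tarjan only over the nodes that were in that SCC, returning the resulting SCCs in post-order.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <set>
#include <vector>

using Graph = std::vector<std::vector<int>>;
using SCCList = std::vector<std::vector<int>>;

// Hypothetical helper: remove the edge From->To, both inside OldSCC, and
// return the SCCs that OldSCC breaks into, in post-order.
SCCList removeEdgeAndResplit(Graph &G, const std::vector<int> &OldSCC,
                             int From, int To) {
  const int N = static_cast<int>(G.size());
  assert(From >= 0 && From < N && To >= 0 && To < N && "bad edge");
  auto &Succs = G[From];
  Succs.erase(std::remove(Succs.begin(), Succs.end(), To), Succs.end());

  // Only nodes of the old SCC are revisited. An edge that leaves the old SCC
  // can be skipped: no path returns from there, or its nodes would have been
  // part of the old SCC already.
  const std::set<int> InOld(OldSCC.begin(), OldSCC.end());
  std::vector<int> Index(N, -1), LowLink(N, 0);
  std::vector<bool> OnStack(N, false);
  std::vector<int> Stack;
  SCCList NewSCCs;
  int NextIndex = 0;

  std::function<void(int)> Visit = [&](int V) {
    Index[V] = LowLink[V] = NextIndex++;
    Stack.push_back(V);
    OnStack[V] = true;
    for (int W : G[V]) {
      if (!InOld.count(W))
        continue;
      if (Index[W] == -1) {
        Visit(W);
        LowLink[V] = std::min(LowLink[V], LowLink[W]);
      } else if (OnStack[W]) {
        LowLink[V] = std::min(LowLink[V], Index[W]);
      }
    }
    if (LowLink[V] != Index[V])
      return; // V is not the root of its SCC.
    std::vector<int> SCC;
    int W;
    do {
      W = Stack.back();
      Stack.pop_back();
      OnStack[W] = false;
      SCC.push_back(W);
    } while (W != V);
    std::sort(SCC.begin(), SCC.end());
    NewSCCs.push_back(SCC);
  };

  for (int V : OldSCC)
    if (Index[V] == -1)
      Visit(V);
  return NewSCCs;
}
```

For the three-node cycle 0->1->2->0, removing 2->0 splits the SCC into {2}, {1}, {0}, returned in that post-order, which is the "one SCC into three" case from the file comment. If the removed edge leaves another cycle intact, the function returns the original SCC unchanged. The cost is bounded by the nodes and edges of the old SCC rather than by the whole graph.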

Inactive, as far as I can tell.

silvas resigned from this revision. Mar 25 2020, 6:27 PM

Revision Contents

Path: include/llvm/Analysis/LazyCallGraph.h
Size: 182 lines

Diff 64098

include/llvm/Analysis/LazyCallGraph.h

	//===- LazyCallGraph.h - Analysis of a Module's call graph ------- C++ --===//			//===- LazyCallGraph.h - Analysis of a Module's call graph ------- C++ --===//
	//			//
	// The LLVM Compiler Infrastructure			// The LLVM Compiler Infrastructure
	//			//
	// This file is distributed under the University of Illinois Open Source			// This file is distributed under the University of Illinois Open Source
	// License. See LICENSE.TXT for details.			// License. See LICENSE.TXT for details.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	/// \file			/// \file
	///			///
	/// Implements a lazy call graph analysis and related passes for the new pass			/// Implements a lazy call graph analysis for the new pass manager.
	/// manager.
	///			///
	/// NB: This is not a traditional call graph! It is a graph which models both			/// This analysis is used to decompose an LLVM Module into components useful
	/// the current calls and potential calls. As a consequence there are many			/// for a specific subset of interprocedural optimization (IPO) techniques:
	/// edges in this call graph that do not correspond to a 'call' or 'invoke'
	/// instruction.
	///			///
	/// The primary use cases of this graph analysis is to facilitate iterating			/// 1) Call site based pairwise interprocedural transforms such as inlining,
	/// across the functions of a module in ways that ensure all callees are			/// argument promotion, interprocedural constant propagation, etc.
	/// visited prior to a caller (given any SCC constraints), or vice versa. As			/// 2) Reachable set interprocedural analyses and inference such as function
	/// such is it particularly well suited to organizing CGSCC optimizations such			/// attribute inference, exception handling pruning, etc.
	/// as inlining, outlining, argument promotion, etc. That is its primary use			///
	/// case and motivates the design. It may not be appropriate for other			/// Both of these benefit from the optimization of each function in the program
	/// purposes. The use graph of functions or some other conservative analysis of			/// in post-order to ensure that callees are fully optimized when analyzing
	/// call instructions may be interesting for optimizations and subsequent			/// their callers and the call edge. To support this, we need an analysis that
	/// analyses which don't work in the context of an overly specified			/// provides a post-order traversal of the strongly connected components (SCCs)
	/// potential-call-edge graph.			/// of the program call graph.
				///
				/// However, during this post-order traversal, transformations in #1 as well
				/// as the basic function-local optimizations can mutate and refine the graph.
				/// These graph mutations need to be immediately reflected to achieve both of
				/// the above goals. Both transformations (#1 and #2) and analyses (#2) benefit
				/// directly from a more precise and refined call graph structure, so we want
				/// to incorporate graph updates immediately during the traversal. Put
				/// differently, we want a post-order traversal of the graph including all
				/// updates and refinements made to it. There are two primary forms of call
				/// graph refinement we care about:
				silvas (Unsubmitted, Not Done): Something like outlining will also add new nodes. You may want to mention that somewhere.
				///
				/// a) Removing edges from the graph which provides smaller, more precise
				/// reachable sets and SCCs.
				/// b) Transforming indirect calls (that we must treat conservatively) into
				/// direct calls through constant propagation (typically this is thought of in
				/// the context of devirtualization).
				///
				/// These kinds of on-line graph updates make maintaining the graph
				/// significantly more complex in several ways. First, they require that the
				/// representation be updatable at all. They also require the graph update
				/// operations preserve enough information that any necessary changes to
				/// optimization techniques in #1 and #2 can be performed in a way that
				/// reflects the nature of the update. For example, if an update splits one SCC
				/// into three SCCs, we also need to know the post-order traversal of those new
				davidxl (Unsubmitted, Not Done): Perhaps describing how this is handled (with an example?)
				/// SCCs.
				///
				/// The refinement in (b) introduces an unusual requirement on the graph. For
				/// optimizations such as those in category #1 above, the graph must provide an
				/// ordering that ensures (to the extent possible) that potential call edge
				/// target functions are traversed prior to any (b)-style refinement which
				/// could introduce an actual call edge to that function. This is essentially
				/// a secondary traversal order constraint in addition to the post-order over
				/// SCCs. This is accomplished by providing a layer of function reference edges
				/// in addition to the layer of explicit function call edges. Because
				/// function-local and pairwise interprocedural transformations primarily turn
				davidxl (Unsubmitted, Not Done): Perhaps add an example here to show 'this is accomplished by ..'
				/// an existing (possibly indirect) reference to a function into a direct call
				/// to a function, a post-ordering of this "reference graph" satisfies the
				/// additional constraint.
				///
				/// Further, even for transformations which refine indirect calls to direct
				/// calls without an intrinsic "reference" in the IR, we can force them to be
				/// based upon some encoding of such references. For example, a profile-guided
				/// indirect call promotion pass might insist on embedding the candidate
				/// targets from the profile into the IR as references to functions in order to
				/// have them be included in the above ordering constraint.
				///
				/// The extra ordering constraint also serves a secondary but closely related
				davidxl (Unsubmitted, Not Done): This depends on how secondary order constraints are formed (i.e., reference edges to potential targets are originated from callers of indirect calls or from constructor methods that reference vtables etc). Perhaps an example to demonstrate this?
				/// purpose. It allows a pass manager to identify regions of the call graph
				/// which can be optimized independently from each other, even in the face of
				/// transformations which mutate the call graph. As a consequence, it can be
				/// used to drive safe parallelism across a subset of the interprocedural
				/// optimizations on a module of the IR.
				///
				/// This analysis also tries to form these graph structures in a cache-friendly
				/// way. As a consequence, it works hard to form the graph lazily as the
				/// traversal visits the functions in the module. It also needs the update
				/// algorithms described above to work in a context of a lazily formed graph
				/// rather than a complete graph.
	///			///
	/// To understand the specific rules and nature of this call graph analysis,			/// To understand the specific rules and nature of this call graph analysis,
	/// see the documentation of the \c LazyCallGraph below.			/// see the documentation of the \c LazyCallGraph below.
	///			///
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#ifndef LLVM_ANALYSIS_LAZYCALLGRAPH_H			#ifndef LLVM_ANALYSIS_LAZYCALLGRAPH_H
	#define LLVM_ANALYSIS_LAZYCALLGRAPH_H			#define LLVM_ANALYSIS_LAZYCALLGRAPH_H
	Show All 16 Lines
	#include <utility>			#include <utility>

	namespace llvm {			namespace llvm {
	class PreservedAnalyses;			class PreservedAnalyses;
	class raw_ostream;			class raw_ostream;

	/// A lazily constructed view of the call graph of a module.			/// A lazily constructed view of the call graph of a module.
	///			///
	/// With the edges of this graph, the motivating constraint that we are			/// NB: This is not a traditional call graph! It is a graph which models both
	/// attempting to maintain is that function-local optimization, CGSCC-local			/// the current direct calls and references to functions which could be turned
				silvas (Unsubmitted, Done): You may want to clarify the exact sense in which "potential" is meant here. The types of edges in a traditional call graph could also be considered as "potential". Maybe one wording could be "current direct calls" and "references to other functions that might be turned into 'current direct calls' during static optimization"
	/// optimizations, and optimizations transforming a pair of functions connected			/// into direct calls during optimization. As a consequence there are many
	/// by an edge in the graph, do not invalidate a bottom-up traversal of the SCC			/// edges in this call graph that do not correspond to a 'call' or 'invoke'
	/// DAG. That is, no optimizations will delete, remove, or add an edge such			/// instruction.
	/// that functions already visited in a bottom-up order of the SCC DAG are no			///
	/// longer valid to have visited, or such that functions not yet visited in			/// This analysis is designed to serve a complex set of constraints faced by
	/// a bottom-up order of the SCC DAG are not required to have already been			/// the optimizer when decomposing an LLVM Module for call-graph aware
	/// visited.			/// optimizations and analyses. See the file comment for the detailed
	///			/// motivations and context that leads to the particular design.
	/// Within this constraint, the desire is to minimize the merge points of the			///
	/// SCC DAG. The greater the fanout of the SCC DAG and the fewer merge points			/// The analysis builds a simple directed graph of calls with an edge between
	/// in the SCC DAG, the more independence there is in optimizing within it.			/// two functions if there exists at least one direct call from one to the
	/// There is a strong desire to enable parallelization of optimizations over			/// other. The analysis also provides a simple directed graph of references
	/// the call graph, and both limited fanout and merge points will (artificially			/// with an edge between two functions if there exists at least one
	/// in some cases) limit the scaling of such an effort.			/// (potentially transitive, through multiple layers of global constants)
	///			/// reference from one function to the other. A call edge is trivially also
	/// To this end, graph represents both direct and any potential resolution to			/// a reference edge, and so we nest the call graph's SCCs (\c SCC below)
	/// an indirect call edge. Another way to think about it is that it represents			/// within the reference graph's SCCs (\c RefSCC below). Put differently, the
	/// both the direct call edges and any direct call edges that might be formed			/// call graph is a subgraph of the reference graph, and thus the DAG of SCCs
	/// through static optimizations. Specifically, it considers taking the address			/// in the call graph is a subgraph of the DAG of SCCs in the reference graph.
				davidxl (Unsubmitted, Not Done): This does not seem precise. The SCC nodes in RefGraph do not necessarily form a DAG (only RefSCCs), there can be cycles.
	/// of a function to be an edge in the call graph because this might be			///
	/// forwarded to become a direct call by some subsequent function-local			/// Both of these graphs are directed graphs with potential cycles, and this
	/// optimization. The result is that the graph closely follows the use-def			/// analysis specifically works to establish the SCCs of these directed graphs
	/// edges for functions. Walking "up" the graph can be done by looking at all			/// and support the post-order traversal of the resulting SCC DAG. The SCC
				silvas (Unsubmitted, Not Done): Is the "call" graph not a subgraph of the "reference" graph? If it isn't, you probably want to show a concrete example of a situation where it isn't.
	/// of the uses of a function.			/// formation is done lazily and on-demand during the traversal, including any
				/// necessary inspection of the IR to build the graph itself, in order to
				davidxl (Unsubmitted, Not Done): Is Laziness essential to the algorithm? If not, we don't need to emphasize it here -- just a footnote at the end should be enough.
				/// maximize locality when walking a module of the IR. The formation uses both
				/// a lazy and a normal implementation of Tarjan's SCC detection algorithm (for
				/// the reference and call graphs respectively), and so is linear in the number
				/// of edges plus nodes.
				silvas (Unsubmitted, Not Done): You may want to spell out what you mean by "pruned" here (or is this common terminology I'm unaware of?). Since you define the exact circumstances in which an edge exists, I'm not sure that there's much benefit to saying "pruned" here though.
	///			///
	/// The roots of the call graph are the external functions and functions			/// The roots of the call graph are the external functions and functions
	/// escaped into global variables. Those functions can be called from outside			/// escaped into external global variables. Those functions can be called from
				silvas (Unsubmitted, Done): small nit: this should probably be "escaped into external global variables"
	/// of the module or via unknowable means in the IR -- we may not be able to			/// outside of the module or via unknowable means in the IR -- we may not be
				davidxl (Unsubmitted, Not Done): Perhaps just define 'roots' as those callgraph nodes that can potentially be called by external functions without body of IR that are available to the compilation.
	/// form even a potential call edge from a function body which may dynamically			/// able to form even a potential call edge from a function body which may
	/// load the function and call it.			/// dynamically load the function and call it.
	///			///
				silvas (Unsubmitted, Not Done): Does this sentence mean it uses two implementations? That it uses one which is both straight-forward and lazy? I don't see what "straight-forward" adds to this sentence nor how an implementation of Tarjan's algorithm that is lazy can be considered "straight-forward". Is there a specific thing you're trying to communicate?
	/// This analysis still requires updates to remain valid after optimizations			/// The graph supports online updates, including within the region of the graph
	/// which could potentially change the set of potential callees. The			/// traversed to form SCCs (both reference and call edge SCCs). These updates,
	/// constraints it operates under only make the traversal order remain valid.			/// when within traversed regions of the graph, and thus potentially mutating
	///			/// the SCC structure in addition to mutating the underlying graph, are also
	/// The entire analysis must be re-computed if full interprocedural			/// written in a way that updates the SCC structure and surfaces detailed
	/// optimizations run at any point. For example, globalopt completely			/// information about the nature of those updates to the caller so it can
	/// invalidates the information in this analysis.			/// observe the new state and incorporate any changes. See the mutation API on
				/// the \c RefSCC class below for details.
				///
				/// Updates (both to call and reference edges) are also implemented using
				/// minorly tweaked versions of Tarjan's SCC finding algorithm (with the same
				/// complexity as above) over the subgraph whose SCCs could possibly change
				/// with the update. While in the worst case this includes the entire traversed
				/// region of the graph, hitting that worst case requires connecting a leaf of
				davidxl (Unsubmitted, Not Done): Probably needs more description of updates : various scenarios (small example) and description on how each scenario is handled.
				/// the graph to the traversed root in order to form a large cycle. If doing
				/// repeated mutations, the only reasonable access pattern is to mutate the
				/// just traversed SCC as that will then only run Tarjan's over the nodes
				/// (originally) in that SCC. In practice, this is exactly the set of mutations
				/// done by the optimizer. The entire analysis should be re-computed if full
				silvas (Unsubmitted, Not Done): Tarjan's algorithm per se doesn't deal with updates and so saying that the updates are handled with "using straightforward versions of Tarjan's SCC finding algorithm" doesn't provide much information. Could you maybe give a concrete explanation? E.g. that you perform a DFS from ... reusing the LowLink's that have already been computed in such and such a way. Then you can draw parallels to Tarjan's algorithm.
				/// interprocedural optimizations run at any point and could trigger
				/// large-scale incremental updates.
				silvas (Unsubmitted, Not Done): Could you elaborate here about what "impacted" / "potentially impacted" means for the various kinds of updates?
				///
				/// FIXME: There is a well understood superlinear space problem with the
				/// function reference graph as implemented: it de-normalizes indirect
				/// references through tables of function pointers. There are many solutions
				/// available to this, all of which have the effect of preserving some amount
				/// of normalization and indirection to reduce the space requirements. We
				/// expect to introduce one of these techniques as soon as this becomes
				/// a problem in practice.
	///			///
	/// FIXME: This class is named LazyCallGraph in a lame attempt to distinguish			/// FIXME: This class is named LazyCallGraph in a lame attempt to distinguish
	/// it from the existing CallGraph. At some point, it is expected that this			/// it from the existing CallGraph. At some point, it is expected that this
	/// will be the only call graph and it will be renamed accordingly.			/// will be the only call graph and it will be renamed accordingly.
	class LazyCallGraph {			class LazyCallGraph {
	public:			public:
	class Node;			class Node;
	class SCC;			class SCC;
	(909 lines not shown)
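
The class comment in the diff above defines a reference edge as a "potentially transitive, through multiple layers of global constants" reference from one function to another. As a rough illustration of what that definition implies, and of the vtable case raised in the review, here is a sketch over a toy model. `ToyModule`, `referenceTargets`, and `exampleModule` are hypothetical names and nothing here uses LLVM's IR classes; the walk simply looks through constants until it reaches functions.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy IR, not LLVM's: each symbol (a function or a global
// constant) lists the symbols its body or initializer mentions.
struct ToyModule {
  std::set<std::string> Functions;
  std::map<std::string, std::vector<std::string>> Mentions;
};

// Collects the targets of F's reference edges: every function reachable from
// F's body through zero or more layers of global constants.
std::set<std::string> referenceTargets(const ToyModule &M,
                                       const std::string &F) {
  assert(M.Functions.count(F) && "F must name a function");
  std::set<std::string> Targets, Visited;
  std::vector<std::string> Worklist;
  auto PushMentionsOf = [&](const std::string &Sym) {
    auto It = M.Mentions.find(Sym);
    if (It == M.Mentions.end())
      return;
    for (const std::string &Op : It->second)
      if (Visited.insert(Op).second)
        Worklist.push_back(Op);
  };

  PushMentionsOf(F);
  while (!Worklist.empty()) {
    std::string Sym = Worklist.back();
    Worklist.pop_back();
    if (M.Functions.count(Sym))
      Targets.insert(Sym); // A function ends the walk; its body is not scanned.
    else
      PushMentionsOf(Sym); // A constant: keep looking through its initializer.
  }
  return Targets;
}

// Example: a constructor stores a vtable whose slots point at two methods,
// one of them through a second table.
ToyModule exampleModule() {
  ToyModule M;
  M.Functions = {"ctor", "run", "dtor"};
  M.Mentions["ctor"] = {"vtable"};
  M.Mentions["vtable"] = {"run", "dtor_table"};
  M.Mentions["dtor_table"] = {"dtor"};
  return M;
}
```

In the example, `ctor` contains no call to `run` or `dtor`, yet it gets reference edges to both because its body mentions `vtable`. A post-order over such reference edges therefore visits `run` and `dtor` before `ctor`, which is the secondary ordering constraint the file comment motivates for devirtualization.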