Index: include/llvm/Analysis/LazyCallGraph.h
===================================================================
--- include/llvm/Analysis/LazyCallGraph.h
+++ include/llvm/Analysis/LazyCallGraph.h
@@ -8,24 +8,79 @@
 //===----------------------------------------------------------------------===//
 /// \file
 ///
-/// Implements a lazy call graph analysis and related passes for the new pass
-/// manager.
+/// Implements a lazy call graph analysis for the new pass manager.
 ///
-/// NB: This is *not* a traditional call graph! It is a graph which models both
-/// the current calls and potential calls. As a consequence there are many
-/// edges in this call graph that do not correspond to a 'call' or 'invoke'
-/// instruction.
+/// This analysis is used to decompose an LLVM Module into components useful
+/// for a specific subset of interprocedural optimization (IPO) techniques:
+///
+/// 1) Call site based pairwise interprocedural transforms such as inlining,
+///    argument promotion, interprocedural constant propagation, etc.
+/// 2) Reachable set interprocedural analyses and inference such as function
+///    attribute inference, exception handling pruning, etc.
+///
+/// Both of these benefit from the optimization of each function in the program
+/// in post-order to ensure that callees are fully optimized when analyzing
+/// their callers and the call edge. To support this, we need an analysis that
+/// provides a post-order traversal of the strongly connected components (SCCs)
+/// of program call graph.
+///
+/// However, during this post-order traversal, transformations in #1 as well
+/// the basic function-local optimizations can mutate and refine the graph.
+/// These graph mutations need to be immediately reflected to achieve both of
+/// the above goals. Both transformations (#1 and #2) and analyses (#2) benefit
+/// directly from a more precise and refined call graph structure, so we want
+/// to incorporate graph updates immediately *during* the traversal. Put
+/// differently, we want a post-order traversal of the graph *including* all
+/// updates and refinements made to it. There are two primary forms of call
+/// graph refinement we care about:
+///
+/// a) Removing edges from the graph which provides smaller, more precise
+///    reachable sets and SCCs.
+/// b) Transforming indirect calls (that we must treat conservatively) into
+///    direct calls through constant propagation (typically this is thought of in
+///    the context of devirtualization).
+///
+/// These kinds of on-line graph updates make maintaining the graph
+/// significantly more complex in several ways. First, they require that the
+/// representation be updatable at all. They also require the graph update
+/// operations preserve enough information that any necessary changes to
+/// optimization techniques in #1 and #2 can be performed in a way that
+/// reflects the nature of the update. For example, if an update splits one SCC
+/// into three SCCs, we also need to know the post-order traversal of those new
+/// SCCs.
 ///
-/// The primary use cases of this graph analysis is to facilitate iterating
-/// across the functions of a module in ways that ensure all callees are
-/// visited prior to a caller (given any SCC constraints), or vice versa. As
-/// such is it particularly well suited to organizing CGSCC optimizations such
-/// as inlining, outlining, argument promotion, etc. That is its primary use
-/// case and motivates the design. It may not be appropriate for other
-/// purposes. The use graph of functions or some other conservative analysis of
-/// call instructions may be interesting for optimizations and subsequent
-/// analyses which don't work in the context of an overly specified
-/// potential-call-edge graph.
+/// The refinement in (b) introduces an unusual requirement on the graph. For
+/// optimizations such as those in category #1 above, the graph must provide an
+/// ordering that ensures (to the extent possible) that *potential* call edge
+/// target functions are traversed prior to any (b)-style refinement which
+/// could introduce an actual call edge to that function. This is essentially
+/// a secondary traversal order constraint in addition to the post-order over
+/// SCCs. This is accomplished by providing a layer of function reference edges
+/// in addition to the layer of explicit function call edges. Because
+/// function-local and pairwise interprocedural transformations primarily turn
+/// an existing (possibly indirect) reference to a function into a direct call
+/// to a function, a post-ordering of this "reference graph" satisfies the
+/// additional constraint.
+///
+/// Further, even for transformations which refine indirect calls to direct
+/// calls without an intrinsic "reference" in the IR, we can force them to be
+/// based upon some encoding of such references. For example, a profile-guided
+/// indirect call promotion pass might insist on embedding the candidate
+/// targets from the profile into the IR as references to functions in order to
+/// have them be included in the above ordering constraint.
+///
+/// The extra ordering constraint also serves a secondary but closely related
+/// purpose. It allows a pass manager to identify regions of the call graph
+/// which can be optimized independently from each other, even in the face of
+/// transformations which mutate the call graph. As a consequence, it can be
+/// used to drive safe parallelism across a subset of the interprocedural
+/// optimizations on a module of the IR.
+///
+/// This analysis also tries to form these graph structures in a cache-friendly
+/// way. As a consequence, it works hard to form the graph lazily as the
+/// traversal visits the functions in the module. It also needs the update
+/// algorithms described above to work in a context of a lazily formed graph
+/// rather than a complete graph.
 ///
 /// To understand the specific rules and nature of this call graph analysis,
 /// see the documentation of the \c LazyCallGraph below.
@@ -58,46 +113,73 @@
 
 /// A lazily constructed view of the call graph of a module.
 ///
-/// With the edges of this graph, the motivating constraint that we are
-/// attempting to maintain is that function-local optimization, CGSCC-local
-/// optimizations, and optimizations transforming a pair of functions connected
-/// by an edge in the graph, do not invalidate a bottom-up traversal of the SCC
-/// DAG. That is, no optimizations will delete, remove, or add an edge such
-/// that functions already visited in a bottom-up order of the SCC DAG are no
-/// longer valid to have visited, or such that functions not yet visited in
-/// a bottom-up order of the SCC DAG are not required to have already been
-/// visited.
+/// NB: This is *not* a traditional call graph! It is a graph which models both
+/// the current direct calls and references to functions which could be turned
+/// into direct calls during optimization. As a consequence there are many
+/// edges in this call graph that do not correspond to a 'call' or 'invoke'
+/// instruction.
+///
+/// This analysis is designed to serve a complex set of constraints faced by
+/// the optimizer when decomposing an LLVM Module for call-graph aware
+/// optimizations and analyses. See the file comment for the detailed
+/// motivations and context that leads to the particular design.
 ///
-/// Within this constraint, the desire is to minimize the merge points of the
-/// SCC DAG. The greater the fanout of the SCC DAG and the fewer merge points
-/// in the SCC DAG, the more independence there is in optimizing within it.
-/// There is a strong desire to enable parallelization of optimizations over
-/// the call graph, and both limited fanout and merge points will (artificially
-/// in some cases) limit the scaling of such an effort.
+/// The analysis builds a simple directed graph of calls with an edge between
+/// two functions if there exists at least one direct call from one to the
+/// other. The analysis also provides a simple directed graph of references
+/// with an edge between two functions if there exists at least one
+/// (potentially transitive, through multiple layers of global constants)
+/// reference from one function to the other. A call edge is trivially also
+/// a reference edge, and so we nest the call graph's SCCs (\c SCC below)
+/// within the reference graph's SCCs (\c RefSCC below). Put differently, the
+/// call graph is a subgraph of the reference graph, and thus the DAG of SCCs
+/// in the call graph is a subgraph of the DAG of SCCs in the reference graph.
 ///
-/// To this end, graph represents both direct and any potential resolution to
-/// an indirect call edge. Another way to think about it is that it represents
-/// both the direct call edges and any direct call edges that might be formed
-/// through static optimizations. Specifically, it considers taking the address
-/// of a function to be an edge in the call graph because this might be
-/// forwarded to become a direct call by some subsequent function-local
-/// optimization. The result is that the graph closely follows the use-def
-/// edges for functions. Walking "up" the graph can be done by looking at all
-/// of the uses of a function.
+/// Both of these graphs are directed graphs with potential cycles, and this
+/// analysis specifically works to establish the SCCs of these directed graphs
+/// and support the post-order traversal of the resulting SCC DAG. The SCC
+/// formation is done lazily and on-demand during the traversal, including any
+/// necessary inspection of the IR to build the graph itself, in order to
+/// maximize locality when walking a module of the IR. The formation uses both
+/// a lazy and a normal implementation of Tarjan's SCC detection algorithm (for
+/// the reference and call graphs respectively), and so is linear in the number
+/// of edges plus nodes.
 ///
 /// The roots of the call graph are the external functions and functions
-/// escaped into global variables. Those functions can be called from outside
-/// of the module or via unknowable means in the IR -- we may not be able to
-/// form even a potential call edge from a function body which may dynamically
-/// load the function and call it.
+/// escaped into external global variables. Those functions can be called from
+/// outside of the module or via unknowable means in the IR -- we may not be
+/// able to form even a potential call edge from a function body which may
+/// dynamically load the function and call it.
+///
+/// The graph supports online updates, including within the region of the graph
+/// traversed to form SCCs (both reference and call edge SCCs). These updates,
+/// when within traversed regions of the graph, and thus potentially mutating
+/// the SCC structure in addition to mutating the underlying graph, are also
+/// written in a way that updates the SCC structure and surfaces detailed
+/// information about the nature of those updates to the caller so it can
+/// observe the new state and incorporate any changes. See the mutation API on
+/// the \c RefSCC class below for details.
 ///
-/// This analysis still requires updates to remain valid after optimizations
-/// which could potentially change the set of potential callees. The
-/// constraints it operates under only make the traversal order remain valid.
+/// Updates (both to call and reference edges) are also implemented using
+/// minorly tweaked versions of Tarjan's SCC finding algorithm (with the same
+/// complexity as above) over the subgraph whose SCCs could possibly change
+/// with the update. While in the worst case this includes the entire traversed
+/// region of the graph, hitting that worst case requires conneting a leaf of
+/// the graph to the traversed root in order to form a large cycle. If doing
+/// repeated mutations, the only reasonable access pattern is to mutate the
+/// just traversed SCC as that will then only run Tarjan's over the nodes
+/// (originally) in that SCC. In practice, this is exactly the set of mutations
+/// done by the optimizer. The entire analysis should be re-computed if full
+/// interprocedural optimizations run at any point and could trigger
+/// large-scale incremental updates.
 ///
-/// The entire analysis must be re-computed if full interprocedural
-/// optimizations run at any point. For example, globalopt completely
-/// invalidates the information in this analysis.
+/// FIXME: There is a well understood superlinear space problem with the
+/// function reference graph as implemented: it de-normalizes indirect
+/// references through tables of function pointers. There are many solutions
+/// available to this, all of which have the effect of preserving some amount
+/// of normalization and indirection to reduce the space requirements. We
+/// expect to introduce one of these techniques as soon as this becomes
+/// a problem in practice.
 ///
 /// FIXME: This class is named LazyCallGraph in a lame attempt to distinguish
 /// it from the existing CallGraph. At some point, it is expected that this