This is an archive of the discontinued LLVM Phabricator instance.

[DDG] Data Dependence Graph - Root Node
ClosedPublic

Authored by bmahjour on Sep 24 2019, 9:33 AM.

Details

Summary

This patch adds Root Node to the DDG. The purpose of the root node is to create a single entry node that allows graph walk iterators to iterate through all nodes of the graph, making sure that no node is left unvisited during a graph walk (eg. SCC or DFS). Once the DDG is fully constructed it will have exactly one root node. Every node in the graph is reachable from the root. The algorithm for connecting the root node is based on depth-first-search that keeps track of visited nodes to try to avoid creating unnecessary edges.

Previous related revisions:
https://reviews.llvm.org/D65350

Diff Detail

Repository
rL LLVM

Event Timeline

bmahjour created this revision.Sep 24 2019, 9:33 AM

Do I understand it correctly, that this node is added to avoid having to deal with forests?

Introducing a createAndConnectRootNode step creates the problem that if a node is added afterwards, it will not be connected. Could you add an assertion that checks that no nodes are added once the root node is created?

[suggestion] We could create a root node at the graph's construction and connect it when nodes are created. Instead of redundantly storing a list in DirectedGraph, we could use that root node's connections to iterate over all nodes.

[suggestion] Another special node could point from ever node in the graph. I have used this to represent dependencies from code before the analyzed region respectively to code after the region. Could it be useful to create such a node es well? (I called them \top and \bottom).

llvm/lib/Analysis/DependenceGraphBuilder.cpp
60 ↗(On Diff #221543)

The algorithm here seems to add redundant edges depending on the iteration order over the node list (for a graph "A -> B", an edge from the root node is added to both if B is visited before A).

Please add some comment lines about how this algorithm works. The way it is written is somewhat confusing. depth_first_ext will always visit N first if not yet in the visited list. This makes I = N is a test for whether N has been visited yet.

61 ↗(On Diff #221543)

[serious] It's not a def-use chain based dependency.

bmahjour marked 2 inline comments as done.Sep 30 2019, 11:30 AM

Do I understand it correctly, that this node is added to avoid having to deal with forests?

Kind of. The graph prior to creation of pi-blocks is not a DAG and may have disjoint components. The root node is needed to turn it into a "rooted digraph" making it easier to compute all the strongly connected components. Prior to creating the root, the graph is similar to a forest, but technically speaking it's not a forest because the components may not be trees.

Introducing a createAndConnectRootNode step creates the problem that if a node is added afterwards, it will not be connected. Could you add an assertion that checks that no nodes are added once the root node is created?

In general it would be unsafe to add new nodes after the root is established, but there is an exception. In the next patch I'll be adding pi-blocks which are hierarchical nodes representing strongly-connected components of the graph. Since these components are already reachable by root they are safe to be added after the root is established. Note also that the root node needs to be established before pi-blocks can be identified. I'll add an assert for now and a todo to exclude pi-blocks later.

[suggestion] We could create a root node at the graph's construction and connect it when nodes are created. Instead of redundantly storing a list in DirectedGraph, we could use that root node's connections to iterate over all nodes.

If we connect the root node as we add nodes we have two choices: 1) blindly connect the root to every new node 2) compute reachability and avoid creating an edge when the root can already reach the new node. The first approach obviously creates too many edges, and the second approach is less efficient than the algorithm used here (because we keep track of and ignore visited nodes in this patch). Also the list of nodes will be needed for topological sort of the graph anyway, so it's desirable to keep it.

[suggestion] Another special node could point from ever node in the graph. I have used this to represent dependencies from code before the analyzed region respectively to code after the region. Could it be useful to create such a node es well? (I called them \top and \bottom).

I cannot think of a utility for the bottom node. Even the root node has limited uses as the DDG is mainly used to reason about properties of dependence within a given set of program elements and ignores dependencies outside of that range.

llvm/lib/Analysis/DependenceGraphBuilder.cpp
60 ↗(On Diff #221543)

Yes it's possible to add redundant edges depending on the order. This algorithm does not result in minimal edges from the root node but it does try to reduce redundant edges to some extent. An algorithm that would result in truly minimal number of edges from the root would be more complicated requiring more cpu cycles. I used this algorithm because it seems to be a reasonable compromise between compute cost and number of edges created. Please let me know if you don't agree with this approach, otherwise I'll add a comment to clarify.

61 ↗(On Diff #221543)

I will create a dedicated type of edge and add it to the builder interface.

bmahjour added a subscriber: ppc-slack.
bmahjour updated this revision to Diff 222467.Sep 30 2019, 11:36 AM
bmahjour marked an inline comment as done.Sep 30 2019, 11:41 AM
Meinersbur accepted this revision.Sep 30 2019, 1:45 PM
Meinersbur added inline comments.
llvm/lib/Analysis/DependenceGraphBuilder.cpp
60 ↗(On Diff #221543)

Thanks for adding some documentation.

This revision is now accepted and ready to land.Sep 30 2019, 1:45 PM
This revision was automatically updated to reflect the committed changes.