Page MenuHomePhabricator

[DDG] Data Dependence Graph - Graph Simplification

Authored by bmahjour on Jan 7 2020, 10:55 AM.



This is the last functional patch affecting the representation of DDG. Here we try to simplify the DDG to reduce the number of nodes and edges by iteratively merging pairs of nodes that satisfy the following conditions, until no such pair can be identified. A pair of nodes consisting of a and b can be merged if:

  1. the only edge from a is a def-use edge to b and
  2. the only edge to b is a def-use edge from a and
  3. there is no cyclic edge from b to a and
  4. all instructions in a and b belong to the same basic block and
  5. both a and b are simple (single or multi instruction) nodes.

These criteria allow us to fold many uninteresting def-use edges that commonly exist in the graph while avoiding the risk of introducing dependencies that didn't exist before. For example consider a graph G defined as: { (a)->(b), (b)->(c), (a)->(d) }
The edge in between (b) and (c) can be folded to give us { (a)->(bc), (a)->(d) }
Note that folding any of the other edges is not legal. For example if the edge between (a) and (b) is folded, then a new (incorrect) dependence between (b) and (d) is introduced due to the dependency between (a) and (d).

Diff Detail

Event Timeline

bmahjour created this revision.Jan 7 2020, 10:55 AM
Meinersbur added inline comments.Jan 9 2020, 11:55 AM

[nit] Looks line an anti-pattern. Did you consider

return SimpleSrc->getLastInstruction()->getParent() == SimpleTgt->getFirstInstruction()->getParent();

or add an empty line before return true to make it distinct as the "all tests passed" action?


[serious] This is an O(V^2*max(E)) worst case algorithm (hasEdgeTo is a linear search) reinvoked over each fixpoint round, i.e. potentially quadratic (maybe less because of the EC.second > 1 bailout and fixpoint probably not linear over node count). To count the number of incoming edges, iterate over all edges, lookup the target node in CandidateSrcNodes and increase its counter. That's just O(E). To further reduce computation time, CandidateSrcNodes could be updated when nodes are merged instead of recomputing it.


I wonder whether a fixpoint is even necessary. In what cases would we need a second round?

bmahjour marked 2 inline comments as done.Jan 10 2020, 10:12 AM
bmahjour added inline comments.

The DirectedGraph implementation does not hold a list of all edges (to save space), so we first need to compute it. Note that in the worst case (where the graph is complete) walking over all edges has O(V^2) complexity, so the whole algorithm will have O(V^2 + V) complexity.

Note also that the CandidateSrcNodes map is a map from the candidate source nodes (not the target nodes), so we would need another map to store the incoming edge count of all nodes and then cross reference it as we walk CandidateSrcNodes and update it.

We would need to introduce a map and a list with worst-case space complexities O(V) and O(V^2) respectively. Having said that in the case of complete graphs, we have max(E) == V so in comparison with the above algorithm, it would still be an improvement in terms of computation.

If you still think the additional memory complexity increase is justified, I can change it as described.


It's meant to catch cases like this:

{(a)->(b), (b)->(c), (c)->(d), (e)->(d)}

the goal is to turn it into:

{(a,b,c)->(d), (e)->(d)}

I think I can get rid of the fixpoint algorithm by making use of the CandidateSrcNodes as a worklist. I'll post an update soon.

Meinersbur added inline comments.Jan 10 2020, 1:07 PM

I'd expect a typical dependence graph to be a sequence of use-def chains: a->b->c->d->e->f->... . This would already have O(V) for your algorithm as CandidateSrcNodes contains all nodes except the last. Compete graphs should be collapsed to a sing Pi node and we may consider removing transitive dependencies such that these chains are more likely.

Additional O(V) space allocations (A DenseMap<NodeType>) should be less important since already storing the nodes takes O(V) space. I don't understand where the O(V^2) comes from.

I think of the following:

DenseMap<NodeType> NumberOfIncomingEdges;
for (NodeType *N : Graph)
  for (EdgeType *E: N)
    NumberOfIncomingEdges[E->getTargetNode()] += 1;

and the requirement then be checked on the fly:

for (NodeType *Src : Graph) {
  if (Src->getEdges().size() != 1)
  EdgeType *E = Src.back();
  if (!E->isUseDef())
  NodeType *Tgt = E->getTargetNode();
  if (NumberOfIncomingEdges[Tgt] != 1)

First part is O(V+E) (or just E assuming that E>V), second is O(V).

We could even compute CandidateSrcNodes` using NumberOfIncomingEdges with fewer operations:

for (NodeType *N : Graph) {
  if (N->getEdges().size() != 1)
  for (EdgeType *E: N)
    CandidateSrcNodes[N] += NumberOfIncomingEdges[E->getTargetNode()].

with O(V+E).

But also also think you put more thoughts into it than I just did, so what am I missing?


In this example a->b and b->c can both be merged in the same round and with a bit of care, the merge of one of the would not inhibit merging the other in the same round.

bmahjour updated this revision to Diff 237423.Jan 10 2020, 1:09 PM
bmahjour marked 3 inline comments as done.

Address review comments.

bmahjour marked 3 inline comments as done.Jan 10 2020, 1:23 PM
bmahjour added inline comments.

Compete graphs should be collapsed to a sing Pi node and we may consider removing transitive dependencies such that these chains are more likely.

In this patch simplify() is done before pi-block creation, so that the SCC walks can benefit from the reduction of node/edges as a result of simplification.

I don't understand where the O(V^2) comes from

The number of edges in a complete graph is equal to V(V-1)/2, which leads to the quadratic complexity in number of nodes if we walk over it.


Please take a look at the updated patch, where I removed the fix point iteration. Does that look good to you?

bmahjour updated this revision to Diff 238394.Jan 15 2020, 4:53 PM
bmahjour marked an inline comment as done.

@Meinersbur this update should address your concern regarding time complexity. I've also tried to limit the additional memory usage by only keeping track of incoming edge counts for nodes that are targets of candidate source nodes.

@Meinersbur have you had a chance to see my latest changes yet?

Meinersbur added inline comments.Mon, Feb 17, 9:13 AM

[bikeshedding] areNodesMergable?


[style] This does two DenseMap lookups. Did you consider using the iterator returned by DenseMap::find to access the element?


[serious] Since DenseMap::erase replaces the item with a tombstone, searching for the next non-tombstone (DenseMap::begin) will be linear in time. That is, this while-loop takes O(n^2) just for iterating over the elements.

Maybe have a separate worklist+hastable for marking elements already processed?


[typo] cancidate

Meinersbur added inline comments.Mon, Feb 17, 10:18 AM

[style] Can you enclose the inner of LLVM_DEBUG into braces such that clang-format formats it nicer?

bmahjour updated this revision to Diff 245266.Tue, Feb 18, 2:31 PM
bmahjour marked 8 inline comments as done.
bmahjour added inline comments.

I was trying to avoid a second container to avoid memory waste, but since the worklist is transient and typically small, duplicating it should be fine. I've also found a bug where (depending on the order) merging nodes can trigger the "Expected a single edge from the candidate src node." assert. I've fixed this as well and added comments.

Meinersbur accepted this revision.Tue, Feb 18, 3:18 PM



It would have been a good idea of there wasn't the downside of O(n^2).

As you noticed, iterating over a DenseMap can lead to non-deterministic behavior (There's MapVector to avoid this).

This revision is now accepted and ready to land.Tue, Feb 18, 3:18 PM
This revision was automatically updated to reflect the committed changes.