This is an archive of the discontinued LLVM Phabricator instance.

[DDG] Data Dependence Graph - Pi Block
ClosedPublic

Authored by bmahjour on Oct 10 2019, 1:13 PM.

Details

Summary

This patch adds Pi Blocks to the DDG. A pi-block represents a group of DDG nodes that are part of a strongly-connected component of the graph. Replacing all the SCCs with pi-blocks results in an acyclic representation of the DDG. For example if we have:

{a -> b}, {b -> c, d}, {c -> a}

the cycle a -> b -> c -> a is abstracted into a pi-block "p" as follows:

{p -> d} with "p" containing: {a -> b}, {b -> c}, {c -> a}

In this implementation the edges between nodes that are part of the pi-block are preserved. The crossing edges (edges where one end of the edge is in the set of nodes belonging to an SCC and the other end is outside that set) are replaced with corresponding edges to/from the pi-block node instead.

Diff Detail

Repository
rL LLVM

Event Timeline

bmahjour created this revision.Oct 10 2019, 1:13 PM
etiotto added inline comments.Oct 11 2019, 6:34 AM
llvm/include/llvm/Analysis/DependenceGraphBuilder.h
115

When would be not desired to create pi-blocks? Is this member function really at this point?

llvm/lib/Analysis/DDG.cpp
16

Is probably overkill to have an option to disable the creation of pi-blocks. At least at this point. Less is more :-)

215

This makes me think what happens when a node that is part of a pi-block is removed from the graph (via DirectedGraph::removeNode). We should update the PiBlockMap in that case to reflect that the node no longer exists in the graph.

bmahjour marked 2 inline comments as done.Oct 17 2019, 10:34 AM
bmahjour added inline comments.
llvm/include/llvm/Analysis/DependenceGraphBuilder.h
115

I added this API to allow client code to control creation of pi-blocks for debugging and experimentation purposes. Furthermore since the graph builder is meant to be more generic than just a DDG builder, having the option to control certain steps of the graph build may be useful in future, for example when we use it to create PDG or other types of dependence graphs where certain steps do not apply.

llvm/lib/Analysis/DDG.cpp
215

There is more to be considered than the map if nodes are being removed from pi-blocks. Removing a node from the pi-block may actually remove the cycle or change the SCC, but I cannot think of a practical scenario where doing so would be necessary. Do you have a practical scenario in mind?

createPiBlocks should not handle every case with its own boilerplate code.

llvm/lib/Analysis/DDG.cpp
40

[style] Unless the type is too complex, try to avoid auto.

84

[style] Unless the type is too complex, try to avoid auto.

222

[style] Unless the type is too complex, try to avoid auto.

237–238

[style] Unless the type is too complex, try to avoid auto.

llvm/lib/Analysis/DependenceGraphBuilder.cpp
103

[suggestion] Use ListsOfSCCs.emplace_back?

106

[style] Unless the type is too complex, try to avoid auto.

110

[style] Unless the type is too complex, try to avoid auto.

125

[style] Unless the type is too complex, please try to avoid auto.

141

[serious] Should use DDGEdge::EdgeKind

159

[style] Unless the type is too complex, please try to avoid auto.

160

AFAICS, OldEdge is already of type EdgeType*, i.e. the static_cast is unnecessary.

160

[serious] Isn't there a way to simplify this chain of ifs? Such as

Kind = OldEdge->getKind();
bool &AlreadyCreated = EdgeAlreadyCreated[Dir][Kind];
if (!AlreadyCreated) 
  createEdgeOfKind(*Src, *New, Kind);
  AlreadyCreated = true;
bmahjour marked 17 inline comments as done.Oct 29 2019, 2:43 PM

Addressed all comments.

llvm/lib/Analysis/DDG.cpp
16

See reply above.

llvm/lib/Analysis/DependenceGraphBuilder.cpp
141

The reason I introduced a local enum in this implementation without exposing it to other parts is two fold:

  1. In order to avoid type casting the enum to an integral type that can be used as an index into the boolean array, the enum used here should be an unscoped enum. The DDGEdge::EdgeKind enum is a scoped enum and it should stay that way.
  2. This builder is supposed to build other types of graphs too, so it's better not to rely on the implementation specifics of one graph (DDG) to build other types of graphs (such as PDG). For example, while the def-use, memory and rooted edges are common concepts between DDG and PDG, the edge names, number of edge kinds or their orderings might be different.
bmahjour updated this revision to Diff 226975.Oct 29 2019, 2:45 PM
bmahjour marked 2 inline comments as done.

I am still unconvinced over the boilerplate createPiBlocks.

This method of de-duplication of edges assumes that edges are only defined by its kind. Is it correct that no additional properties, such as which definition of a value caused a UseDef dependency, are planned?

llvm/lib/Analysis/DependenceGraphBuilder.cpp
141
  1. In order to avoid type casting the enum to an integral type that can be used as an index into the boolean array, the enum used here should be an unscoped enum. The DDGEdge::EdgeKind enum is a scoped enum and it should stay that way.

It also means we have to maintain a mapping between the two enums which are basically identical. I'd prefer the cast over this.

One can abstract away the the type-casting into a type-safe class, such as proposed here:
https://stackoverflow.com/questions/12927951/array-indexing-converting-to-integer-with-scoped-enumeration#answer-35186573

  1. This builder is supposed to build other types of graphs too, so it's better not to rely on the implementation specifics of one graph (DDG) to build other types of graphs (such as PDG). For example, while the def-use, memory and rooted edges are common concepts between DDG and PDG, the edge names, number of edge kinds or their orderings might be different.

I cannot follow this argument. createPiBlocks() as-is relies on knowing all types of edges since there is no abstraction. When introducing a new type of edge, it will hit llvm_unreachable("Unsupported edge kind"). It also cannot create edges of arbitrary kind since it needs to call a createXYZEdge method. If you add a kind of edge to PDG only, it will need to call a createABCEdge method, which, if not also added to the DDG, will make the AbstractDependenceGraphBuilder uninstantiable for the DDG.

If the abstraction of node kind is important, I suggest to add a way to create edges without knowing its type, such as a clone() method on edges or promoting createEdgeOfKind to become a method of the graph builder such that subclasses can implement custom factories for their edges.

I am still unconvinced over the boilerplate createPiBlocks.

I'm not sure what you mean by "boilerplate". Is it in reference to your concern about de-duplication of edges based on their kinds? If not, could you please elaborate?

This method of de-duplication of edges assumes that edges are only defined by its kind. Is it correct that no additional properties, such as which definition of a value caused a UseDef dependency, are planned?

Yes, this abstraction loses information about the specific origin of edges coming from inside nodes (members of a pi-block) and the specific target of edges coming from outside nodes. The idea is that since members of a pi-block are bound together by their tight dependence relationship, there is less value in maintaining the coordinates of their relationship with respect to nodes outside of the pi-block. For example, suppose instruction A and B are part of a pi-block (P) and instruction C has a flow dependence with both A and B. For the purpose of instruction ordering, C will come before (P) by the virtue of a single edge from C to that pi-block. At this point the fact that both A and B are connected to C is irrelevant/redundant.
There are cases however, where the nature of the dependence relationship is important, for example when determining if a fusion-preventing edge exists between two loops. In such cases the existence of a particular dependence relationship needs to be recomputed, however the specifics of the origin or target of that dependence is not useful. In subsequent patches I'm going to add a function getDependencies() to the DDG where given two nodes (they could be pi-blocks) it walks through their instructions and returns a vector of all the possible dependence relationships between those two nodes.

Please also note that one of the primary goals of creating pi-blocks is to turn the graph into an DAG. If we maintain edges to individual nodes of a pi-block, the cycles in the pi-blocks become exposed again and the purpose of abstraction is defeated.

If the need arises in the future to obtain the lost information, it could be dealt with by

  1. recomputing the dependencies between subject nodes
  2. maintaining a side table in the builder and/or DDG
  3. keeping the original edges and somehow dealing with the DAG issue (not sure if possible).
  4. introducing an aggregate edge that keeps track of its origins and targets.

Each approach would have its own challenges of course, but (3) is the only one that would be harder to do in the future if we continue with the proposed implementation.

Do you have a common use-case in mind where it would be important to know exactly which nodes within a pi-block have relationships with nodes that are outside?

bmahjour marked an inline comment as done.Oct 31 2019, 8:20 AM
bmahjour added inline comments.
llvm/lib/Analysis/DependenceGraphBuilder.cpp
141

I think I understand your concern now. I like the idea of a custom factory for creating all types of edges that the client cares about without making the builder know about those edges, however I think a default implementation that only handles DefUse, Memory and Rooted edges would be useful. Do you agree?
I'll work on it and upload a patch soon.

I'm not sure what you mean by "boilerplate". Is it in reference to your concern about de-duplication of edges based on their kinds? If not, could you please elaborate?

This was to clarify my reason to not accept this patch yet.

Do you have a common use-case in mind where it would be important to know exactly which nodes within a pi-block have relationships with nodes that are outside?

In Polly, the question about what caused a dependency comes up regularly (https://github.com/llvm/llvm-project/blob/master/polly/include/polly/DependenceInfo.h#L40). A use case is to privatize the memory causing a false dependency.

Just asking what is being considered for the future.

llvm/lib/Analysis/DependenceGraphBuilder.cpp
141

If this is just an implementation that have these three kinds of edges, there is even less of a reason to not use the EdgeKind that comes with it.

Do you think of a use case where there is an dependence graph with an EdgeKind that is never used (at least not before createPiBlocks?). The worst that happens here is you have unused elements in EdgeAlreadyCreated.

I generally don't like complicating things before the extra complexity becomes necessary.

bmahjour marked an inline comment as done.Oct 31 2019, 11:30 AM
bmahjour added inline comments.
llvm/lib/Analysis/DependenceGraphBuilder.cpp
141

The worst that happens here is you have unused elements in EdgeAlreadyCreated.

Unused elements can happen too, but the worst that can happen is edge kinds getting misinterpreted because they don't end up with the same enumeration value. eg:

class DDGEdge : public DDGEdgeBase {
public:
  enum class EdgeKind { Unknown, RegisterDefUse, MemoryDependence, Rooted };
...
};

class PDGEdge : public PDGEdgeBase {
public:
  enum class EdgeKind { Unknown, MemoryDependence, RegisterDefUse, Control, Rooted };
...
};
Meinersbur added inline comments.Oct 31 2019, 11:50 AM
llvm/lib/Analysis/DependenceGraphBuilder.cpp
141

Within AbstractDependenceGraphBuilder<GraphType>::createPiBlocks, only one EdgeType = typename GraphType::EdgeType would ever be valid.

Do you foresee functions using DDG and PDG nodes/edges at the same time? Even if so, mixing scoped enum values of different type is an error, except when casted to get an array index, which use case can be made type safe with EnumeratedArray. I think EnumeratedArray would be a great addition to ADT.

bmahjour updated this revision to Diff 227465.Nov 1 2019, 9:12 AM
bmahjour marked 3 inline comments as done.

Created EnumeratedArray ADT (with unit tests), added Count to the edge enums and changed EdgeAlreadyCreated matrix to use EnumeratedArray.

Meinersbur added inline comments.Nov 1 2019, 1:00 PM
llvm/include/llvm/ADT/EnumeratedArray.h
22

Thanks a lot!

Could we change Enumeration::Count to Enumaration::Last? The reason is that ::Count introduces a new element that compilers warn about if unhanded in switch statements. That is, instead of

enum class E {
  One,
  Two,
  Count
};

switch (e) {
case E::One:
case E::Two:
  break;
}

warning: enumerator 'E::Count' in switch of enum 'E' is not handled

use

enum class E {
  One,
  Two,
  Last = Two
};

switch (e) {
case E::One:
case E::Two:
  break;
}
bmahjour updated this revision to Diff 228285.Nov 7 2019, 11:45 AM
bmahjour marked 2 inline comments as done.

Changed the enum array to use "Last" instead of "Count"

llvm/include/llvm/ADT/EnumeratedArray.h
22

Sure, will do.

llvm/lib/Analysis/DependenceGraphBuilder.cpp
141

Ok, I've added EnumeratedArray and changed EdgesAlreadyCreated to use it.

Meinersbur accepted this revision.Nov 7 2019, 1:05 PM

LGTM

llvm/include/llvm/ADT/EnumeratedArray.h
22

@jdoerfert made me aware of the existence of IndexedMap. In contrast to EnumeratedArray it uses a SmallVector as storage (but 0 inline-storage), so I think EnumeratedArray has its own use-case.

This revision is now accepted and ready to land.Nov 7 2019, 1:05 PM
bmahjour closed this revision.Nov 8 2019, 2:20 PM