A desired property of the node order in Swing Modulo Scheduling is

that for nodes outside circuits the following holds: none of them is

scheduled after both a successor and a predecessor. We call

node orders that meet this property valid.

Although invalid node orders do not lead to the generation of incorrect

code, they can cause the pipeliner not being able to find a pipelined schedule

for arbitrary II. The reason is that after scheduling the successor and the

predecessor of a node, no room may be left to schedule the node itself.

For data flow graphs with 0-latency edges, the node ordering algorithm

of Swing Modulo Scheduling can generate such undesired invalid node orders.

This patch fixes that.

In the remainder of this commit message, I will give an example

demonstrating the issue, explain the fix, and explain how the the fix is tested.

Consider, as an example, the following data flow graph with all

edge latencies 0 and all edges pointing downward.

n0 / \ n1 n3 \ / n2 | n4

Consider the implemented node order algorithm in top-down mode. In that mode,

the algorithm orders the nodes based on greatest Height and in case of equal

Height on lowest Movability. Finally, in case of equal Height and

Movability, given two nodes with an edge between them, the algorithm prefers

the source-node.

In the graph, for every node, the Height and Movability are equal to 0.

As will be explained below, the algorithm can generate the order n0, n1, n2, n3, n4.

So, node n3 is scheduled after its predecessor n0 and after its successor n2.

The reason that the algorithm can put node n2 in the order before node n3,

even though they have an edge between them in which node n3 is the source,

is the following: Suppose the algorithm has constructed the partial node

order n0, n1. Then, the nodes left to be ordered are nodes n2, n3, and n4. Suppose

that the while-loop in the implemented algorithm considers the nodes in

the order n4, n3, n2. The algorithm will start with node n4, and look for

more preferable nodes. First, node n4 will be compared with node n3. As the nodes

have equal Height and Movability and have no edge between them, the algorithm

will stick with node n4. Then node n4 is compared with node n2. Again the

Height and Movability are equal. But, this time, there is an edge between

the two nodes, and the algorithm will prefer the source node n2.

As there are no nodes left to compare, the algorithm will add node n2 to

the node order, yielding the partial node order n0, n1, n2. In this way node n2

arrives in the node-order before node n3.

To solve this, this patch introduces the ZeroLatencyHeight (ZLH) property

for nodes. It is defined as the maximum unweighted length of a path from the

given node to an arbitrary node in which each edge has latency 0.

So, ZLH(n0)=3, ZLH(n1)=ZLH(n3)=2, ZLH(n2)=1, and ZLH(n4)=0

In this patch, the preference for a greater ZeroLatencyHeight

is added in the top-down mode of the node ordering algorithm, after the

preference for a greater Height, and before the preference for a

lower Movability.

Therefore, the two allowed node-orders are n0, n1, n3, n2, n4 and n0, n3, n1, n2, n4.

Both of them are valid node orders.

In the same way, the bottom-up mode of the node ordering algorithm is adapted

by introducing the ZeroLatencyDepth property for nodes.

The patch is tested by adding extra checks to the following existing

lit-tests:

test/CodeGen/Hexagon/SUnit-boundary-prob.ll

test/CodeGen/Hexagon/frame-offset-overflow.ll

test/CodeGen/Hexagon/vect/vect-shuffle.ll

Before this patch, the pipeliner failed to pipeline the loops in these tests

due to invalid node-orders. After the patch, the pipeliner successfully

pipelines all these loops.