When there is more than one SDep between two successor SUnits in the NodeSet, the current implementation sums the latencies of all of those dependencies, which can produce a larger RecMII than necessary.

For example, consider a case with both a data dependency and an output dependency (with latency > 0) between successor nodes:

SU(1) inst1:
  successors:
    SU(2): out latency = 1
    SU(2): data latency = 1

SU(2) inst2:
  successors:
    SU(3): out latency = 1
    SU(3): data latency = 1

SU(3) inst3:
  successors:
    SU(1): out latency = 1
    SU(1): data latency = 1

The NodeSet latency returned would be 6 (three pairs of edges, each pair contributing 1 + 1), whereas it could be 3 if we took the max latency for each successor SUnit.
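To make the arithmetic concrete, here is a minimal standalone sketch of the two ways of accumulating the latency. It does not use the LLVM classes: `Dep`, `Node`, `sumAllEdgeLatencies`, `sumMaxLatencyPerSuccessor` and `exampleRecurrence` are hypothetical stand-ins invented for illustration, and the sketch assumes every edge stays inside the NodeSet.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <vector>

// Hypothetical stand-in for an SDep: successor node id plus edge latency.
struct Dep {
  int Succ;
  int Latency;
};

// Hypothetical stand-in for an SUnit: node id plus its successor edges.
struct Node {
  int Id;
  std::vector<Dep> Succs;
};

// Sums every edge latency, so parallel edges to the same successor
// are each counted.
int sumAllEdgeLatencies(const std::vector<Node> &Nodes) {
  int Total = 0;
  for (const Node &N : Nodes)
    for (const Dep &D : N.Succs) {
      assert(D.Latency >= 0 && "latencies are assumed non-negative");
      Total += D.Latency;
    }
  return Total;
}

// For each node, keeps only the max latency per distinct successor,
// then sums those maxima.
int sumMaxLatencyPerSuccessor(const std::vector<Node> &Nodes) {
  int Total = 0;
  for (const Node &N : Nodes) {
    std::map<int, int> MaxLat; // successor id -> max latency seen so far
    for (const Dep &D : N.Succs) {
      assert(D.Latency >= 0 && "latencies are assumed non-negative");
      auto It = MaxLat.find(D.Succ);
      if (It == MaxLat.end())
        MaxLat[D.Succ] = D.Latency;
      else
        It->second = std::max(It->second, D.Latency);
    }
    for (const auto &KV : MaxLat)
      Total += KV.second;
  }
  return Total;
}

// The three-node recurrence from the example: each node has an output
// and a data edge (latency 1 each) to the next node in the cycle.
std::vector<Node> exampleRecurrence() {
  return {
      {1, {{2, 1}, {2, 1}}},
      {2, {{3, 1}, {3, 1}}},
      {3, {{1, 1}, {1, 1}}},
  };
}
```

On the example recurrence, summing every edge gives 6 while taking the max per successor gives 3, matching the numbers above.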

In general, this can be extended to finding the shortest path in the recurrence.

Thoughts?

Unfortunately, I had a hard time creating a test for this on Hexagon/PowerPC, so help would be appreciated.