In the case of more than one SDep between two successor SUnits in the Nodeset, the current implementation sums the latencies of the dependencies, which could create a larger RecMII than necessary.
for example, in case there is both a data dependency and an output dependency (with latency > 0) between successor nodes:
SU(1) inst1:
successors: SU(2): out latency = 1 SU(2): data latency = 1
SU(2) inst2:
successors: SU(3): out latency = 1 SU(3): data latency = 1
SU(3) inst3:
successors: SU(1): out latency = 1 SU(1): data latency = 1
the NodeSet latency returned would be 6, whereas it could be 3 if we take the max for each successor SUnit.
In general this can be extended to finding the shortest path in the recurrence..
thoughts?
Unfortunately I had a hard time creating a test for this in Hexagon/PowerPC, so help would be appreciated.