Computing the remaining latency can be very expensive, especially
on graphs of N nodes where the number of edges approaches N^2.

This reduces the compile time of a pathological case with the
AMDGPU backend from ~7.5 seconds to ~3 seconds. This test case has
a basic block with 2655 stores, each with somewhere between 500
and 1500 successors and predecessors.
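
For a rough sense of scale, the figures quoted above can be turned into
edge-count bounds. This is only a back-of-the-envelope sketch: it assumes
every store sits at either the low or the high end of the quoted range,
and the helper names (edgeCount, percentOfDense, kStores) are hypothetical
and not part of the scheduler.

```cpp
#include <cstdint>

// Hypothetical helper: number of successor edges if each of `nodes` nodes
// has exactly `succsPerNode` successors.
constexpr std::uint64_t edgeCount(std::uint64_t nodes,
                                  std::uint64_t succsPerNode) {
  return nodes * succsPerNode;
}

// Hypothetical helper: edge count as a percentage of N^2, rounded down.
constexpr std::uint64_t percentOfDense(std::uint64_t nodes,
                                       std::uint64_t succsPerNode) {
  return edgeCount(nodes, succsPerNode) * 100 / (nodes * nodes);
}

// Number of stores in the basic block, taken from the description above.
constexpr std::uint64_t kStores = 2655;

// Lower bound: every store has 500 successors.
static_assert(edgeCount(kStores, 500) == 1327500, "low-end edge count");
// Upper bound: every store has 1500 successors.
static_assert(edgeCount(kStores, 1500) == 3982500, "high-end edge count");
// Relative to N^2 = 2655 * 2655, that is roughly 18% to 56%.
static_assert(percentOfDense(kStores, 500) == 18, "low-end density");
static_assert(percentOfDense(kStores, 1500) == 56, "high-end density");
```

Under these assumptions the block has on the order of 1.3 to 4 million
edges, so any per-node computation that walks all successors or
predecessors touches a sizeable fraction of N^2 entries.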