The current fusion algorithm does not fully support multi-store producers.
In some scenarios, this means that whether a loop nest gets fused depends on the
order in which nodes are visited. For example, given the edges A->B, A->C and B->C,
trying to fuse A+B first fails because A has multiple outgoing edges. However,
after fusing B+C, A would have a single outgoing edge and could be fused. This
does not currently happen if A is visited before B+C is fused, because A is not
visited again after the B+C fusion.
This patch changes the fusion algorithm so that after fusing two loop nests (B+C)
we revisit previously visited nodes (A) so that they are considered again for
fusion in the context of the new fused loop nest.
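
The effect of revisiting can be illustrated with a small toy model. This is only
a sketch, not the code in this patch: the graph is a plain set of
(producer, consumer) edges, the only fusion rule modeled is "a producer with
exactly one outgoing edge can be fused into its consumer", and the names
fuse_all, revisit and worklist are hypothetical.

```python
def fuse_all(edges, order, revisit):
    """Greedy producer/consumer fusion on a toy dependence graph.

    edges:   iterable of (producer, consumer) node-name pairs.
    order:   node names in the order they are first visited.
    revisit: if True, re-queue previously visited nodes after each fusion.
    Returns the list of node names left after fusion.
    """
    edges = set(edges)
    alive = list(order)      # nodes that still exist in the graph
    worklist = list(order)   # nodes waiting to be visited
    visited = []             # nodes visited without being fused

    while worklist:
        node = worklist.pop(0)
        if node not in alive:
            continue  # already merged into another node

        consumers = {c for (p, c) in edges if p == node}
        if len(consumers) != 1:
            # Toy legality rule (assumption): only single-consumer producers fuse.
            if node not in visited:
                visited.append(node)
            continue

        (consumer,) = consumers
        fused = node + "+" + consumer

        def rename(n):
            return fused if n in (node, consumer) else n

        # Redirect edges to the fused node; duplicates collapse in the set,
        # and the producer->consumer edge becomes a self-edge that is dropped.
        edges = {(rename(p), rename(c)) for (p, c) in edges}
        edges = {(p, c) for (p, c) in edges if p != c}

        alive = [n for n in alive if n not in (node, consumer)] + [fused]
        worklist = [fused] + [n for n in worklist if n != consumer]

        if revisit:
            # Previously visited nodes may now have a single outgoing edge.
            worklist += [n for n in visited if n in alive]
            visited = []

    return alive


graph = [("A", "B"), ("A", "C"), ("B", "C")]
without_revisit = fuse_all(graph, ["A", "B", "C"], revisit=False)
with_revisit = fuse_all(graph, ["A", "B", "C"], revisit=True)
```

In this model, visiting A first without revisiting leaves A unfused next to the
fused B+C node, while the revisiting variant picks A up again once its two
outgoing edges have collapsed into one.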