We should also check that the "bottom" basic block of a loop
is a successor of the "header" basic block, otherwise we don't
propagate the information correctly when the CFG is complex.
This fixes an important rendering problem with Wolfenstein 2,
which was caused by a missing vector-memory wait.
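
For reviewers, the extra condition can be sketched roughly as below. This
is only an illustration, not the code from the patch: the Block struct and
the isSuccessorOf() helper are hypothetical stand-ins for the real basic
block type and whatever helper the pass actually uses.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for the compiler's basic block type.
struct Block {
    int id;
    std::vector<const Block*> successors;
};

// Returns true when `bottom` appears in the successor list of `header`.
// Assumption: this mirrors the additional check described above; the
// real pass may express it differently.
bool isSuccessorOf(const Block& header, const Block& bottom) {
    return std::find(header.successors.begin(), header.successors.end(),
                     &bottom) != header.successors.end();
}
```

In a simple loop where the header branches straight to the bottom block
the check passes; with a more complex CFG, where other blocks sit between
the two, it does not, and that case has to be handled separately.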
Please review,
Thanks!