This fixes a left-over efficiency issue in D8950.
As Andrew and Daniel suggested, we can store the candidates in a stack
and pop the top element when it does not dominate the current
instruction. This reduces the worst-case time complexity to O(n).