InstComine currently processes blocks in some ill-defined depth-first order. This can break the usual invariant that the operands of an instruction should be simplified before the instruction itself, if uses across basic blocks (particularly inside phi nodes) are involved.
This patch switches the initial worklist population to use RPO instead, which will ensure that predecessors are visited before successors (back-edges notwithstanding).
This allows us to fold more cases within a single InstCombine iteration. The broader context here is that I want to limit InstCombine to a single iteration in the future (removing the current fix-point iteration), which will give a large (5% end-to-end) compile-time improvement without substantial optimization impact. However, this requires eliminating as many cases where we fail to reach the fix point in one iteration as possible, and the current worklist population order is one of the main issues.
In the meantime, this does cause a small compile-time regression of about 0.1% (http://llvm-compile-time-tracker.com/compare.php?from=725fcf40c3e55b2c03a1ed2326375984c0a8560f&to=8d0280338cd9409ec6fa4dbf86f4c2a9dfa57c60&stat=instructions:u), because calculating RPO is more expensive than one would think.