The entries in VectorizableTree are not necessarily ordered by their
position in basic blocks. Collect them and order them by dominance so
later instructions are guaranteed to be visited first. For instructions
in different basic blocks, we only scan to the beginning of the block,
so their order does not matter, as long as all instructions in a basic
block are grouped together. Using dominance ensures a deterministic order.
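
As a rough illustration of the intended order (not the code from this
patch), the sketch below sorts toy instructions so that instructions of
one block stay together and later instructions come first within a
block. `ToyInst`, `BlockDFSIn`, `PosInBlock`, `visitBefore` and
`orderForScan` are hypothetical names; using a per-block DFS number as
the grouping key is an assumption made here to keep the comparator a
strict weak ordering.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for an instruction; not an LLVM type.
struct ToyInst {
  std::string Name;
  int BlockDFSIn; // assumed: DFS-in number of the parent block in the dominator tree
  int PosInBlock; // assumed: index of the instruction inside its block
};

// Strict weak ordering: group by block, then later instructions first.
inline bool visitBefore(const ToyInst &A, const ToyInst &B) {
  if (A.BlockDFSIn != B.BlockDFSIn)
    return A.BlockDFSIn > B.BlockDFSIn;
  return A.PosInBlock > B.PosInBlock;
}

// Returns the instruction names in the order they would be visited.
inline std::vector<std::string> orderForScan(std::vector<ToyInst> Insts) {
  std::stable_sort(Insts.begin(), Insts.end(), visitBefore);
  std::vector<std::string> Names;
  for (const ToyInst &I : Insts)
    Names.push_back(I.Name);
  return Names;
}
```

Because the key is a total order on (block, position) pairs, the result
does not depend on the order in which the entries were collected.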
The modified test case contains an example where we compute a wrong
spill cost (2) without this patch, even though there is no call between
any instruction in the bundle.
This seems to have limited practical impact, e.g. on X86 with a recent
Intel Xeon CPU, building MultiSource, SPEC2000 and SPEC2006 with
-O3 -march=native -flto produces no binary changes.
There is a problem with this predicate function.
The MSFT STL implementation of stable_sort asserts that if the predicate
returns true, it must return false when the operands are swapped.
But (A dom B) == false does not necessarily mean that (B dom A) == true.
Instead: (A dom B) == true means that (B dom A) == false.
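
To make the asymmetry requirement concrete, here is a small
self-contained sketch on a toy dominator tree. `ToyDomTree`,
`strictlyDominates` and `isAsymmetric` are hypothetical helpers, not
LLVM APIs, and the two predicates tried in the usage note are only
illustrations, not the code under review.

```cpp
#include <vector>

// Toy dominator tree: Parent[i] is the immediate dominator of node i and
// the root is its own parent. Hypothetical model, not LLVM's DominatorTree.
struct ToyDomTree {
  std::vector<int> Parent;

  // Strict dominance: A != B and A is a proper ancestor of B.
  bool strictlyDominates(int A, int B) const {
    if (A == B)
      return false;
    for (int N = B; Parent[N] != N;) {
      N = Parent[N];
      if (N == A)
        return true;
    }
    return false;
  }
};

// Returns true if Pred(A, B) and Pred(B, A) never hold at the same time,
// which is the property the sort's debug check relies on.
template <typename PredT>
bool isAsymmetric(int NumNodes, PredT Pred) {
  for (int A = 0; A < NumNodes; ++A)
    for (int B = 0; B < NumNodes; ++B)
      if (Pred(A, B) && Pred(B, A))
        return false;
  return true;
}
```

With a root 0 and two sibling children 1 and 2, neither sibling
dominates the other. A predicate of the form "B strictly dominates A"
passes `isAsymmetric`, while a negated predicate of the form
"A does not dominate B" returns true in both directions for the
siblings and fails the check.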
Rewriting it like this solves this issue: