Both the hardware and LLVM have changed since 2012.
Now, load-based heuristics don't show big differences any more on OoO cores.
There are no regressions or improvements on SPEC2000/2006 (Cortex-A57, Core i5).
But there are two big improvements in test-suite:
Benchmark Name | Execution time (optimized / original) |
MultiSource/Benchmarks/TSVC/Searching-dbl/Searching-dbl | 87.18% |
MultiSource/Benchmarks/TSVC/Searching-flt/Searching-flt | 83.82% |
And I didn't see notable regressions in test-suite.
I don't understand what's going on here. So what if they fold away? They could be bitcasts, or free extensions/truncations, etc. Wouldn't you need to recurse up the use/def chain to find the non-free operand and make some determination based on that?
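
To make the question concrete, here is a minimal sketch of what looking through free casts could look like. It uses a toy `Node` type rather than LLVM's own classes: `Kind`, `Node`, `isFreeCast`, `skipFreeCasts` and `isFedByLoad` are all hypothetical names, and which casts are actually free depends on the target.

```cpp
#include <vector>

// Hypothetical stand-in for an IR value; this is not LLVM's Value class.
enum class Kind { Load, Bitcast, FreeExt, FreeTrunc, Add, Argument };

struct Node {
    Kind kind;
    std::vector<const Node*> operands;
};

// Casts assumed to cost nothing on the target (an assumption of this sketch).
bool isFreeCast(const Node& n) {
    return n.kind == Kind::Bitcast || n.kind == Kind::FreeExt ||
           n.kind == Kind::FreeTrunc;
}

// Walk up the use/def chain through free casts and return the first
// value that is not a free cast.
const Node* skipFreeCasts(const Node* n) {
    while (n && isFreeCast(*n) && !n->operands.empty())
        n = n->operands.front();
    return n;
}

// True if the value, once free casts are looked through, comes from a load.
bool isFedByLoad(const Node* n) {
    const Node* src = skipFreeCasts(n);
    return src && src->kind == Kind::Load;
}
```

With something like this, a load hidden behind a bitcast followed by a free extension would still be classified as a load, whereas a check on the immediate operand alone would miss it.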