This includes the intervening store and the load that we're trying to
forward from in the optimization remark for the missed load elimination.
This is hooked up under a new mode in ORE that allows a compile-time
budget for a bit more analysis in order to print more insightful messages. This
mode is currently enabled for -fsave-optimization-record (-Rpass is
trickier since it is controlled in the front-end).
With this we can now print the red remark in http://lab.llvm.org:8080/artifacts/opt-view_test-suite/build/SingleSource/Benchmarks/Dhrystone/CMakeFiles/dry.dir/html/_org_test-suite_SingleSource_Benchmarks_Dhrystone_dry.c.html#L446
This iteration order is going to be non-deterministic if there are multiple users; I don't think we can do it this way. Plus, this won't really give the best answer. Unfortunately, actually giving the best answer probably needs to wait for MemorySSA/NewGVN (because looking at other pointer uses won't pick up cases where a smaller load is replaced by part of a larger store, etc.). In the meantime, can we do something deterministic (e.g., sort the other instructions by dominance and pick the last instruction in the last block, perhaps making use of OrderedBasicBlock)?
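
To make the deterministic suggestion concrete, here is a minimal, self-contained sketch in plain C++. It is not LLVM code: `Candidate`, `comesBefore`, and `pickLatest` are hypothetical names, and the sketch assumes each candidate already carries the pre-order number of its block in the dominator tree (so a dominating block always gets a smaller number) plus its position inside that block, which is roughly what OrderedBasicBlock would supply.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy stand-in for an instruction that uses the same pointer as the load.
// Hypothetical type; none of this is LLVM API.
struct Candidate {
  int Id;            // stable identifier, used only as a final tie-breaker
  unsigned BlockDFS; // assumed: pre-order number of the parent block in the
                     // dominator tree (dominating blocks get smaller numbers)
  unsigned Pos;      // assumed: position of the instruction inside its block
};

// Strict total order: first by block, then by position inside the block,
// then by Id so that no two distinct candidates ever compare equal.
inline bool comesBefore(const Candidate &A, const Candidate &B) {
  if (A.BlockDFS != B.BlockDFS)
    return A.BlockDFS < B.BlockDFS;
  if (A.Pos != B.Pos)
    return A.Pos < B.Pos;
  return A.Id < B.Id;
}

// Picks the last candidate under comesBefore. The result depends only on the
// candidates themselves, not on the order in which they were collected, so a
// non-deterministic use-list walk no longer affects which one is reported.
// Returns nullptr when there are no candidates.
inline const Candidate *pickLatest(const std::vector<Candidate> &Cs) {
  auto It = std::max_element(Cs.begin(), Cs.end(), comesBefore);
  return It == Cs.end() ? nullptr : &*It;
}
```

Feeding the same candidates in any order selects the same instruction. Note that this only makes the choice reproducible; it does not make it the best answer, for the MemorySSA/NewGVN reasons given above.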