This change refactors the code to decouple zero store promotion from narrow load merging, and adds a flag (enable-narrow-ld-merge=true) to control the narrow load merge optimization.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
In our internal tests, we found performance regressions with the narrow load merge in some cases. Initially, this optimization was driven by a +3% performance gain in spec2006/h264ref, which has a load-intensive hot loop. However, the gain I was targeting in h264ref is now completely covered by the SLP vectorizer.
As this optimization converts two loads into one load plus two shift instructions, it could potentially hurt performance if a loop is arithmetic-intensive.
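To make the trade-off concrete, here is a rough source-level sketch of what the merge does. It is only an illustration, not the pass's code: the names `Pair`, `load_two_narrow`, and `load_one_wide` are made up for this comment, it assumes a little-endian target, and the exact extraction instructions (shift, mask, or bitfield extract) depend on the target.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical type and function names, for illustration only.
struct Pair {
    uint16_t lo;
    uint16_t hi;
};

// Before the merge: two adjacent narrow (16-bit) loads.
Pair load_two_narrow(const unsigned char *p) {
    uint16_t a, b;
    std::memcpy(&a, p, sizeof a);      // first narrow load
    std::memcpy(&b, p + 2, sizeof b);  // second narrow load
    return {a, b};
}

// After the merge: one wide (32-bit) load, then extra ALU work to
// split the halves. Assumes a little-endian target.
Pair load_one_wide(const unsigned char *p) {
    uint32_t w;
    std::memcpy(&w, p, sizeof w);                     // single wide load
    uint16_t a = static_cast<uint16_t>(w & 0xFFFFu);  // extract low half
    uint16_t b = static_cast<uint16_t>(w >> 16);      // extract high half
    return {a, b};
}
```

The merged form saves one load but adds extraction work on the ALU, which is why it can lose in a loop that is already bound by arithmetic rather than by loads.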
With this change, I want to let other people run performance tests with and without the narrow load merge. If there is no objection, I would like to disable the narrow load merge by default in a separate patch.