This change refactors the code to decouple zero-store promotion from the narrow load merge, and adds a flag (enable-narrow-ld-merge=true) to control the narrow load merge optimization.
Event Timeline
In our internal tests, we found performance regressions with the narrow load merge in some cases. Initially, this optimization was motivated by a +3% performance gain in spec2006/h264ref, which has a load-intensive hot loop. However, the gain I was targeting in h264ref is now completely covered by the SLP vectorizer.
Because this optimization converts two loads into one load plus two shift instructions, it can hurt performance in loops that are arithmetic-intensive, where the extra instructions compete with the existing arithmetic work.
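To illustrate the trade-off, here is a minimal source-level sketch of the transformation, not the actual pass. The function names `load_separate` and `load_merged` are hypothetical, the 16-bit element width is chosen only for illustration, and the merged version assumes a little-endian target. The exact extraction instructions the backend emits may differ from the mask and shift shown here.

```c
#include <stdint.h>
#include <string.h>

/* Before: two narrow (16-bit) loads from adjacent addresses. */
void load_separate(const uint8_t *p, uint16_t *lo, uint16_t *hi) {
    memcpy(lo, p, sizeof *lo);      /* first narrow load  */
    memcpy(hi, p + 2, sizeof *hi);  /* second narrow load */
}

/* After: one wider (32-bit) load, followed by two extra ALU
 * operations to extract each half. Assumes little-endian layout. */
void load_merged(const uint8_t *p, uint16_t *lo, uint16_t *hi) {
    uint32_t wide;
    memcpy(&wide, p, sizeof wide);     /* single wide load         */
    *lo = (uint16_t)(wide & 0xFFFFu);  /* extract the low 16 bits  */
    *hi = (uint16_t)(wide >> 16);      /* extract the high 16 bits */
}
```

The merged form saves one load but adds two arithmetic operations, so it helps most when the loop is bound by loads and can hurt when the arithmetic units are already the bottleneck.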
With this change, I want to let other people run performance tests with and without the narrow load merge. If there are no objections, I would like to disable the narrow load merge by default in a separate patch.