This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Decouple zero store promotion from narrow ld merge. NFC.
ClosedPublic

Authored by junbuml on May 3 2016, 12:52 PM.

Details

Summary

This change decouples zero store promotion from the narrow load merge and adds a flag (enable-narrow-ld-merge=true) to control the narrow load merge optimization.

Diff Detail

Repository
rL LLVM

Event Timeline

junbuml updated this revision to Diff 56048. May 3 2016, 12:52 PM
junbuml retitled this revision to [AArch64] Decouple zero store promotion from narrow ld merge. NFC.
junbuml updated this object.
junbuml added a subscriber: llvm-commits.

In our internal tests, we found that the narrow load merge causes performance regressions in some cases. This optimization was originally motivated by a +3% performance gain in spec2006/h264ref, which has a load-intensive hot loop. However, the gain I was targeting in h264ref is now completely covered by the SLP vectorizer.

Because this optimization converts two loads into one load followed by two shift instructions, it could hurt performance when a loop is arithmetic-intensive.

With this change, I want to let other people run performance tests with and without the narrow load merge. If there are no objections, I would like to disable the narrow load merge by default in a separate patch.
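
To illustrate the trade-off described in the comment above, here is a minimal source-level sketch of what a narrow load merge amounts to. It is not the pass itself, which operates on AArch64 machine instructions. The names Pair, load_separately, and load_merged are hypothetical, and the example assumes a little-endian target.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical holder for the two narrow values being loaded.
struct Pair {
    uint16_t lo;
    uint16_t hi;
};

// Before the merge: two independent 16-bit loads.
Pair load_separately(const uint16_t *p) {
    return Pair{p[0], p[1]};
}

// After the merge: one 32-bit load, then two extra operations
// (a mask and a shift here) to recover the original halves.
// Assumes little-endian byte order, so p[0] is in the low 16 bits.
Pair load_merged(const uint16_t *p) {
    uint32_t wide;
    std::memcpy(&wide, p, sizeof(wide));  // single wide load
    uint16_t lo = static_cast<uint16_t>(wide & 0xFFFFu);
    uint16_t hi = static_cast<uint16_t>(wide >> 16);
    return Pair{lo, hi};
}
```

The merged form trades one memory access for two additional ALU operations, which is why it may help a load-bound loop and hurt one that is already limited by arithmetic throughput.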

mcrosier accepted this revision. May 3 2016, 1:37 PM
mcrosier edited edge metadata.

LGTM.

This revision is now accepted and ready to land. May 3 2016, 1:37 PM
This revision was automatically updated to reflect the committed changes.