This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Disable narrow load merge by default
ClosedPublic

Authored by junbuml on May 11 2016, 8:43 AM.

Details

Summary

Because this optimization replaces two narrow loads with one wider load plus two shift instructions,
it can hurt performance in loops that are dominated by arithmetic operations.
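To make the trade-off concrete, here is a minimal Python model of what the transformation does semantically. It assumes the simplest case of two adjacent 16-bit loads and a little-endian memory layout, which is the usual AArch64 configuration. It uses `struct` to stand in for memory accesses. This is not the LLVM pass itself, and the exact AArch64 instructions the pass emits are not shown. The function names are hypothetical.

```python
import struct

def two_narrow_loads(mem, off):
    # Before the merge: two separate 16-bit (halfword) loads from adjacent addresses.
    lo = struct.unpack_from("<H", mem, off)[0]
    hi = struct.unpack_from("<H", mem, off + 2)[0]
    return lo, hi

def merged_load(mem, off):
    # After the merge (modeled): a single 32-bit load from the same base address...
    word = struct.unpack_from("<I", mem, off)[0]
    # ...followed by two extra ALU operations to recover each halfword.
    # These are the additional instructions that compete for ALU issue slots.
    lo = word & 0xFFFF   # low halfword (little-endian: the first two bytes)
    hi = word >> 16      # high halfword (the next two bytes)
    return lo, hi

mem = bytes([0x34, 0x12, 0x78, 0x56])
print(two_narrow_loads(mem, 0), merged_load(mem, 0))
```

Both functions yield the same pair of values. The merged form trades one memory access for two extra ALU operations. That trade pays off in load/store-bound loops but can cost cycles when the ALUs are already the bottleneck.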

Event Timeline

junbuml updated this revision to Diff 56920. May 11 2016, 8:43 AM
junbuml retitled this revision to [AArch64] Disable narrow load merge by default.
junbuml updated this object.
junbuml added a subscriber: llvm-commits.
mcrosier edited edge metadata. May 11 2016, 8:48 AM

Based on our results and feedback from our SD colleagues, I'm fine with approving this patch. I know the performance results were neutral for SPEC2006. Did you do any additional testing on SPEC2000 or EEMBC by chance?

mcrosier accepted this revision. May 11 2016, 8:49 AM
mcrosier edited edge metadata.

Approving, but feel free to wait for feedback from Tim, James, or others before committing.

This revision is now accepted and ready to land. May 11 2016, 8:49 AM

No performance regressions were found in SPEC2000/2006. I will run EEMBC as well.

jmolloy edited edge metadata. May 11 2016, 8:56 AM
jmolloy added a subscriber: jmolloy.

Hi Jun,

Have you considered making this decision in a MachineCombiner pattern? That would be a good place to determine whether the loop is arithmetic-heavy or load/store-heavy.

Cheers,

James

Let me take a look at whether we can move this to MachineCombiner.
Thanks, James!

WRT Exynos M1, this change is neutral.

LGTM

I did multiple EEMBC runs because my scores were somewhat unstable. Overall, I was not able to see any reproducible regressions. Please feel free to run performance tests and share your results. I will commit this at the end of this week if there is no objection.

No objection here.

I think MachineCombiner is also a good place to perform this optimization, with minor changes to the profitability check. For now, however, I don't have any cases affected by this optimization, so I will deprioritize this work until I find such cases.

This revision was automatically updated to reflect the committed changes.