This is an archive of the discontinued LLVM Phabricator instance.

Differential D20172

[AArch64] Disable narrow load merge by default
Closed, Public

Authored by junbuml on May 11 2016, 8:43 AM.


Details

Reviewers

t.p.northover
jmolloy
mcrosier

Commits

rGb21d4e17a222: [AArch64] Disable narrow load merge by default
rL270251: [AArch64] Disable narrow load merge by default

Summary

Because this optimization converts two narrow loads into one wider load plus two extra instructions to extract the halves,
it could hurt performance in loops that are dominated by arithmetic operations.
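To make the trade-off concrete, here is a minimal C++ sketch (not LLVM code) of what the merge does to a pair of adjacent 16-bit loads on a little-endian target. The helper names and the byte-array model of memory are illustrative assumptions; the and/lsr pair mirrors what the arm64-narrow-ldst-merge.ll test in this revision checks for. Both forms compute the same value, but the merged form trades one load for two extra ALU instructions, which is the cost at issue in arithmetic-heavy loops.

```cpp
#include <cassert>
#include <cstdint>

// Model little-endian memory explicitly so the sketch gives the same answer
// on any host; 'mem' stands in for the bytes the AArch64 loads would read.
static uint32_t load16_le(const unsigned char *mem) {
  return uint32_t(mem[0]) | (uint32_t(mem[1]) << 8);
}

static uint32_t load32_le(const unsigned char *mem) {
  return uint32_t(mem[0]) | (uint32_t(mem[1]) << 8) |
         (uint32_t(mem[2]) << 16) | (uint32_t(mem[3]) << 24);
}

// Before the merge: two narrow (16-bit) loads, like two ldrh instructions.
uint32_t sub_two_narrow_loads(const unsigned char *mem) {
  uint32_t lo = load16_le(mem);     // ldrh from offset 0
  uint32_t hi = load16_le(mem + 2); // ldrh from offset 2
  return lo - hi;
}

// After the merge: one wide (32-bit) load, like a single ldr, plus two
// extra ALU instructions to split the halves back out.
uint32_t sub_one_merged_load(const unsigned char *mem) {
  uint32_t word = load32_le(mem); // ldr from offset 0
  uint32_t lo = word & 0xffff;    // and #0xffff
  uint32_t hi = word >> 16;       // lsr #16
  return lo - hi;
}

// Both shapes must compute the same value; only the instruction mix differs.
void check_equivalence(const unsigned char *mem) {
  assert(sub_two_narrow_loads(mem) == sub_one_merged_load(mem));
}
```

For example, with the bytes F4 01 C8 00 (halfwords 500 and 200), both functions return 300.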

Diff Detail

Event Timeline

junbuml updated this revision to Diff 56920. May 11 2016, 8:43 AM

junbuml retitled this revision to [AArch64] Disable narrow load merge by default.

junbuml updated this object.

junbuml added reviewers: jmolloy, t.p.northover, mcrosier.

junbuml added a subscriber: llvm-commits.

Herald added subscribers: mcrosier, rengolin, aemerson. May 11 2016, 8:43 AM

Based on our results and feedback from our SD colleagues, I'm fine with approving this patch. I know the performance results were neutral for Spec2006. Did you do any additional testing on Spec2000 or EEMBC by chance?

Approving, but feel free to wait for feedback from Tim, James, or others before committing.

This revision is now accepted and ready to land. May 11 2016, 8:49 AM

No performance regressions were found in SPEC2000/2006. I will run EEMBC as well.

Hi Jun,

Have you considered deciding this as a MachineCombiner pattern? This would be a good place to know if the loop is arithmetic or load/store heavy.

Cheers,

James

Have you considered deciding this as a MachineCombiner pattern? This would be a good place to know if the loop is arithmetic or load/store heavy.

Let me look into whether we can move this to MachineCombiner.
Thanks, James!

evandro added a subscriber: evandro. May 11 2016, 10:57 AM

WRT Exynos M1, this change is neutral.

LGTM

I did multiple EEMBC runs because my scores were somewhat unstable. Overall, I did not see any reproducible regressions. Please feel free to run performance tests and share your results. I will commit this at the end of this week if there is no objection.

In D20172#433201, @junbuml wrote:

I will commit this at the end of this week if there is no objection.

No objection here.

Have you considered deciding this as a MachineCombiner pattern? This would be a good place to know if the loop is arithmetic or load/store heavy.

I think MachineCombiner is also a good place to perform this optimization, with minor changes to the profitability check. However, I don't currently have any cases impacted by this optimization, so I will deprioritize this work until I find such cases.
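One way to picture the profitability check discussed above is a simple resource-bound estimate: each merge removes one load but adds two ALU instructions, so it should only fire when a loop is bound by loads rather than by arithmetic. The sketch below is purely hypothetical (LoopMix, PortModel, shouldMergeNarrowLoads and the port counts are invented for illustration) and is not how MachineCombiner or AArch64LoadStoreOptimizer actually decide.

```cpp
#include <algorithm>

// Hypothetical summary of one loop body's instruction mix (not an LLVM type).
struct LoopMix {
  unsigned loads; // load/store instructions per iteration
  unsigned alu;   // integer ALU instructions per iteration
};

// Hypothetical issue resources; both counts are assumed to be at least 1.
struct PortModel {
  unsigned loadPorts;
  unsigned aluPorts;
};

static unsigned ceilDiv(unsigned a, unsigned b) { return (a + b - 1) / b; }

// Resource-bound estimate of cycles per iteration: the busier unit dominates.
unsigned estimateCycles(const LoopMix &m, const PortModel &p) {
  return std::max(ceilDiv(m.loads, p.loadPorts), ceilDiv(m.alu, p.aluPorts));
}

// Each merged pair of narrow loads removes one load and adds two ALU
// instructions (mask + shift). Merge only if the estimate strictly improves.
bool shouldMergeNarrowLoads(const LoopMix &m, const PortModel &p,
                            unsigned pairs) {
  if (pairs == 0 || 2 * pairs > m.loads)
    return false; // nothing to merge, or more pairs than available loads
  LoopMix after{m.loads - pairs, m.alu + 2 * pairs};
  return estimateCycles(after, p) < estimateCycles(m, p);
}
```

With one load port and two ALU ports, a body with 8 loads and 2 ALU ops merges (the estimate drops from 8 to 5 cycles), while one with 4 loads and 12 ALU ops does not (the estimate rises from 6 to 8).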

Closed by commit rL270251: [AArch64] Disable narrow load merge by default (authored by junbuml). May 20 2016, 11:52 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                Size
lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp    2 lines
test/CodeGen/AArch64/arm64-narrow-ldst-merge.ll     6 lines

Diff 56920

lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp

 static cl::opt<unsigned> LdStLimit("aarch64-load-store-scan-limit",
                                    cl::init(20), cl::Hidden);

 // The UpdateLimit limits how far we search for update instructions when we form
 // pre-/post-index instructions.
 static cl::opt<unsigned> UpdateLimit("aarch64-update-scan-limit", cl::init(100),
                                      cl::Hidden);

 static cl::opt<bool> EnableNarrowLdMerge("enable-narrow-ld-merge", cl::Hidden,
-                                         cl::init(true),
+                                         cl::init(false),
                                          cl::desc("Enable narrow load merge"));

 namespace llvm {
 void initializeAArch64LoadStoreOptPass(PassRegistry &);
 }

 #define AARCH64_LOAD_STORE_OPT_NAME "AArch64 load / store optimization pass"


test/CodeGen/AArch64/arm64-narrow-ldst-merge.ll

-; RUN: llc < %s -mtriple aarch64--none-eabi -mcpu=cortex-a57 -verify-machineinstrs | FileCheck %s --check-prefix=CHECK --check-prefix=LE
-; RUN: llc < %s -mtriple aarch64_be--none-eabi -mcpu=cortex-a57 -verify-machineinstrs | FileCheck %s --check-prefix=CHECK --check-prefix=BE
-; RUN: llc < %s -mtriple aarch64--none-eabi -mcpu=kryo -verify-machineinstrs | FileCheck %s --check-prefix=CHECK --check-prefix=LE
+; RUN: llc < %s -mtriple aarch64--none-eabi -mcpu=cortex-a57 -verify-machineinstrs -enable-narrow-ld-merge=true | FileCheck %s --check-prefix=CHECK --check-prefix=LE
+; RUN: llc < %s -mtriple aarch64_be--none-eabi -mcpu=cortex-a57 -verify-machineinstrs -enable-narrow-ld-merge=true | FileCheck %s --check-prefix=CHECK --check-prefix=BE
+; RUN: llc < %s -mtriple aarch64--none-eabi -mcpu=kryo -verify-machineinstrs -enable-narrow-ld-merge=true | FileCheck %s --check-prefix=CHECK --check-prefix=LE

 ; CHECK-LABEL: Ldrh_merge
 ; CHECK-NOT: ldrh
 ; CHECK: ldr [[NEW_DEST:w[0-9]+]]
 ; CHECK-DAG: and [[LO_PART:w[0-9]+]], [[NEW_DEST]], #0xffff
 ; CHECK-DAG: lsr [[HI_PART:w[0-9]+]], [[NEW_DEST]], #16
 ; LE: sub {{w[0-9]+}}, [[LO_PART]], [[HI_PART]]
 ; BE: sub {{w[0-9]+}}, [[HI_PART]], [[LO_PART]]
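The LE and BE prefixes differ only in the operand order of the final sub. The C++ sketch below shows why, assuming (as the CHECK lines suggest) that Ldrh_merge subtracts the halfword at the higher address from the one at the lower address: after a single 32-bit load, the lower-address halfword lands in the low 16 bits on little-endian but in the high 16 bits on big-endian. The function names are hypothetical, and memory is modeled as a byte array so the result does not depend on the host's byte order.

```cpp
#include <cstdint>

// Read 16 bits from 'mem' under the chosen byte order.
uint32_t load16(const unsigned char *mem, bool bigEndian) {
  return bigEndian ? (uint32_t(mem[0]) << 8) | uint32_t(mem[1])
                   : uint32_t(mem[0]) | (uint32_t(mem[1]) << 8);
}

// Read 32 bits from 'mem' under the chosen byte order.
uint32_t load32(const unsigned char *mem, bool bigEndian) {
  if (bigEndian)
    return (uint32_t(mem[0]) << 24) | (uint32_t(mem[1]) << 16) |
           (uint32_t(mem[2]) << 8) | uint32_t(mem[3]);
  return uint32_t(mem[0]) | (uint32_t(mem[1]) << 8) |
         (uint32_t(mem[2]) << 16) | (uint32_t(mem[3]) << 24);
}

// Reference: two narrow loads (two ldrh), first halfword minus second.
uint32_t subNarrow(const unsigned char *mem, bool bigEndian) {
  return load16(mem, bigEndian) - load16(mem + 2, bigEndian);
}

// Merged: one ldr, then and #0xffff (LO_PART) and lsr #16 (HI_PART).
// LE: the lower-address halfword is LO_PART, so sub LO_PART, HI_PART.
// BE: the lower-address halfword is HI_PART, so sub HI_PART, LO_PART.
uint32_t subMerged(const unsigned char *mem, bool bigEndian) {
  uint32_t word = load32(mem, bigEndian);
  uint32_t loPart = word & 0xffff;
  uint32_t hiPart = word >> 16;
  return bigEndian ? hiPart - loPart : loPart - hiPart;
}
```

For the bytes 01 F4 00 C8, the big-endian halves are 0x01F4 (500) and 0x00C8 (200), so both functions return 300 when bigEndian is true; on little-endian they also agree with each other.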
