This is an archive of the discontinued LLVM Phabricator instance.

[GVN] Don't perform scalar PRE on GEPs
ClosedPublic

Authored by labrinea on Nov 28 2018, 9:31 AM.

Details

Summary

Partial Redundancy Elimination of GEPs prevents CodeGenPrepare from sinking the addressing mode computation of memory instructions back to its uses. The problem comes from the PHIs that PRE inserts: CGP does not look through them, so it bails out.
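To illustrate (a minimal hypothetical example, not the patch's regression test): the GEP in %merge below is partially redundant, because the same address is already computed along the %then path.

then:
  %g1 = getelementptr inbounds double, double* %p, i64 %i
  store double 1.0, double* %g1, align 8
  br label %merge

else:
  br label %merge

merge:                                            ; preds = %then, %else
  %g2 = getelementptr inbounds double, double* %p, i64 %i
  %v = load double, double* %g2, align 8

Scalar PRE hoists a copy of the GEP into %else and replaces %g2 with a PHI:

else:
  %g2.pre = getelementptr inbounds double, double* %p, i64 %i
  br label %merge

merge:                                            ; preds = %then, %else
  %g = phi double* [ %g1, %then ], [ %g2.pre, %else ]
  %v = load double, double* %g, align 8

The load's pointer operand is now a PHI rather than a GEP, so CGP's address sinking gives up instead of folding %p + 8*%i into the load.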

I found this problem while looking at the sqlite amalgamation from https://www.sqlite.org/download.html.

We could teach CGP to look through PHI nodes in FindAllMemoryUses, but this would increase compilation time (scanning is currently limited to 20 memory instructions; sqlite needs six times more). Moreover, CGP still wouldn't be able to handle GEPs that have a different base and offset but correspond to the same Value Number (as in the regression test).
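For reference, the fix amounts to an early bail-out in GVN's scalar PRE. A paraphrased sketch of the check, written as a standalone predicate rather than the literal diff:

#include "llvm/IR/Instructions.h"

using namespace llvm;

// Paraphrased sketch of the bail-out this patch adds to scalar PRE
// (conceptually inside GVN::performScalarPRE); not the literal diff.
// Rejecting GEP candidates means GVN never inserts a PHI over address
// computations, leaving them intact for CGP's address sinking.
static bool shouldSkipScalarPRE(Instruction *CurInst) {
  return isa<GetElementPtrInst>(CurInst);
}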

This looks good for both performance and code size. Below are performance numbers for Cortex-A57 AArch64, reported by LNT for the llvm-test-suite, spec2000, and spec2006 at -O3, using a recent LLVM trunk revision with my patch applied.

Performance Improvements - execution_time
MultiSource/Benchmarks/FreeBench/mason/mason -15.28%
External/SPEC/CINT2000/253.perlbmk/253.perlbmk -4.07%
External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk -3.38%
External/SPEC/CINT2006/401.bzip2/401.bzip2 -2.82%
MultiSource/Benchmarks/Olden/em3d/em3d -2.81%
SingleSource/Benchmarks/Shootout-C++/Shootout-C++-heapsort -2.67%
SingleSource/Benchmarks/Shootout/Shootout-heapsort -2.24%
MultiSource/Benchmarks/Bullet/bullet -1.37%
SingleSource/Benchmarks/Adobe-C++/stepanov_vector -1.15%

Performance Regressions - execution_time
External/SPEC/CINT2006/400.perlbench/400.perlbench 1.45%

Performance Improvements - mem_bytes
MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan -2.68%
MultiSource/Benchmarks/Olden/tsp/tsp -2.14%
MultiSource/Benchmarks/FreeBench/mason/mason -1.27%

Diff Detail

Repository
rL LLVM

Event Timeline

labrinea created this revision.Nov 28 2018, 9:31 AM

It may be worthwhile allowing scalar PRE on GEPs that we know won't be combined into the addressing mode of a load/store, i.e. those where TargetTransformInfo::isLegalAddressingMode returns false.

> It may be worthwhile allowing scalar PRE on GEPs that we know won't be combined into the addressing mode of a load/store, i.e. those where TargetTransformInfo::isLegalAddressingMode returns false.

That would require running the Target IR Analysis before GVN. It would also require an addressing mode matcher like the one CodeGenPrepare implements. It doesn't seem worthwhile to me unless there's another way I haven't thought of.
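For concreteness, here is a rough sketch of what such a legality query could look like for the simple constant-offset case. The helper name and the simplifications are mine, not part of the patch; a real version would still need CGP-style matching for variable offsets.

#include "llvm/ADT/APInt.h"
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Instructions.h"

using namespace llvm;

// Hypothetical helper, not part of this patch: would this GEP fold into
// the addressing mode of a load/store of AccessTy?  If not, scalar PRE
// of the GEP costs CGP nothing.  Only the constant-offset case is
// handled; anything else conservatively reports "folds".
static bool gepFoldsIntoAddressing(GetElementPtrInst *GEP, Type *AccessTy,
                                   const TargetTransformInfo &TTI,
                                   const DataLayout &DL) {
  unsigned AS = GEP->getAddressSpace();
  APInt Offset(DL.getPointerSizeInBits(AS), 0);
  if (!GEP->accumulateConstantOffset(DL, Offset))
    return true; // variable offset: assume reg+reg folds, so don't PRE
  // Check base register plus immediate offset (reg+imm addressing).
  return TTI.isLegalAddressingMode(AccessTy, /*BaseGV=*/nullptr,
                                   Offset.getSExtValue(),
                                   /*HasBaseReg=*/true, /*Scale=*/0, AS);
}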

The idea behind allowing scalar PRE when the addressing mode isn't legal is for something like this (adjusted from pre-gep-load.ll):

define void @foo(i32 %stat, i32 %i, double* %p, double* %q) {
entry:
  switch i32 %stat, label %sw.default [
    i32 0, label %sw.bb
    i32 1, label %sw.bb
    i32 2, label %sw.bb2
  ]

sw.bb:                                            ; preds = %entry, %entry
  %arrayidx1 = getelementptr inbounds double, double* %p, i64 1234567
  store double 1.0, double* %arrayidx1, align 8
  br i1 undef, label %if.then, label %if.end

if.then:                                          ; preds = %sw.bb
  br label %return

if.end:                                           ; preds = %sw.bb
  br label %sw.bb2

sw.bb2:                                           ; preds = %if.end, %entry
  %arrayidx5 = getelementptr inbounds double, double* %p, i64 1234567
  store double 0.0, double* %arrayidx5, align 8
  br label %return

sw.default:                                       ; preds = %entry
  br label %return

return:                                           ; preds = %sw.default, %sw.bb2, %if.then
  ret void
}

The offset in the GEP is too big to use as an immediate offset on AArch64, so we'd like not to materialise the same constant twice, but doing PRE on the GEP means we get

	mov	w8, #46136
	movk	w8, #150, lsl #16
	add	x8, x2, x8
.LBB0_4:                                // %sw.bb2
	str	xzr, [x8]

but we really want

	mov	w8, #46136
	movk	w8, #150, lsl #16
.LBB0_4:                                // %sw.bb2
	str	xzr, [x2, x8]

i.e. we'd like to PRE the offset generation but not the add. So I think that side of things is better handled by some kind of MachinePRE, and for GVN it's fine to just refuse to do scalar PRE on GEPs (i.e. what this patch does).

john.brawn added inline comments.Dec 4 2018, 8:17 AM
test/Transforms/GVN/PRE/pre-gep-load.ll
1 ↗(On Diff #175696)

The version of this file that I see in trunk doesn't have these autogenerated check lines. Is a separate commit that adjusts the test intended to land before this one?

labrinea marked an inline comment as done.Dec 4 2018, 9:07 AM
labrinea added inline comments.
test/Transforms/GVN/PRE/pre-gep-load.ll
1 ↗(On Diff #175696)

Indeed. I've just added them in this revision to demonstrate the difference. I wasn't intending to do a separate commit unless it's necessary.

john.brawn accepted this revision.Dec 5 2018, 10:08 AM

OK, then this looks good to me.

This revision is now accepted and ready to land.Dec 5 2018, 10:08 AM
This revision was automatically updated to reflect the committed changes.