This is an archive of the discontinued LLVM Phabricator instance.

[GVN] Don't perform scalar PRE on GEPs
ClosedPublic

Authored by labrinea on Nov 28 2018, 9:31 AM.

Details

Summary

Partial Redundancy Elimination of GEPs prevents CodeGenPrepare from sinking the addressing mode computation of memory instructions back to its uses. The problem comes from the PHIs that PRE inserts: CGP does not look through them, so it bails out.
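To illustrate (a minimal hypothetical example, not the patch's regression test): the GEP in %merge below is partially redundant, because the same address is already computed along the %then path.

then:
  %g1 = getelementptr inbounds double, double* %p, i64 %i
  store double 1.0, double* %g1, align 8
  br label %merge

else:
  br label %merge

merge:                                            ; preds = %then, %else
  %g2 = getelementptr inbounds double, double* %p, i64 %i
  %v = load double, double* %g2, align 8

Scalar PRE hoists a copy of the GEP into %else and replaces %g2 with a PHI:

else:
  %g2.pre = getelementptr inbounds double, double* %p, i64 %i
  br label %merge

merge:                                            ; preds = %then, %else
  %g = phi double* [ %g1, %then ], [ %g2.pre, %else ]
  %v = load double, double* %g, align 8

The load's pointer operand is now a PHI rather than a GEP, so CGP's address sinking gives up instead of folding %p + 8*%i into the load.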

I found this problem while looking at the sqlite amalgamation from https://www.sqlite.org/download.html.

We could teach CGP to look through PHI nodes in FindAllMemoryUses, but this would increase compilation time (scanning is currently limited to 20 memory instructions; sqlite needs six times more). Moreover, CGP still wouldn't be able to handle GEPs that have a different base and offset but correspond to the same Value Number (as in the regression test).
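For reference, the fix amounts to an early bail-out in GVN's scalar PRE. A paraphrased sketch of the check, written as a standalone predicate rather than the literal diff:

#include "llvm/IR/Instructions.h"

using namespace llvm;

// Paraphrased sketch of the bail-out this patch adds to scalar PRE
// (conceptually inside GVN::performScalarPRE); not the literal diff.
// Rejecting GEP candidates means GVN never inserts a PHI over address
// computations, leaving them intact for CGP's address sinking.
static bool shouldSkipScalarPRE(Instruction *CurInst) {
  return isa<GetElementPtrInst>(CurInst);
}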

This looks good for both performance and code size. Below are performance numbers for Cortex-A57 AArch64, reported by LNT for the llvm-test-suite, spec2000, and spec2006 at -O3, using a recent LLVM trunk revision with my patch applied.

Performance Improvements - execution_time
MultiSource/Benchmarks/FreeBench/mason/mason -15.28%
External/SPEC/CINT2000/253.perlbmk/253.perlbmk -4.07%
External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk -3.38%
External/SPEC/CINT2006/401.bzip2/401.bzip2 -2.82%
MultiSource/Benchmarks/Olden/em3d/em3d -2.81%
SingleSource/Benchmarks/Shootout-C++/Shootout-C++-heapsort -2.67%
SingleSource/Benchmarks/Shootout/Shootout-heapsort -2.24%
MultiSource/Benchmarks/Bullet/bullet -1.37%
SingleSource/Benchmarks/Adobe-C++/stepanov_vector -1.15%

Performance Regressions - execution_time
External/SPEC/CINT2006/400.perlbench/400.perlbench 1.45%

Performance Improvements - mem_bytes
MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan -2.68%
MultiSource/Benchmarks/Olden/tsp/tsp -2.14%
MultiSource/Benchmarks/FreeBench/mason/mason -1.27%

Diff Detail

Repository
rL LLVM

Event Timeline

labrinea created this revision.Nov 28 2018, 9:31 AM

It may be worthwhile allowing scalar PRE on GEPs that we know won't be combined into the addressing mode of a load/store, i.e. those where TargetTransformInfo::isLegalAddressingMode returns false.

> It may be worthwhile allowing scalar PRE on GEPs that we know won't be combined into the addressing mode of a load/store, i.e. those where TargetTransformInfo::isLegalAddressingMode returns false.

That would require running the Target IR Analysis before GVN. It would also require an addressing mode matcher like the one CodeGenPrepare implements. It doesn't seem worthwhile to me unless there's another way I haven't thought of.
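For concreteness, here is a rough sketch of what such a legality query could look like for the simple constant-offset case. The helper name and the simplifications are mine, not part of the patch; a real version would still need CGP-style matching for variable offsets.

#include "llvm/ADT/APInt.h"
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Instructions.h"

using namespace llvm;

// Hypothetical helper, not part of this patch: would this GEP fold into
// the addressing mode of a load/store of AccessTy?  If not, scalar PRE
// of the GEP costs CGP nothing.  Only the constant-offset case is
// handled; anything else conservatively reports "folds".
static bool gepFoldsIntoAddressing(GetElementPtrInst *GEP, Type *AccessTy,
                                   const TargetTransformInfo &TTI,
                                   const DataLayout &DL) {
  unsigned AS = GEP->getAddressSpace();
  APInt Offset(DL.getPointerSizeInBits(AS), 0);
  if (!GEP->accumulateConstantOffset(DL, Offset))
    return true; // variable offset: assume reg+reg folds, so don't PRE
  // Check base register plus immediate offset (reg+imm addressing).
  return TTI.isLegalAddressingMode(AccessTy, /*BaseGV=*/nullptr,
                                   Offset.getSExtValue(),
                                   /*HasBaseReg=*/true, /*Scale=*/0, AS);
}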

The idea behind allowing scalar PRE when the addressing mode isn't legal is for something like this (adjusted from pre-gep-load.ll):

define void @foo(i32 %stat, i32 %i, double* %p, double* %q) {
entry:
  switch i32 %stat, label %sw.default [
    i32 0, label %sw.bb
    i32 1, label %sw.bb
    i32 2, label %sw.bb2
  ]

sw.bb:                                            ; preds = %entry, %entry
  %arrayidx1 = getelementptr inbounds double, double* %p, i64 1234567
  store double 1.0, double* %arrayidx1, align 8
  br i1 undef, label %if.then, label %if.end

if.then:                                          ; preds = %sw.bb
  br label %return

if.end:                                           ; preds = %sw.bb
  br label %sw.bb2

sw.bb2:                                           ; preds = %if.end, %entry
  %arrayidx5 = getelementptr inbounds double, double* %p, i64 1234567
  store double 0.0, double* %arrayidx5, align 8
  br label %return

sw.default:                                       ; preds = %entry
  br label %return

return:                                           ; preds = %sw.default, %sw.bb2, %if.then
  ret void
}

The offset in the GEP is too big to use as an immediate offset on AArch64, so we'd like not to materialise the same constant twice, but doing PRE on the GEP means we get

	mov	w8, #46136
	movk	w8, #150, lsl #16
	add	x8, x2, x8
.LBB0_4:                                // %sw.bb2
	str	xzr, [x8]

but we really want

	mov	w8, #46136
	movk	w8, #150, lsl #16
.LBB0_4:                                // %sw.bb2
	str	xzr, [x2, x8]

i.e. we'd like to PRE the offset generation but not the add. So I think that side of things is better handled by some kind of MachinePRE, and for GVN it's fine to just refuse to do scalar PRE on GEPs (i.e. what this patch does).

john.brawn added inline comments.Dec 4 2018, 8:17 AM
test/Transforms/GVN/PRE/pre-gep-load.ll
1 ↗(On Diff #175696)

The version of this file that I see in trunk doesn't have these autogenerated check lines. Is a separate commit that adjusts the test intended to land before this one?

labrinea marked an inline comment as done.Dec 4 2018, 9:07 AM
labrinea added inline comments.
test/Transforms/GVN/PRE/pre-gep-load.ll
1 ↗(On Diff #175696)

Indeed. I've just added them in this revision to demonstrate the difference. I wasn't intending to do a separate commit unless it's necessary.

john.brawn accepted this revision.Dec 5 2018, 10:08 AM

OK, then this looks good to me.

This revision is now accepted and ready to land.Dec 5 2018, 10:08 AM
This revision was automatically updated to reflect the committed changes.