This is an archive of the discontinued LLVM Phabricator instance.

[GVN] Simple GVN hoist
AbandonedPublic

Authored by chill on Sep 14 2021, 6:35 AM.

Details

Summary

RFC: https://lists.llvm.org/pipermail/llvm-dev/2021-September/152665.html

This patch implements simple hoisting of instructions from two
single-predecessor blocks to their common predecessor, as a subroutine
in the GVN pass.

The patch pairs two instructions (A and B) with the same value number,
moves A to the predecessor block, replaces all uses of B with A, and
deletes B.
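
For illustration, the core step amounts to roughly the following (a minimal C++ sketch, not the patch's actual code; the helper name and the assumption that all safety checks already passed are mine):

#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Instruction.h"
using namespace llvm;

// Sketch only: assumes A and B have the same value number and that safety
// was already established (operands available in Pred, no barriers skipped).
static void hoistPair(Instruction *A, Instruction *B, BasicBlock *Pred) {
  A->moveBefore(Pred->getTerminator()); // hoist A just above the branch
  B->replaceAllUsesWith(A);             // redirect every use of B to A
  B->eraseFromParent();                 // delete the now-redundant B
}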

Instructions are paired via sort/merge (could be hashing instead).
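
As a rough sketch of that pairing (hypothetical types and names; it assumes the value number of each candidate has already been looked up):

#include "llvm/IR/Instruction.h"
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical candidate record: an instruction plus its GVN value number.
struct Candidate {
  unsigned VN;
  llvm::Instruction *Inst;
};

// Sort both candidate lists by value number, then merge: equal value
// numbers on both sides form a hoistable pair.
static std::vector<std::pair<llvm::Instruction *, llvm::Instruction *>>
pairCandidates(std::vector<Candidate> Left, std::vector<Candidate> Right) {
  auto ByVN = [](const Candidate &A, const Candidate &B) { return A.VN < B.VN; };
  std::sort(Left.begin(), Left.end(), ByVN);
  std::sort(Right.begin(), Right.end(), ByVN);
  std::vector<std::pair<llvm::Instruction *, llvm::Instruction *>> Pairs;
  for (size_t I = 0, J = 0; I < Left.size() && J < Right.size();) {
    if (Left[I].VN < Right[J].VN)
      ++I;
    else if (Right[J].VN < Left[I].VN)
      ++J;
    else
      Pairs.push_back({Left[I++].Inst, Right[J++].Inst});
  }
  return Pairs;
}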

Certain instructions act as "hoist barriers" in the sense that they
stop the scan of the block for more hoist candidates, thus preventing
instructions from being reordered above them. They can themselves be
hoisted, though, which may expose further hoisting opportunities;
these are picked up by consecutive iterations of the transformation.
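
Concretely, the barrier test is along these lines (an over-conservative sketch using existing LLVM helpers, not the exact predicate in the patch):

#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/Instruction.h"

// Sketch: if this returns true, scanning the block for further candidates
// stops at I. I itself may still be hoisted as one half of a pair, which
// is what later iterations of the transform exploit.
static bool isHoistBarrier(const llvm::Instruction &I) {
  return !llvm::isGuaranteedToTransferExecutionToSuccessor(&I) ||
         I.mayThrow() || I.isVolatile() || I.isAtomic();
}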

Consecutive iterations are also needed to handle loads, which have
unique value numbers.

Initial benchmarking on Neoverse N1 looks good (speedup, higher is
better):

500.perlbench_r 1.13%
502.gcc_r 0.00%
505.mcf_r -1.89%
520.omnetpp_r 0.00%
523.xalancbmk_r 0.00%
525.x264_r 7.67%
531.deepsjeng_r 0.60%
541.leela_r 0.24%
548.exchange2_r 0.00%
557.xz_r 0.75%

(There's that 2% regression in mcf that I've not investigated yet).

Diff Detail

Event Timeline

chill created this revision.Sep 14 2021, 6:35 AM
chill requested review of this revision.Sep 14 2021, 6:35 AM
Herald added a project: Restricted Project.Sep 14 2021, 6:35 AM
chill added inline comments.Sep 14 2021, 6:37 AM
llvm/lib/Transforms/Scalar/GVN.cpp
672

Oops, leftover change, will revert.

chill edited the summary of this revision.Sep 14 2021, 7:11 AM
chill edited the summary of this revision.Sep 14 2021, 7:20 AM
lkail added a subscriber: lkail.Sep 14 2021, 8:24 AM
chill updated this revision to Diff 372492.Sep 14 2021, 8:40 AM

Quick question on the perf numbers. Do you know what the problem is with MCF? It would be good if we could avoid this regression; then the numbers would be even better....

xbolva00 added a subscriber: xbolva00.EditedSep 14 2021, 12:09 PM

I believe the latest issues with gvnhoist were regressions; what about taking a look at this pass again and possibly enabling (some “obviously” profitable) parts of gvnhoist again?

Did you run the SPEC benchmarks with trunk+gvnhoist?

Thanks for your patch!

llvm/lib/Transforms/Scalar/GVN.cpp
2494

It would be better to add a parameter here, like gvn-hoist-max-chain-length in gvnhoist, tuned based on data from running SPEC.
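
Something along these lines, for instance (a sketch only; the option name and default value are made up, not part of this patch):

#include "llvm/Support/CommandLine.h"

// Hypothetical knob, analogous to GVNHoist's -gvn-hoist-max-chain-length.
static llvm::cl::opt<unsigned> MaxHoistedPairs(
    "simple-gvn-hoist-max-pairs", llvm::cl::Hidden, llvm::cl::init(32),
    llvm::cl::desc("Maximum number of instruction pairs to hoist per block pair"));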

2842

Can you give an example where, when execution is not guaranteed to transfer to the successor, subsequent instructions in the BB can still be hoisted? IMHO, collectHoistCandidates should stop here.

2860

Is there any reason not to handle GEPs?

3034

ditto

I believe the latest issues with gvnhoist were regressions; what about taking a look at this pass again and possibly enabling (some “obviously” profitable) parts of gvnhoist again?

Did you run the SPEC benchmarks with trunk+gvnhoist?

I was testing SPEC CPU2017 with gvnhoist recently. perlbench and gcc show regressions. It seems that the default values of the gvnhoist parameters are too aggressive.

As for mcf, I guess it may be caused by high register pressure due to hoisting some expressions out of a huge successor. So a simple cost model is necessary.

chill added a comment.Sep 15 2021, 4:39 AM

I believe the latest issues with gvnhoist were regressions; what about taking a look at this pass again and possibly enabling (some “obviously” profitable) parts of gvnhoist again?

Did you run the SPEC benchmarks with trunk+gvnhoist?

I did, and indeed I got an even better speedup, but with -O2 -mllvm -enable-gvnhoist. With -O3 there was no improvement (and maybe a regression), due to unrolling; see the thread on the mailing list.

As for mcf, I guess it may be caused by high register pressure due to hoisting some expressions out of a huge successor. So a simple cost model is necessary.

Quick question on the perf numbers. Do you know what the problem is with MCF? It would be good if we could avoid this regression; then the numbers would be even better....

No idea why MCF regresses. My plan was to look at MCF regressions and why -O2 -enable-gvnhoist gives better numbers than -O3 + this patch.

chill planned changes to this revision.Sep 15 2021, 5:00 AM
xbolva00 added a subscriber: fhahn.Sep 15 2021, 6:33 AM

I believe the latest issues with gvnhoist were regressions; what about taking a look at this pass again and possibly enabling (some “obviously” profitable) parts of gvnhoist again?

Did you run the SPEC benchmarks with trunk+gvnhoist?

I did, and indeed I got an even better speedup, but with -O2 -mllvm -enable-gvnhoist. With -O3 there was no improvement (and maybe a regression), due to unrolling; see the thread on the mailing list.

As for mcf, I guess it may be caused by high register pressure due to hoisting some expressions out of a huge successor. So a simple cost model is necessary.

Quick question on the perf numbers. Do you know what the problem is with MCF? It would be good if we could avoid this regression; then the numbers would be even better....

No idea why MCF regresses. My plan was to look at MCF regressions and why -O2 -enable-gvnhoist gives better numbers than -O3 + this patch.

There is ongoing work by @fhahn to enhance SLP with memory versioning - it should help

chill marked 3 inline comments as done.Sep 15 2021, 12:52 PM
chill added inline comments.
llvm/lib/Transforms/Scalar/GVN.cpp
2842

For example:

define dso_local i32 @f2(i32 %c, i32 %a, i32 %b) {
entry:
  %tobool = icmp ne i32 %c, 0
  br i1 %tobool, label %if.then, label %if.else

if.then:
  call void @h()
  %0 = sdiv i32 %a, %b
  br label %if.end

if.else:
  call void @h()
  %1 = sdiv i32 %a, %b
  br label %if.end

if.end:
  %2  = phi i32 [%0, %if.then], [%1, %if.else]
  ret i32 %2
}

declare dso_local void @h() readnone

%0 and %1 will get the same value number, but should not be hoisted, unless call void @h() is also hoisted.
We would indeed stop the iteration in collectHoistCandidates, but hoist the call itself. Then on the next invocation of
performHoist we will have

define dso_local i32 @f2(i32 %c, i32 %a, i32 %b) {
entry:
  %tobool = icmp ne i32 %c, 0
  call void @h()
  br i1 %tobool, label %if.then, label %if.else

if.then:
  %0 = sdiv i32 %a, %b
  br label %if.end

if.else:
  %1 = sdiv i32 %a, %b
  br label %if.end

if.end:
  %2  = phi i32 [%0, %if.then], [%1, %if.else]
  ret i32 %2
}

declare dso_local void @h() readnone

and then we'll move the sdiv too.

2860

That's to avoid hoisting GEPs while leaving their users behind. I was following this comment from GVNHoist.cpp:

Hoisting may affect the performance in some cases. To mitigate that, hoisting
is disabled in the following cases.
1. Scalars across calls.
2. geps when corresponding load/store cannot be hoisted.

I sort of assume someone measured and found that useful :D

3034

So, an instruction that reads or writes memory could be a local dependency. If you hoist such an instruction, it may turn the local dependency into a non-local one, thus allowing more hoisting.

I guess there are alternative approaches (see the discussion on the ML) when we explicitly have a dependency instruction, e.g. a non-volatile load
followed by an aliasing non-volatile store: the store can be hoisted only if the load is hoisted too.

It's not clear what to do when we don't have an explicit dependency, e.g.

define dso_local i32 @f0(i32 %c, i32* %p, i32* %q) {
entry:
  %tobool = icmp ne i32 %c, 0
  br i1 %tobool, label %if.then, label %if.else

if.then:
  store volatile i32 0, i32* %p
  store volatile i32 0, i32* %q
  br label %if.end

if.else:
  store volatile i32 0, i32* %p
  store volatile i32 0, i32* %q
  br label %if.end

if.end:
  ret i32 0
}

(The current patch does not handle this case either, I guess because of MemDep caching.)

junparser added inline comments.Sep 15 2021, 11:52 PM
llvm/lib/Transforms/Scalar/GVN.cpp
2842

The thing is, can we hoist an instruction when isGuaranteedToTransferExecutionToSuccessor does not hold for it, or it is volatile, or it may throw? I'm not sure about this. I believe we should not hoist them.

3034

Same issue with volatile loads/stores.

I believe the latest issues with gvnhoist were regressions; what about taking a look at this pass again and possibly enabling (some “obviously” profitable) parts of gvnhoist again?

Did you run the SPEC benchmarks with trunk+gvnhoist?

I did, and indeed I got an even better speedup, but with -O2 -mllvm -enable-gvnhoist. With -O3 there was no improvement (and maybe a regression), due to unrolling; see the thread on the mailing list.

My testing was with -O3 + LTO.

As for mcf, I guess it may be caused by high register pressure due to hoisting some expressions out of a huge successor. So a simple cost model is necessary.

Quick question on the perf numbers. Do you know what the problem is with MCF? It would be good if we could avoid this regression; then the numbers would be even better....

No idea why MCF regresses. My plan was to look at MCF regressions and why -O2 -enable-gvnhoist gives better numbers than -O3 + this patch.

With unrolling, hoisting can introduce longer lifetimes for variables, which may cause higher register pressure.

chill abandoned this revision.Sep 30 2021, 6:32 AM
chill marked 6 inline comments as done.
llvm/lib/Transforms/Scalar/GVN.cpp
2494

It's in the works.

2842

We don't hoist individual instructions - only pairs. This is part of how we make sure we won't speculatively execute volatile/atomic instructions, nor change their number and ordering in any execution sequence. If all other correctness conditions are satisfied, we can merge a volatile (or atomic) instruction with an instruction from the other block that has the same volatility and atomic ordering. This is fixed in the new series of patches.
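
For illustration, for a pair of loads such a compatibility check would look roughly like this (a hypothetical helper sketched here, not the code from those patches):

#include "llvm/IR/Instructions.h"

// Sketch: two loads may only be merged into one hoisted load if they agree
// on volatility, atomic ordering and synchronization scope.
static bool haveSameMemorySemantics(const llvm::LoadInst *A,
                                    const llvm::LoadInst *B) {
  return A->isVolatile() == B->isVolatile() &&
         A->getOrdering() == B->getOrdering() &&
         A->getSyncScopeID() == B->getSyncScopeID();
}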