This is an archive of the discontinued LLVM Phabricator instance.

New code hoisting pass based on GVN (optimistic approach)
ClosedPublic

Authored by sebpop on Apr 20 2016, 12:25 PM.

Details

Summary

This pass hoists duplicated computations in the program. The primary goal of
gvn-hoist is to reduce the size of functions before inline heuristics to reduce
the total cost of function inlining. This implementation is the optimistic approach
that Danny asked us to implement: we start from the set of all known equivalent
computations, and we discard unsafe hoisting situations.

Here is the algorithm Danny asked us to work on last time:

Make a multimap from VN to each expression with that VN (VNTable is not currently this) over the entire program.
For each VN in the table:
 if (size (expressions with a given VN) > 1):
   For each block in domtree in DFS order:
       For each expression, if DFSin/DFSOut(expression->parent) within range of DFSin/DFSOut(block), push(possiblehoistlist (a list), expression (current expression), block (insertion point))
          If you have 2 or more things on possiblehoistlist, calculate availability of operands for each expression in block
              (you can cache the highest point you can move them to for later to checking again and again.
               Since you are using dominance, you know they can only be hoisted to blocks dominated by the highest point you can hoist to).
          If 2 or more things are still available, hoist
Note you can also likely skip any domtree block that does not have two or more children, i just haven't proven it to myself yet.

Because we "value number everything up front":

  • we VN things that do not matter for hoisting,
  • we VN all computations dependent on loads and that would forbid hoisting the whole dependence chain as each load gets a different VN (in the current GVN.cpp implementation.) The temporary solution is to hoist twice and invalidate the VN table in between.

The harder parts compared to the previous patch in http://reviews.llvm.org/D18798
are the detection of safety properties of hoisting:

  • no side effects on all paths between insert point and the original location of all hoistable expressions: we use a traversal of the inverse CFG from the expression to be hoisted to the insertion point to gather all the BBs on the execution paths that we have to check for side-effects.
  • to make hoisting efficient for scalars (and safe for hoisting load expressions) we need to prove that from the insertion point to the end of the function the expressions are needed on all paths.

There are still a few improvements to be implemented:

  • using memory SSA as a minimal data dependence analysis to improve the accuracy of the side effects analysis in loops.
  • computing hoisting subsets: for the moment we try to hoist all expressions with the same VN without checking whether a subset would be safe to hoist.

Passes llvm regression test, test-suite, and SPEC Cpu2006.
Over the c/c++ SPEC 2006 benchmarks the GVN-hoisting pass removes 55012 instructions.

  • Without the patch: Number of call sites deleted, not inlined: 20394 Number of functions deleted because all callers found: 70497 Number of functions inlined: 182119 Number of allocas merged together: 225 Number of caller-callers analyzed: 200361 Number of call sites analyzed: 445806
  • With the patch: Number of call sites deleted, not inlined: 20412 Number of functions deleted because all callers found: 70495 Number of functions inlined: 182122 Number of allocas merged together: 225 Number of caller-callers analyzed: 200360 Number of call sites analyzed: 445624 Number of hoisted instructions: 5435 Number of instructions removed: 55012

Spec2006 results are quite noisy:
(positive is an increase of the spec score in percent)

400.perlbench: 0
401.bzip2: 0
403.gcc: -2.00
429.mcf: -4.00
445.gobmk: 0
456.hmmer: 0
458.sjeng: 0
462.libquantum: -2.00
464.h264ref: 4.00
471.omnetpp: -1.00
473.astar: -1.00
433.milc: 15.00
444.namd: 0
447.dealII: 0
450.soplex: 0
453.povray: 0
470.lbm: 0
482.sphinx3: 1.00

Pass written by:

Sebastian Pop
Aditya Kumar
Xiaoyu Hu
Brian Rzycki
Daniel Berlin

Diff Detail

Repository
rL LLVM

Event Timeline

sebpop updated this revision to Diff 54398.Apr 20 2016, 12:25 PM
sebpop retitled this revision from to New code hoisting pass based on GVN (optimistic approach).
sebpop updated this object.
sebpop added reviewers: dberlin, chandlerc.
sebpop set the repository for this revision to rL LLVM.

Ooh, this looks shiny; thanks for doing it!

I have a few drive-by nits for you.

llvm/lib/Transforms/Scalar/GVNHoist.cpp
139 ↗(On Diff #54398)

Can we use a SmallPtrSet here?

(If you're not familiar with LLVM containers, SmallPtrSet<T, N> can be passed without the size (e.g. for parameters) as a SmallPtrSetImpl<T>, and it has a count(Foo) function, which is equivalent to set.find(Foo) != set.end(). :) )

186 ↗(On Diff #54398)

SmallPtrSet (x2) ?

209 ↗(On Diff #54398)

SmallVector?

232 ↗(On Diff #54398)

SmallPtrSet?

292 ↗(On Diff #54398)

std::move(InstructionsToHoist)?

302 ↗(On Diff #54398)

std::move(InstructionsToHoist)?

326 ↗(On Diff #54398)

Would for (const auto &HP : HPL) work instead?

330 ↗(On Diff #54398)

Can we take this by const& to avoid a copy?

sebpop updated this revision to Diff 54433.Apr 20 2016, 3:18 PM
sebpop removed rL LLVM as the repository for this revision.

Addressed all comments from George Burgess IV.
Amended patch passes regression tests and test-suite on x86_64-linux.
Thanks for the review!

hiraditya set the repository for this revision to rL LLVM.Apr 22 2016, 7:38 AM
sebpop removed rL LLVM as the repository for this revision.Apr 22 2016, 11:37 AM
sebpop updated this revision to Diff 54684.Apr 22 2016, 11:37 AM

The updated patch from Aditya has some cleanups, compile time improvements, and uses less memory.

We measured the total number of instructions executed when "clang -cc1" is compiling all the preprocessed files of the llvm test-suite through valgrind:

  • with patch: 926457230716
  • without: 923166450214

Overall compilation overhead of the new pass is 0.35%.

dberlin edited edge metadata.Apr 22 2016, 11:42 AM
dberlin added a subscriber: dberlin.

And what is the performance improvement?

sebpop updated this revision to Diff 54703.Apr 22 2016, 12:31 PM
sebpop edited edge metadata.

Some more cleanups.

I tried this patch on our Hexagon compiler to see what impact the pass had on some of our performance benchmarks (mostly embedded programs). The biggest improvement was 1.5% and the biggest degradation was -1.8% (in spec2K/twolf). Most differences were under 1%. Many benchmarks were unchanged. I didn't look at code size though.

Also, I did see one test failure due to infinite recursion caused by some odd looking IR as input to the pass. The odd IR is generated by the jump threading pass.

for.cond:                                         ; preds = %for.cond
  %inc113 = add nsw i32 %inc113, 1
  br label %for.cond

opt -gvn-hoist -S < gvn-bug.ll

llvm/lib/Transforms/Scalar/GVNHoist.cpp
66 ↗(On Diff #54703)

ID isn't used anywhere.

Also, I did see one test failure due to infinite recursion caused by some odd looking IR as input to the pass. The odd IR is generated by the jump threading pass.

for.cond:                                         ; preds = %for.cond
  %inc113 = add nsw i32 %inc113, 1
  br label %for.cond

The recursion is in llvm::GVN::ValueTable::lookup_or_add of %inc113 that is defined
recursively. As our pass is now value numbering all the program, it will try to VN this
variable and it will start the infinite recursion.

We could change the way we iterate over the CFG to only walk through BBs reachable from the function entry.
Another work around is to add -simplifycfg to remove the dead code.
The fix is to detect self-defined variables in GVN.cpp.

sebpop added inline comments.Apr 26 2016, 11:37 AM
llvm/lib/Transforms/Scalar/GVNHoist.cpp
396 ↗(On Diff #54703)

Replacing the iteration over all the BBs of the function with a depth_first iteration fixes the infinite recursion problem reported by Brendon:

for (BasicBlock *BB : depth_first(&F.getEntryBlock()))
sebpop updated this revision to Diff 55095.Apr 26 2016, 2:36 PM

The updated patch computes partial hoisting locations for a subset of expressions, and also hoists stores and function calls.
Here are the stats for a build of the spec2006:
Number of hoisted instructions: 24021
Number of instructions removed: 74080

sebpop updated this revision to Diff 55499.Apr 28 2016, 3:33 PM

Update patch to use Memory SSA as a minimal data dependence analysis.
We may want to commit the fix to MemorySSA.cpp as a separate patch.

On the SPEC 2006 benchmarks with the patch (the number between parentheses is the difference against without the patch, positive is more with the patch):
Number of call sites deleted, not inlined: 20419 (+25)
Number of functions deleted because all callers found: 70393 (-104)
Number of functions inlined: 182452 (+333)
Number of allocas merged together: 227 (0)
Number of caller-callers analyzed: 200825 (+464)
Number of call sites analyzed: 446853 (+1047)
Number of hoisted instructions: 34977
Number of instructions removed: 88044

On the llvm test-suite the numbers are:
Number of hoisted instructions: 35125
Number of instructions removed: 234484

sebpop marked 10 inline comments as done.Apr 28 2016, 3:35 PM

These numbers looks really great. Out of curiosity, do you happen to have runtime measurements as well?

chandlerc edited edge metadata.Apr 28 2016, 5:42 PM
In D19338#416165, @joker.eph wrote:

These numbers looks really great. Out of curiosity, do you happen to have runtime measurements as well?

The original patch description has some runtime numbers from SPEC... But they're pretty up-and-down.

I'd be really interested in any analysis done on the 4% regression. Also I'd love to see the runtime numbers from the whole test suite.

sebpop updated this revision to Diff 56018.May 3 2016, 10:08 AM
sebpop edited edge metadata.

Some cleanups and also added a cost function to not hoist expressions too far from their original location.
By default we bound the hoisting to 4 levels and a maximum of 4 basic blocks on the paths to the hoisting point.
On the test-suite we now have the following stats:

gvn-hoist.*Number of instructions hoisted: 25324
gvn-hoist.*Number of instructions removed: 30393
gvn-hoist.*Number of loads hoisted: 11421
gvn-hoist.*Number of loads removed: 14687
gvn-hoist.*Number of stores hoisted: 25
gvn-hoist.*Number of stores removed: 25
gvn-hoist.*Number of calls hoisted: 10
gvn-hoist.*Number of calls removed: 10

We will post SPEC score improvements and test-suite compile time and execution time numbers for this revision of the patch.

sebpop added a comment.May 4 2016, 9:16 AM

The following experiments were performed with the last posted patch on an intel i7-4790K 4.4GHz x86_64-linux.

  • SPEC 2006 run 3 times in validation mode: the improvements are all in the INT part of the benchmark (we discarded everything less than +/-0.5% as noise.)

433.milc: 0
444.namd: 0
447.dealII: 0
450.soplex: 0
453.povray: 0
470.lbm: 0
482.sphinx3: 0
400.perlbench: 1.5
401.bzip2: 1.3
403.gcc: 0
429.mcf: 0
445.gobmk: 0
456.hmmer: 0
458.sjeng: 0
462.libquantum: 1.1
464.h264ref: 0
471.omnetpp: 1
473.astar: 1.5

  • On the SPEC 2006 we have the following compile time statistics:

Number of call sites deleted, not inlined: 406 (+0)
Number of functions deleted because all callers found: 70492 (+0)
Number of functions inlined: 202127 (-5)
Number of allocas merged together: 225 (+0)
Number of caller-callers analyzed: 225239 (-13)
Number of call sites analyzed: 488920 (-1748)
gvn-hoist.*Number of instructions hoisted: 24017
gvn-hoist.*Number of instructions removed: 26349
gvn-hoist.*Number of loads hoisted: 8482
gvn-hoist.*Number of loads removed: 9719
gvn-hoist.*Number of stores hoisted: 9
gvn-hoist.*Number of stores removed: 9
gvn-hoist.*Number of calls hoisted: 34
gvn-hoist.*Number of calls removed: 34

  • On the LLVM test-suite we have seen numbers that vary too much: about 30% noise.

We ran the test-suite 3 times and selected the best run, discarded all benchmarks running for less than 2 seconds, and discarded all results less than +/-2%:
On 2mm.c -stats reports that gvn-hoist does not hoist any instruction: that is just noise.
Most likely all llvm test-suite execution times that we report are noisy.

  • Execution time slow down (first two numbers are base and peak execution time in seconds, the third number is percent speedup, positive is better)

SingleSource/Benchmarks/Polybench/linear-algebra/kernels/2mm/2mm.test: 6.728 6.104 -9.2700
MultiSource/Applications/JM/lencod/lencod.test: 2.928 2.836 -3.1400

  • Execution time speed up:

SingleSource/Benchmarks/Shootout/hash.test: 2.22 2.292 3.2400
SingleSource/Benchmarks/CoyoteBench/huffbench.test: 7.212 7.496 3.9300

  • Code size decrease: (first two numbers are code size of base and peak, the third number is percent improvement in code size, negative is better)

SingleSource/Benchmarks/Shootout/ary3.test: 12832 8736 -31.9200
MultiSource/Benchmarks/MiBench/consumer-typeset/consumer-typeset.test: 656680 652584 -.6200
MultiSource/Applications/sqlite3/sqlite3.test: 791408 787312 -.5100
MultiSource/Benchmarks/FreeBench/distray/distray.test: 18528 18488 -.2100
SingleSource/Benchmarks/Adobe-C++/stepanov_vector.test: 45464 45400 -.1400
SingleSource/Benchmarks/Adobe-C++/stepanov_abstraction.test: 46752 46696 -.1100
MultiSource/Benchmarks/Ptrdist/yacr2/yacr2.test: 45904 45856 -.1000
MultiSource/Applications/SIBsim4/SIBsim4.test: 69768 69728 -.0500

  • Code size increase (these regressions may be due to more inlining):

SingleSource/Benchmarks/Linpack/linpack-pc.test: 30128 30176 .1500
MultiSource/Benchmarks/Trimaran/enc-pc1/enc-pc1.test: 13600 13656 .4100
MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test: 325224 329320 1.2500
MultiSource/Benchmarks/Prolangs-C/agrep/agrep.test: 81272 85368 5.0300
MultiSource/Benchmarks/TSVC/InductionVariable-flt/InductionVariable-flt.test: 79840 83936 5.1300
MultiSource/Benchmarks/MiBench/security-rijndael/security-rijndael.test: 42656 46752 9.6000

sebpop added a comment.May 4 2016, 3:13 PM

We measured the total number of instructions executed when "clang -cc1" is compiling all the preprocessed files of the llvm test-suite through valgrind:

with patch: 931904066692
without: 928877989824

Overall compilation overhead of the new pass is 0.32%.

sebpop updated this revision to Diff 56360.May 5 2016, 3:45 PM

Update heuristic to avoid hoisting geps without hoisting their ld/st, and also avoid hoisting scalars past calls that could increase register pressure.

sebpop updated this revision to Diff 57494.May 17 2016, 10:06 AM

Updated patch from Aditya: do not restrict hoisting when optimizing for code size.

Also Aditya asked to report the number of spills with/wo the patch.

  • On SPEC2006 on x86_64, without the patch:

Number of spills inserted: 39703
with the patch:
Number of spills inserted: 39502 (-201)

  • On the test-suite on x86_64, without the patch:

Number of spills inserted: 51148
with the patch:
Number of spills inserted: 51273 (+125)

majnemer added inline comments.
llvm/lib/Transforms/Scalar/GVNHoist.cpp
137–152 ↗(On Diff #57494)

What if the call is to a function which divides by zero? What would stop you from hoisting the call to it?

635–637 ↗(On Diff #57494)

Likewise, you need to make sure that the call has no side effects which is different from it not mutating memory.

This looks much better to start. Thank you for working so hard on it.
More comments coming, but i made a first pass.

llvm/lib/Transforms/Scalar/GVNHoist.cpp
213 ↗(On Diff #57494)

Errr, isn't this the definition of A post-dominating B or C?

A post-dominates B if all paths from A to end of function pass through B.
Same with (A, C).

If that's right, i would just use post-dominance here :)

425 ↗(On Diff #57494)

I wonder how expensive this computation ends up being. It never changes per-iteration unless something is messing with the CFG out from under you.

(This is why GCC uses et-splay trees)

635–637 ↗(On Diff #57494)

+1
Pure and const (in gcc parlance) are pretty much the only thing you can safely move.

665 ↗(On Diff #57494)

Note that you can get the DFS in/out numbers from the dominator tree if that is an acceptable ordering (IE DFS on DT).

hiraditya added inline comments.May 17 2016, 2:23 PM
llvm/lib/Transforms/Scalar/GVNHoist.cpp
213 ↗(On Diff #57494)

It is more like B and C combined post dominating A.

425 ↗(On Diff #57494)

For each Instruction in the InstructionsToHoist, the nearest common dominator (w.r.t. HoistPt) could change.

e.g.,
A -> B -> C (has I1)
B-> D (has I2)
A -> E (has I3).

And if I1, I2 and I3 have the same GVN.
In this case nearestCommonDominator(C, D) = B, and, nearestCommonDominator(B, E) = A.

635–637 ↗(On Diff #57494)

Will do that. Thanks.

665 ↗(On Diff #57494)

It seems DFS number is not always available in the dominator tree. DFS numbers are updated only if there are too many (> 32) slow queries in GenericDomTree.h:468

sebpop added inline comments.May 17 2016, 2:25 PM
llvm/lib/Transforms/Scalar/GVNHoist.cpp
137–152 ↗(On Diff #57494)

We do check for no EH and no BB address taken in between the original place of the expressions and the place we hoist it to.
After hoisting, if the call throws an exception it would happen a bit earlier than it used to,
although there shouldn't be any other exceptions or side effects happening in between the new and old place of the call.

We also check that all paths from the new location to the end of the function do have the exact call that we hoist.
So the call with the exception has to happen on all paths.

213 ↗(On Diff #57494)

Post-Dominance is computed on the reversed-CFG: (turning each edge in the opposite direction)
A post-dominates B if all paths from the end of the function to B on the reversed-CFG pass through A.

425 ↗(On Diff #57494)

The good thing is that the hoist pass does not change the CFG.
Overall compilation time is also 0.35% increase on x86_64 compiling all the test-suite through valgrind (see some of the previous experiments.)

635–637 ↗(On Diff #57494)

I will add a check for that. Thanks for catching this.

665 ↗(On Diff #57494)

I will look at that. I remember Aditya had a patch doing that: he needed to expose that numbering from the DT construction through a new function. From what I remember we reverted back to this numbering after some problem with that numbering. I will try to see if I can remove this loop from the patch.

sebpop updated this revision to Diff 59290.Jun 1 2016, 3:25 PM

Address Danny's and David's comments.

sebpop updated this revision to Diff 59454.Jun 2 2016, 2:08 PM

Rebased as of today.

Sorry, i'll get to this one today or tomorrow.

I took a second pass at it

llvm/lib/Transforms/Scalar/GVNHoist.cpp
81 ↗(On Diff #59454)

Do you actually use the ordering guarantee of multimap?

83 ↗(On Diff #59454)

Please document this class per the llvm style guidelines

96 ↗(On Diff #59454)

Please document this class per the llvm style guidelines

111 ↗(On Diff #59454)

Please document this class per the llvm style guidelines

125 ↗(On Diff #59454)

Please use std::pair, or define an appropriate key, not convert vn numbers to strings :)

249 ↗(On Diff #59454)

How expensive is this in practice (because it's another thing that, computed in the right order, will share most of the subcomputation work)?

672 ↗(On Diff #59454)

So, this is going to be super-expensive to do, and should be marked with a FIXME.
You should only compute it once, and I have the APIs you need to do updates being worked on in a different review.

You should also consider whether this should simply be a pass dependency.

llvm/lib/Transforms/Utils/MemorySSA.cpp
629 ↗(On Diff #59454)

This change needs to be explained and submitted as a separate patch :)

dberlin added inline comments.Jun 6 2016, 8:43 AM
llvm/lib/Transforms/Scalar/GVNHoist.cpp
278 ↗(On Diff #59454)

I feel like this + gatherallblocks is really an up-safe/down-safe computation (depending on whether it's a load or store), and can be done much saner than you are doing it.

In particular, this seems a really complicated and expensive way to compute this property, compared to how most PRE papers do it.

To be specific (and feel free to tell me why i'm wrong, these functions are
a bit hard to decipher without comments):

For starters, for loads/stores I don't understand why you use
gatherallpaths and then check the memoryuses, instead of calculating
Nearest common dominator (or postdominator, depending)(hoist point, blocks
for all the uses).

Let's ignore the may-throw/etc issues for a second (which can be done with
a single block test)

For scalar computations, hoist point safety depends on whether you can
recompute the operands at that dominator, nothing else.

For load/stores, it's easier. For loads, if the memory state (ie the thing
the clobbering memorydef defines) dominates your hoistpoint, then it
becomes the same scalar question, because if you can re-make the load, you
know the memory-state will be the same.
For stores, if you are sinking, you are defining the memory state in a
given block, the only question is whether that memory state is killed
before it the use.

Checking domination is not necessary, the only way it can be true is if
every store produces the same VN, and it's intersection of all successors
to the sink point.

If you are trying to *hoist* stores, it's a much simpler problem, the point
you can hoist to is
nearest-common-dominator(getClobberingDefinition(store)->block, blocks of
all uses).

majnemer added inline comments.Jun 6 2016, 9:53 AM
llvm/lib/Transforms/Scalar/GVNHoist.cpp
81 ↗(On Diff #59454)

I'd recommend a DenseMap from your key space to vector (or SmallVector) of Instruction *.

119–125 ↗(On Diff #59454)

I'd recommend hash_combine utilized in a DenseMapInfo instead of this logic.

dberlin added inline comments.Jun 6 2016, 9:54 AM
llvm/lib/Transforms/Scalar/GVNHoist.cpp
125 ↗(On Diff #59454)

(IE use hash_combine and make that work, or whatever)

sebpop marked 7 inline comments as done.Jun 6 2016, 1:03 PM
sebpop added inline comments.
llvm/lib/Transforms/Scalar/GVNHoist.cpp
81 ↗(On Diff #59454)

No, we do not use the ordering in the multimap.

125 ↗(On Diff #59454)

hash_combine works.

672 ↗(On Diff #59454)

I added a FIXME note. I agree that we should not throw away an expensive analysis.
I do not expect this loop to execute more than 2 or 3 times following the number of dependences between loading the address of another load/store.

sebpop marked 3 inline comments as done.Jun 6 2016, 1:04 PM
sebpop updated this revision to Diff 59769.Jun 6 2016, 1:07 PM

Update to address all comments from Danny and David.
Thanks for the reviews!

Is there something else we need to address in this patch before we can commit?

Thanks,
Sebastian

This pass fixes redundancy elimination in zlib as described in https://llvm.org/bugs/show_bug.cgi?id=22005

Sure, though note that's a size optimization :)
In any case, i will try to make a final review pass and give approval today
or tomorrow.

Sure, though note that's a size optimization :)
In any case, i will try to make a final review pass and give approval today
or tomorrow.

The patch is not only a code size improvement patch.
With this pass we have seen a 15% speedup on a proprietary benchmark,
and of course the spec 2006 is also slightly improving as we reported earlier.
There are also 5 or 6 code hoisting bugs in the GCC bugzilla that are all caught by this pass.

Thanks,
Sebastian

I'm very interested to understand where that speedup might come from. Do
you have a small example of what is happening you can share?

I'm very interested to understand where that speedup might come from. Do
you have a small example of what is happening you can share?

A reduced testcase from that benchmark is @scalarsHoisting.
When embedded in a loop, each iteration gets some benefit from the hoisting.
Also reported as: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70159
From that bug report:

$ cat h.c
float foo_p(float d, float min, float max, float a)
{

float tmin;
float tmax;

float inv = 1.0f / d;
if (inv >= 0) {
  tmin = (min - a) * inv;
  tmax = (max - a) * inv;
} else {
  tmin = (max - a) * inv;
  tmax = (min - a) * inv;
}

return tmax + tmin;

}

$ clang h.c -Ofast -S -o-
foo_p: @foo_p
BB#0: // %entry
fmov s4, #1.00000000
fdiv s0, s4, s0
fcmp s0, #0.0
fcsel s4, s1, s2, lt
fcsel s1, s2, s1, lt
fsub s1, s1, s3
fsub s2, s4, s3
fadd s1, s2, s1
fmul s0, s1, s0
ret

With the GVN-hoist pass we end up moving the two fmul and fsub up
between fdiv and fcmp, adding more latency between the fdiv and
the user of the result, the fcmp. That allows some processors to
compute in parallel the fdiv, fmuls, and fsubs.

The pass implemented here also fixes the following bugs:
https://llvm.org/bugs/show_bug.cgi?id=12754
https://llvm.org/bugs/show_bug.cgi?id=20242
https://llvm.org/bugs/show_bug.cgi?id=22005

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=5738
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11820
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11832
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=21485
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=23286
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=29144
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32590
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=33315
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=35303
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=38204
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43159
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52256

dberlin accepted this revision.Jun 30 2016, 11:36 AM
dberlin edited edge metadata.

At this point, i think this is good enough to start with and if we find things to improve, we can iterate on them in-tree.
Thank you for all your hard work on this, i know it has been a bit of a slog :)

This revision is now accepted and ready to land.Jun 30 2016, 11:36 AM

At this point, i think this is good enough to start with and if we find things to improve, we can iterate on them in-tree.
Thank you for all your hard work on this, i know it has been a bit of a slog :)

That was a good exercise in patience ;-)
Thanks Danny for your reviews: they greatly helped improve the patch over time.
I rebased the patch, and I will commit it tonight after more tests.

sebpop updated this revision to Diff 62404.Jun 30 2016, 12:59 PM
sebpop edited edge metadata.

Rebased and clang-formatted.
make check and llvm test-suite pass on x86_64-linux.
Still testing SPEC and some other benchmarks before commit later today.

This revision was automatically updated to reflect the committed changes.
LuoYuanke added inline comments.
llvm/trunk/lib/Transforms/Scalar/GVNHoist.cpp
285

Hi,

I'm trying to understanding why we can hoist instruction across exception handler. Is it because the instruction may raise exception and exception pointer should not be changed? Does this hoist rule on exception handler also apply to PRE?

I watch the video that you present, but still didn't figure it out.

Thanks

Herald added a project: Restricted Project. · View Herald TranscriptNov 29 2019, 11:21 PM
Herald added subscribers: mgrang, mgorny. · View Herald Transcript