This is an archive of the discontinued LLVM Phabricator instance.

Differential D19338

New code hoisting pass based on GVN (optimistic approach)
Closed, Public

Authored by sebpop on Apr 20 2016, 12:25 PM.

Details

Reviewers

chandlerc
dberlin

Commits

rG4177480aadb3: code hoisting pass based on GVN
rG63847d04e795: code hoisting pass based on GVN
rG5c5798c57c99: code hoisting pass based on GVN
rL275561: code hoisting pass based on GVN
rL275401: code hoisting pass based on GVN
rL274305: code hoisting pass based on GVN

Summary

This pass hoists duplicated computations in the program. The primary goal of
gvn-hoist is to reduce the size of functions before inline heuristics to reduce
the total cost of function inlining. This implementation is the optimistic approach
that Danny asked us to implement: we start from the set of all known equivalent
computations, and we discard unsafe hoisting situations.

Here is the algorithm Danny asked us to work on last time:

Make a multimap from VN to each expression with that VN (VNTable is not currently this) over the entire program.
For each VN in the table:
 if (size (expressions with a given VN) > 1):
   For each block in domtree in DFS order:
       For each expression, if DFSin/DFSOut(expression->parent) is within the range of DFSin/DFSOut(block), push(possiblehoistlist (a list), expression (current expression), block (insertion point))
          If you have 2 or more things on possiblehoistlist, calculate availability of operands for each expression in block
              (you can cache the highest point you can move them to, to avoid checking again and again.
               Since you are using dominance, you know they can only be hoisted to blocks dominated by the highest point you can hoist to.)
          If 2 or more things are still available, hoist
Note you can also likely skip any domtree block that does not have two or more children, i just haven't proven it to myself yet.
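
A minimal sketch of the outline above, in plain C++ with no LLVM dependencies. This is not the code from the patch: `DomNode`, `Expr`, `numberDFS`, `dominates`, and `hoistCandidates` are hypothetical names, value numbers are assumed to be computed elsewhere, and the operand-availability and safety checks are left out.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical stand-ins for a dominator-tree node and an instruction.
struct DomNode {
  std::vector<int> Children; // indices of dominator-tree children
  int DFSIn = 0, DFSOut = 0;
};
struct Expr {
  unsigned VN; // value number, assumed to be computed elsewhere
  int Block;   // index of the block containing the expression
};

// Assign DFS in/out numbers over the dominator tree rooted at N.
void numberDFS(std::vector<DomNode> &Tree, int N, int &Clock) {
  assert(N >= 0 && static_cast<std::size_t>(N) < Tree.size());
  Tree[N].DFSIn = Clock++;
  for (int C : Tree[N].Children)
    numberDFS(Tree, C, Clock);
  Tree[N].DFSOut = Clock++;
}

// Block A dominates block B iff B's DFS interval nests inside A's.
bool dominates(const std::vector<DomNode> &Tree, int A, int B) {
  return Tree[A].DFSIn <= Tree[B].DFSIn && Tree[B].DFSOut <= Tree[A].DFSOut;
}

// For one candidate insertion block, group the expressions located in
// dominated blocks by value number and keep only groups of two or more.
std::map<unsigned, std::vector<Expr>>
hoistCandidates(const std::vector<DomNode> &Tree,
                const std::vector<Expr> &Exprs, int InsertBlock) {
  std::multimap<unsigned, Expr> ByVN; // VN -> every expression with that VN
  for (const Expr &E : Exprs)
    ByVN.insert({E.VN, E});

  std::map<unsigned, std::vector<Expr>> Result;
  for (const auto &KV : ByVN)
    if (dominates(Tree, InsertBlock, KV.second.Block))
      Result[KV.first].push_back(KV.second);

  for (auto It = Result.begin(); It != Result.end();) {
    if (It->second.size() < 2)
      It = Result.erase(It);
    else
      ++It;
  }
  return Result;
}
```

Each surviving group corresponds to a "possiblehoistlist" for that insertion block; a real pass would still have to check operand availability before hoisting anything.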

Because we "value number everything up front":

we VN things that do not matter for hoisting,
we VN all computations that depend on loads, which would forbid hoisting the whole dependence chain because each load gets a different VN (in the current GVN.cpp implementation). The temporary solution is to hoist twice and invalidate the VN table in between.

The harder parts compared to the previous patch in http://reviews.llvm.org/D18798
are the detection of safety properties of hoisting:

no side effects on all paths between insert point and the original location of all hoistable expressions: we use a traversal of the inverse CFG from the expression to be hoisted to the insertion point to gather all the BBs on the execution paths that we have to check for side-effects.
to make hoisting efficient for scalars (and safe for hoisting load expressions) we need to prove that from the insertion point to the end of the function the expressions are needed on all paths.
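
The first safety property can be sketched as a backward walk over predecessor edges. This is an illustration only, not the patch's implementation: the `CFG` alias, `blocksBetween`, and `safeToHoist` are hypothetical, side effects are tracked per block rather than per instruction (so the check is conservative), and the hoist point is assumed to dominate the expression's block.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical CFG: Preds[B] lists the predecessors of block B.
using CFG = std::vector<std::vector<int>>;

// Collect every block that lies on some path from HoistPt to From,
// excluding HoistPt itself, by walking predecessor edges back from From.
std::set<int> blocksBetween(const CFG &Preds, int HoistPt, int From) {
  assert(From >= 0 && static_cast<CFG::size_type>(From) < Preds.size());
  std::set<int> Seen;
  std::vector<int> Worklist{From};
  while (!Worklist.empty()) {
    int B = Worklist.back();
    Worklist.pop_back();
    // Stop at the hoist point and skip blocks already visited.
    if (B == HoistPt || !Seen.insert(B).second)
      continue;
    for (int P : Preds[B])
      Worklist.push_back(P);
  }
  return Seen;
}

// Reject the hoist if any gathered block is marked as having a side effect.
bool safeToHoist(const CFG &Preds, const std::vector<bool> &HasSideEffect,
                 int HoistPt, int From) {
  for (int B : blocksBetween(Preds, HoistPt, From))
    if (HasSideEffect[B])
      return false;
  return true;
}
```

On a diamond with a side effect in one arm, an expression in the join block cannot be hoisted to the top, while an expression in the clean arm can.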

There are still a few improvements to be implemented:

using memory SSA as a minimal data dependence analysis to improve the accuracy of the side effects analysis in loops.
computing hoisting subsets: for the moment we try to hoist all expressions with the same VN without checking whether a subset would be safe to hoist.

Passes the LLVM regression tests, the test-suite, and SPEC CPU2006.
Over the C/C++ SPEC 2006 benchmarks the GVN-hoisting pass removes 55012 instructions.

Without the patch:
Number of call sites deleted, not inlined: 20394
Number of functions deleted because all callers found: 70497
Number of functions inlined: 182119
Number of allocas merged together: 225
Number of caller-callers analyzed: 200361
Number of call sites analyzed: 445806

With the patch:
Number of call sites deleted, not inlined: 20412
Number of functions deleted because all callers found: 70495
Number of functions inlined: 182122
Number of allocas merged together: 225
Number of caller-callers analyzed: 200360
Number of call sites analyzed: 445624
Number of hoisted instructions: 5435
Number of instructions removed: 55012

Spec2006 results are quite noisy:
(positive is an increase of the spec score in percent)

400.perlbench: 0
401.bzip2: 0
403.gcc: -2.00
429.mcf: -4.00
445.gobmk: 0
456.hmmer: 0
458.sjeng: 0
462.libquantum: -2.00
464.h264ref: 4.00
471.omnetpp: -1.00
473.astar: -1.00
433.milc: 15.00
444.namd: 0
447.dealII: 0
450.soplex: 0
453.povray: 0
470.lbm: 0
482.sphinx3: 1.00

Pass written by:

Sebastian Pop
Aditya Kumar
Xiaoyu Hu
Brian Rzycki
Daniel Berlin

Event Timeline

sebpop updated this revision to Diff 54398. Apr 20 2016, 12:25 PM

sebpop retitled this revision to New code hoisting pass based on GVN (optimistic approach).

sebpop updated this object.

sebpop added reviewers: dberlin, chandlerc.

sebpop set the repository for this revision to rL LLVM.

sebpop added subscribers: llvm-commits, hiraditya, flyingforyou.

Herald added a subscriber: mehdi_amini. Apr 20 2016, 12:25 PM

hxy9243 added a subscriber: hxy9243. Apr 20 2016, 12:50 PM

Ooh, this looks shiny; thanks for doing it!

I have a few drive-by nits for you.

llvm/lib/Transforms/Scalar/GVNHoist.cpp
140	Can we use a `SmallPtrSet` here? (If you're not familiar with LLVM containers, `SmallPtrSet<T, N>` can be passed without the size (e.g. for parameters) as a `SmallPtrSetImpl<T>`, and it has a `count(Foo)` function, which is equivalent to `set.find(Foo) != set.end()`. :) )
187	`SmallPtrSet` (x2) ?
210	`SmallVector`?
233	`SmallPtrSet`?
293	`std::move(InstructionsToHoist)`?
303	`std::move(InstructionsToHoist)`?
327	Would `for (const auto &HP : HPL)` work instead?
331	Can we take this by `const&` to avoid a copy?
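
The suggestions above are ordinary C++ idioms, so they can be illustrated without LLVM headers. In this sketch `std::unordered_set` stands in for `SmallPtrSet`, and `PtrSet`, `contains`, `HoistList`, and `totalLength` are hypothetical names that do not appear in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Stand-in for a pointer set; in-tree this would be a SmallPtrSet.
using PtrSet = std::unordered_set<const int *>;

// count() expresses the same test as find() != end(), with less noise.
bool contains(const PtrSet &S, const int *P) { return S.count(P) != 0; }

// Taking the vector by value and moving it into the member avoids a copy
// when the caller passes an rvalue (the std::move suggestion).
struct HoistList {
  std::vector<std::string> Names;
  explicit HoistList(std::vector<std::string> N) : Names(std::move(N)) {}
};

// Taking the argument and iterating by const reference avoids copies
// (the `const&` and `for (const auto &HP : HPL)` suggestions).
std::size_t totalLength(const std::vector<std::string> &Names) {
  std::size_t Len = 0;
  for (const auto &N : Names)
    Len += N.size();
  return Len;
}
```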

Addressed all comments from George Burgess IV.
Amended patch passes regression tests and test-suite on x86_64-linux.
Thanks for the review!

hiraditya set the repository for this revision to rL LLVM. Apr 22 2016, 7:38 AM

sebpop removed rL LLVM as the repository for this revision. Apr 22 2016, 11:37 AM

The updated patch from Aditya has some cleanups, compile time improvements, and uses less memory.

We measured the total number of instructions executed when "clang -cc1" is compiling all the preprocessed files of the llvm test-suite through valgrind:

with patch: 926457230716
without: 923166450214

Overall compilation overhead of the new pass is 0.35%.

And what is the performance improvement?

Some more cleanups.

I tried this patch on our Hexagon compiler to see what impact the pass had on some of our performance benchmarks (mostly embedded programs). The biggest improvement was 1.5% and the biggest degradation was -1.8% (in spec2K/twolf). Most differences were under 1%. Many benchmarks were unchanged. I didn't look at code size though.

Also, I did see one test failure due to infinite recursion caused by some odd looking IR as input to the pass. The odd IR is generated by the jump threading pass.

for.cond:                                         ; preds = %for.cond
  %inc113 = add nsw i32 %inc113, 1
  br label %for.cond

opt -gvn-hoist -S < gvn-bug.ll

gvn-bug.ll (5 KB)

llvm/lib/Transforms/Scalar/GVNHoist.cpp
67	ID isn't used anywhere.

In D19338#410725, @bcahoon wrote:
Also, I did see one test failure due to infinite recursion caused by some odd looking IR as input to the pass. The odd IR is generated by the jump threading pass.
for.cond:                                         ; preds = %for.cond
  %inc113 = add nsw i32 %inc113, 1
  br label %for.cond

The recursion is in llvm::GVN::ValueTable::lookup_or_add on %inc113, which is defined
in terms of itself. As our pass now value numbers the whole program, it tries to VN this
variable and starts the infinite recursion.

We could change the way we iterate over the CFG to only walk through BBs reachable from the function entry.
Another workaround is to add -simplifycfg to remove the dead code.
The fix is to detect self-defined variables in GVN.cpp.

sebpop added inline comments. Apr 26 2016, 11:37 AM

llvm/lib/Transforms/Scalar/GVNHoist.cpp
397	Replacing the iteration over all the BBs of the function with a depth_first iteration fixes the infinite recursion problem reported by Brendon: for (BasicBlock *BB : depth_first(&F.getEntryBlock()))
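
Why a depth-first walk from the entry avoids the problem can be shown on a toy CFG: the self-referential block is unreachable, so it is never visited. This is a stand-alone sketch; the `CFG` alias and `reachableFromEntry` are hypothetical and only mimic what `depth_first(&F.getEntryBlock())` iterates over.

```cpp
#include <cassert>
#include <vector>

// Hypothetical CFG: Succs[B] lists the successors of block B,
// and block 0 plays the role of the function entry.
using CFG = std::vector<std::vector<int>>;

// Return the blocks reachable from the entry in depth-first preorder.
std::vector<int> reachableFromEntry(const CFG &Succs) {
  assert(!Succs.empty() && "a function has at least an entry block");
  std::vector<bool> Seen(Succs.size(), false);
  std::vector<int> Order;
  std::vector<int> Stack{0};
  while (!Stack.empty()) {
    int B = Stack.back();
    Stack.pop_back();
    if (Seen[B])
      continue;
    Seen[B] = true;
    Order.push_back(B);
    // Push in reverse so that the first successor is visited first.
    for (auto It = Succs[B].rbegin(); It != Succs[B].rend(); ++It)
      Stack.push_back(*It);
  }
  return Order;
}
```

With blocks `{0 -> 1, 1, 2 -> 2}`, block 2 (a self-loop like `for.cond` in the report) never appears in the result, so nothing in it would be value numbered.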

The updated patch computes partial hoisting locations for a subset of expressions, and also hoists stores and function calls.
Here are the stats for a build of the spec2006:
Number of hoisted instructions: 24021
Number of instructions removed: 74080

Update patch to use Memory SSA as a minimal data dependence analysis.
We may want to commit the fix to MemorySSA.cpp as a separate patch.

On the SPEC 2006 benchmarks with the patch (the number between parentheses is the difference against without the patch, positive is more with the patch):
Number of call sites deleted, not inlined: 20419 (+25)
Number of functions deleted because all callers found: 70393 (-104)
Number of functions inlined: 182452 (+333)
Number of allocas merged together: 227 (0)
Number of caller-callers analyzed: 200825 (+464)
Number of call sites analyzed: 446853 (+1047)
Number of hoisted instructions: 34977
Number of instructions removed: 88044

On the llvm test-suite the numbers are:
Number of hoisted instructions: 35125
Number of instructions removed: 234484

These numbers look really great. Out of curiosity, do you happen to have runtime measurements as well?

In D19338#416165, @joker.eph wrote:

These numbers look really great. Out of curiosity, do you happen to have runtime measurements as well?

The original patch description has some runtime numbers from SPEC... But they're pretty up-and-down.

I'd be really interested in any analysis done on the 4% regression. Also I'd love to see the runtime numbers from the whole test suite.

Some cleanups and also added a cost function to not hoist expressions too far from their original location.
By default we bound the hoisting to 4 levels and a maximum of 4 basic blocks on the paths to the hoisting point.
On the test-suite we now have the following stats:

gvn-hoist.*Number of instructions hoisted: 25324
gvn-hoist.*Number of instructions removed: 30393
gvn-hoist.*Number of loads hoisted: 11421
gvn-hoist.*Number of loads removed: 14687
gvn-hoist.*Number of stores hoisted: 25
gvn-hoist.*Number of stores removed: 25
gvn-hoist.*Number of calls hoisted: 10
gvn-hoist.*Number of calls removed: 10

We will post SPEC score improvements and test-suite compile time and execution time numbers for this revision of the patch.

The following experiments were performed with the last posted patch on an intel i7-4790K 4.4GHz x86_64-linux.

SPEC 2006 run 3 times in validation mode: the improvements are all in the INT part of the benchmark (we discarded everything less than +/-0.5% as noise.)

433.milc: 0
444.namd: 0
447.dealII: 0
450.soplex: 0
453.povray: 0
470.lbm: 0
482.sphinx3: 0
400.perlbench: 1.5
401.bzip2: 1.3
403.gcc: 0
429.mcf: 0
445.gobmk: 0
456.hmmer: 0
458.sjeng: 0
462.libquantum: 1.1
464.h264ref: 0
471.omnetpp: 1
473.astar: 1.5

On the SPEC 2006 we have the following compile time statistics:

Number of call sites deleted, not inlined: 406 (+0)
Number of functions deleted because all callers found: 70492 (+0)
Number of functions inlined: 202127 (-5)
Number of allocas merged together: 225 (+0)
Number of caller-callers analyzed: 225239 (-13)
Number of call sites analyzed: 488920 (-1748)
gvn-hoist.*Number of instructions hoisted: 24017
gvn-hoist.*Number of instructions removed: 26349
gvn-hoist.*Number of loads hoisted: 8482
gvn-hoist.*Number of loads removed: 9719
gvn-hoist.*Number of stores hoisted: 9
gvn-hoist.*Number of stores removed: 9
gvn-hoist.*Number of calls hoisted: 34
gvn-hoist.*Number of calls removed: 34

On the LLVM test-suite we have seen numbers that vary too much: about 30% noise.

We ran the test-suite 3 times and selected the best run, discarded all benchmarks running for less than 2 seconds, and discarded all results less than +/-2%:
On 2mm.c -stats reports that gvn-hoist does not hoist any instruction: that is just noise.
Most likely all llvm test-suite execution times that we report are noisy.

Execution time slow down (first two numbers are base and peak execution time in seconds, the third number is percent speedup, positive is better)

SingleSource/Benchmarks/Polybench/linear-algebra/kernels/2mm/2mm.test: 6.728 6.104 -9.2700
MultiSource/Applications/JM/lencod/lencod.test: 2.928 2.836 -3.1400

Execution time speed up:

SingleSource/Benchmarks/Shootout/hash.test: 2.22 2.292 3.2400
SingleSource/Benchmarks/CoyoteBench/huffbench.test: 7.212 7.496 3.9300

Code size decrease: (first two numbers are code size of base and peak, the third number is percent improvement in code size, negative is better)

SingleSource/Benchmarks/Shootout/ary3.test: 12832 8736 -31.9200
MultiSource/Benchmarks/MiBench/consumer-typeset/consumer-typeset.test: 656680 652584 -.6200
MultiSource/Applications/sqlite3/sqlite3.test: 791408 787312 -.5100
MultiSource/Benchmarks/FreeBench/distray/distray.test: 18528 18488 -.2100
SingleSource/Benchmarks/Adobe-C++/stepanov_vector.test: 45464 45400 -.1400
SingleSource/Benchmarks/Adobe-C++/stepanov_abstraction.test: 46752 46696 -.1100
MultiSource/Benchmarks/Ptrdist/yacr2/yacr2.test: 45904 45856 -.1000
MultiSource/Applications/SIBsim4/SIBsim4.test: 69768 69728 -.0500

Code size increase (these regressions may be due to more inlining):

SingleSource/Benchmarks/Linpack/linpack-pc.test: 30128 30176 .1500
MultiSource/Benchmarks/Trimaran/enc-pc1/enc-pc1.test: 13600 13656 .4100
MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test: 325224 329320 1.2500
MultiSource/Benchmarks/Prolangs-C/agrep/agrep.test: 81272 85368 5.0300
MultiSource/Benchmarks/TSVC/InductionVariable-flt/InductionVariable-flt.test: 79840 83936 5.1300
MultiSource/Benchmarks/MiBench/security-rijndael/security-rijndael.test: 42656 46752 9.6000

We measured the total number of instructions executed when "clang -cc1" is compiling all the preprocessed files of the llvm test-suite through valgrind:

with patch: 931904066692
without: 928877989824

Overall compilation overhead of the new pass is 0.32%.

Update heuristic to avoid hoisting geps without hoisting their ld/st, and also avoid hoisting scalars past calls that could increase register pressure.

Updated patch from Aditya: do not restrict hoisting when optimizing for code size.

Also, Aditya asked us to report the number of spills with and without the patch.

On SPEC2006 on x86_64, without the patch:

Number of spills inserted: 39703
with the patch:
Number of spills inserted: 39502 (-201)

On the test-suite on x86_64, without the patch:

Number of spills inserted: 51148
with the patch:
Number of spills inserted: 51273 (+125)

majnemer added a subscriber: majnemer. May 17 2016, 11:09 AM

majnemer added inline comments.

llvm/lib/Transforms/Scalar/GVNHoist.cpp
138–153	What if the call is to a function which divides by zero? What would stop you from hoisting the call to it?
636–638	Likewise, you need to make sure that the call has no side effects which is different from it not mutating memory.

This looks much better to start. Thank you for working so hard on it.
More comments coming, but i made a first pass.

llvm/lib/Transforms/Scalar/GVNHoist.cpp
214	Errr, isn't this the definition of A post-dominating B or C? A post-dominates B if all paths from A to end of function pass through B. Same with (A, C). If that's right, i would just use post-dominance here :)
426	I wonder how expensive this computation ends up being. It never changes per-iteration unless something is messing with the CFG out from under you. (This is why GCC uses et-splay trees)
636–638	+1 Pure and const (in gcc parlance) are pretty much the only thing you can safely move.
666	Note that you can get the DFS in/out numbers from the dominator tree if that is an acceptable ordering (IE DFS on DT).

junbuml added a subscriber: junbuml. May 17 2016, 12:06 PM

hiraditya added inline comments. May 17 2016, 2:23 PM

llvm/lib/Transforms/Scalar/GVNHoist.cpp
214	It is more like B and C combined post dominating A.
426	For each Instruction in the InstructionsToHoist, the nearest common dominator (w.r.t. HoistPt) could change. e.g., A -> B -> C (has I1) B-> D (has I2) A -> E (has I3). And if I1, I2 and I3 have the same GVN. In this case nearestCommonDominator(C, D) = B, and, nearestCommonDominator(B, E) = A.
636–638	Will do that. Thanks.
666	It seems DFS number is not always available in the dominator tree. DFS numbers are updated only if there are too many (> 32) slow queries in GenericDomTree.h:468
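
The example in the comment on line 426 can be worked through with a small nearest-common-dominator routine. This is a sketch over a hypothetical immediate-dominator array, not the `DominatorTree` API; `nearestCommonDominator` and `commonHoistPoint` here are local helper names.

```cpp
#include <cassert>
#include <vector>

// Hypothetical dominator tree: IDom[N] is the immediate dominator of N,
// and the root is recorded as its own immediate dominator.
int nearestCommonDominator(const std::vector<int> &IDom, int X, int Y) {
  auto depth = [&](int N) {
    int D = 0;
    while (IDom[N] != N) {
      N = IDom[N];
      ++D;
    }
    return D;
  };
  int DX = depth(X), DY = depth(Y);
  // Lift the deeper node until both sit at the same depth, then climb together.
  while (DX > DY) { X = IDom[X]; --DX; }
  while (DY > DX) { Y = IDom[Y]; --DY; }
  while (X != Y) { X = IDom[X]; Y = IDom[Y]; }
  return X;
}

// Fold the hoist point over all blocks that hold an instruction with the
// same value number; the result can move up with each additional block.
int commonHoistPoint(const std::vector<int> &IDom,
                     const std::vector<int> &Blocks) {
  assert(!Blocks.empty());
  int HoistPt = Blocks.front();
  for (int B : Blocks)
    HoistPt = nearestCommonDominator(IDom, HoistPt, B);
  return HoistPt;
}
```

Numbering the blocks A=0, B=1, C=2, D=3, E=4 gives `IDom = {0, 0, 1, 1, 0}`: the common dominator of C and D is B, and adding E moves it up to A, which is why the hoist point depends on the set of instructions considered.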

sebpop added inline comments. May 17 2016, 2:25 PM

llvm/lib/Transforms/Scalar/GVNHoist.cpp
138–153	We do check for no EH and no BB address taken in between the original place of the expressions and the place we hoist it to. After hoisting, if the call throws an exception it would happen a bit earlier than it used to, although there shouldn't be any other exceptions or side effects happening in between the new and old place of the call. We also check that all paths from the new location to the end of the function do have the exact call that we hoist. So the call with the exception has to happen on all paths.
214	Post-Dominance is computed on the reversed-CFG: (turning each edge in the opposite direction) A post-dominates B if all paths from the end of the function to B on the reversed-CFG pass through A.
426	The good thing is that the hoist pass does not change the CFG. The overall compile-time increase is also only 0.35% on x86_64 when compiling the whole test-suite under valgrind (see some of the previous experiments).
636–638	I will add a check for that. Thanks for catching this.
666	I will look at that. I remember Aditya had a patch doing that: he needed to expose that numbering from the DT construction through a new function. From what I remember we reverted back to this numbering after some problem with that numbering. I will try to see if I can remove this loop from the patch.
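
The property discussed for line 214 ("B and C combined post-dominating A") can be phrased as a reachability question: no path from A reaches a function exit while avoiding both B and C. The sketch below assumes that reading; `coveredOnAllPaths` and the `CFG` alias are hypothetical, and an exit is taken to be any block without successors.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical CFG: Succs[B] lists the successors of block B.
using CFG = std::vector<std::vector<int>>;

// True if every path from Start to a block without successors passes
// through at least one block in Cover.
bool coveredOnAllPaths(const CFG &Succs, int Start,
                       const std::set<int> &Cover) {
  assert(Start >= 0 && static_cast<CFG::size_type>(Start) < Succs.size());
  std::vector<bool> Seen(Succs.size(), false);
  std::vector<int> Worklist{Start};
  while (!Worklist.empty()) {
    int B = Worklist.back();
    Worklist.pop_back();
    // Paths that enter a covering block are fine; do not look past it.
    if (Cover.count(B) || Seen[B])
      continue;
    Seen[B] = true;
    if (Succs[B].empty())
      return false; // reached an exit while avoiding every covering block
    for (int S : Succs[B])
      Worklist.push_back(S);
  }
  return true;
}
```

On a diamond A -> {B, C} -> exit, the set {B, C} covers A, but neither B nor C alone does, even though neither block post-dominates A by itself.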

Address Danny's and David's comments.

Rebased as of today.

Sorry, i'll get to this one today or tomorrow.

I took a second pass at it

llvm/lib/Transforms/Scalar/GVNHoist.cpp
82	Do you actually use the ordering guarantee of multimap?
84	Please document this class per the llvm style guidelines
97	Please document this class per the llvm style guidelines
112	Please document this class per the llvm style guidelines
126	Please use std::pair, or define an appropriate key, not convert vn numbers to strings :)
250	How expensive is this in practice (because it's another thing that, computed in the right order, will share most of the subcomputation work)?
673	So, this is going to be super-expensive to do, and should be marked with a FIXME. You should only compute it once, and I have the APIs you need to do updates being worked on in a different review. You should also consider whether this should simply be a pass dependency.
llvm/lib/Transforms/Utils/MemorySSA.cpp
617	This change needs to be explained and submitted as a separate patch :)

dberlin added inline comments. Jun 6 2016, 8:43 AM

llvm/lib/Transforms/Scalar/GVNHoist.cpp
279	I feel like this + gatherallblocks is really an up-safe/down-safe computation (depending on whether it's a load or store), and can be done much saner than you are doing it. In particular, this seems a really complicated and expensive way to compute this property, compared to how most PRE papers do it.

mssimpso added a subscriber: mssimpso. Jun 6 2016, 8:50 AM

To be specific (and feel free to tell me why i'm wrong, these functions are
a bit hard to decipher without comments):

For starters, for loads/stores I don't understand why you use
gatherallpaths and then check the memoryuses, instead of calculating
Nearest common dominator (or postdominator, depending)(hoist point, blocks
for all the uses).

Let's ignore the may-throw/etc issues for a second (which can be done with
a single block test)

For scalar computations, hoist point safety depends on whether you can
recompute the operands at that dominator, nothing else.

For load/stores, it's easier. For loads, if the memory state (ie the thing
the clobbering memorydef defines) dominates your hoistpoint, then it
becomes the same scalar question, because if you can re-make the load, you
know the memory-state will be the same.
For stores, if you are sinking, you are defining the memory state in a
given block, the only question is whether that memory state is killed
before the use.

Checking domination is not necessary, the only way it can be true is if
every store produces the same VN, and it's intersection of all successors
to the sink point.

If you are trying to *hoist* stores, it's a much simpler problem, the point
you can hoist to is
nearest-common-dominator(getClobberingDefinition(store)->block, blocks of
all uses).

majnemer added inline comments. Jun 6 2016, 9:53 AM

llvm/lib/Transforms/Scalar/GVNHoist.cpp
82	I'd recommend a DenseMap from your key space to vector (or SmallVector) of Instruction *.
120–126	I'd recommend hash_combine utilized in a DenseMapInfo instead of this logic.
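
A sketch of the kind of key the reviewers are asking for, using only the standard library. The in-tree version would use `DenseMap`, `DenseMapInfo`, and `hash_combine`; here `std::unordered_map` and a hand-written mixing step stand in for them, and `VNKey`, `VNKeyHash`, `VNToInsns`, and `addInsn` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical key: a pair of value numbers, instead of a string built
// by concatenating them.
using VNKey = std::pair<unsigned, unsigned>;

struct VNKeyHash {
  std::size_t operator()(const VNKey &K) const {
    // Boost-style mixing of the two hashes; in-tree, hash_combine would
    // play this role.
    std::size_t H = std::hash<unsigned>()(K.first);
    H ^= std::hash<unsigned>()(K.second) + 0x9e3779b9u + (H << 6) + (H >> 2);
    return H;
  }
};

// Maps a key to the instructions (plain integer ids here) that share it.
using VNToInsns = std::unordered_map<VNKey, std::vector<int>, VNKeyHash>;

void addInsn(VNToInsns &Map, VNKey K, int InsnId) {
  Map[K].push_back(InsnId);
}
```

Keys compare by value, so `(1, 2)` and `(2, 1)` land in different buckets without any string formatting, and no ordering is imposed on the map.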

dberlin added inline comments. Jun 6 2016, 9:54 AM

llvm/lib/Transforms/Scalar/GVNHoist.cpp
126	(IE use hash_combine and make that work, or whatever)

sebpop marked 7 inline comments as done. Jun 6 2016, 1:03 PM

sebpop added inline comments.

llvm/lib/Transforms/Scalar/GVNHoist.cpp
82	No, we do not use the ordering in the multimap.
126	hash_combine works.
673	I added a FIXME note. I agree that we should not throw away an expensive analysis. I do not expect this loop to execute more than 2 or 3 times following the number of dependences between loading the address of another load/store.

Update to address all comments from Danny and David.
Thanks for the reviews!

sebpop mentioned this in D21039: Fix memory access local dominance function for live on entry. Jun 6 2016, 1:11 PM

mcrosier added a subscriber: mcrosier. Jun 10 2016, 8:39 AM

Ping.

Is there something else we need to address in this patch before we can commit?

Thanks,
Sebastian

This pass fixes redundancy elimination in zlib as described in https://llvm.org/bugs/show_bug.cgi?id=22005

Sure, though note that's a size optimization :)
In any case, i will try to make a final review pass and give approval today
or tomorrow.

In D19338#470067, @dberlin wrote:

Sure, though note that's a size optimization :)
In any case, i will try to make a final review pass and give approval today
or tomorrow.

The patch is not only a code size improvement patch.
With this pass we have seen a 15% speedup on a proprietary benchmark,
and of course the spec 2006 is also slightly improving as we reported earlier.
There are also 5 or 6 code hoisting bugs in the GCC bugzilla that are all caught by this pass.

Thanks,
Sebastian

I'm very interested to understand where that speedup might come from. Do
you have a small example of what is happening you can share?

In D19338#470858, @dberlin wrote:

I'm very interested to understand where that speedup might come from. Do
you have a small example of what is happening you can share?

A reduced testcase from that benchmark is @scalarsHoisting.
When embedded in a loop, each iteration gets some benefit from the hoisting.
Also reported as: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70159
From that bug report:

$ cat h.c
float foo_p(float d, float min, float max, float a)
{
  float tmin;
  float tmax;

  float inv = 1.0f / d;
  if (inv >= 0) {
    tmin = (min - a) * inv;
    tmax = (max - a) * inv;
  } else {
    tmin = (max - a) * inv;
    tmax = (min - a) * inv;
  }

  return tmax + tmin;
}

$ clang h.c -Ofast -S -o-
foo_p: @foo_p
BB#0: // %entry
fmov s4, #1.00000000
fdiv s0, s4, s0
fcmp s0, #0.0
fcsel s4, s1, s2, lt
fcsel s1, s2, s1, lt
fsub s1, s1, s3
fsub s2, s4, s3
fadd s1, s2, s1
fmul s0, s1, s0
ret

With the GVN-hoist pass we end up moving the two fmul and fsub up
between fdiv and fcmp, adding more latency between the fdiv and
the user of the result, the fcmp. That allows some processors to
compute in parallel the fdiv, fmuls, and fsubs.
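
At the source level, the effect described above corresponds roughly to the hand-written transformation below. This is only an illustration of the idea: `foo_p_hoisted` and the temporaries `lo` and `hi` are hypothetical, and the pass itself works on LLVM IR, not on C source.

```cpp
#include <cassert>

// Original shape: both branches compute the same two products.
float foo_p(float d, float min, float max, float a) {
  float tmin;
  float tmax;
  float inv = 1.0f / d;
  if (inv >= 0) {
    tmin = (min - a) * inv;
    tmax = (max - a) * inv;
  } else {
    tmin = (max - a) * inv;
    tmax = (min - a) * inv;
  }
  return tmax + tmin;
}

// Hoisted by hand: each product is computed once above the branch, and
// the branch only decides which one is tmin and which one is tmax.
float foo_p_hoisted(float d, float min, float max, float a) {
  float inv = 1.0f / d;
  float lo = (min - a) * inv; // was duplicated in both branches
  float hi = (max - a) * inv; // was duplicated in both branches
  float tmin = inv >= 0 ? lo : hi;
  float tmax = inv >= 0 ? hi : lo;
  return tmax + tmin;
}
```

Both versions perform the same floating-point operations on the same operands, so they return identical results; only the placement relative to the comparison changes.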

The pass implemented here also fixes the following bugs:
https://llvm.org/bugs/show_bug.cgi?id=12754
https://llvm.org/bugs/show_bug.cgi?id=20242
https://llvm.org/bugs/show_bug.cgi?id=22005

At this point, i think this is good enough to start with and if we find things to improve, we can iterate on them in-tree.
Thank you for all your hard work on this, i know it has been a bit of a slog :)

This revision is now accepted and ready to land. Jun 30 2016, 11:36 AM

In D19338#471558, @dberlin wrote:

At this point, i think this is good enough to start with and if we find things to improve, we can iterate on them in-tree.
Thank you for all your hard work on this, i know it has been a bit of a slog :)

That was a good exercise in patience ;-)
Thanks Danny for your reviews: they greatly helped improve the patch over time.
I rebased the patch, and I will commit it tonight after more tests.

Rebased and clang-formatted.
make check and llvm test-suite pass on x86_64-linux.
Still testing SPEC and some other benchmarks before commit later today.

Closed by commit rL274305: code hoisting pass based on GVN (authored by spop). Jun 30 2016, 5:31 PM

This revision was automatically updated to reflect the committed changes.

LuoYuanke added a subscriber: LuoYuanke. Nov 29 2019, 11:21 PM

LuoYuanke added inline comments.

llvm/trunk/lib/Transforms/Scalar/GVNHoist.cpp
285 ↗	(On Diff #62448)	Hi, I'm trying to understand why we can hoist an instruction across an exception handler. Is it because the instruction may raise an exception and the exception pointer should not be changed? Does this hoisting rule on exception handlers also apply to PRE? I watched the video that you presented, but still didn't figure it out. Thanks

Herald added a project: Restricted Project. Nov 29 2019, 11:21 PM

Herald added subscribers: mgrang, mgorny.

Revision Contents

llvm/include/llvm/InitializePasses.h: 1 line
llvm/include/llvm/LinkAllPasses.h: 1 line
llvm/include/llvm/Transforms/Scalar.h: 7 lines
llvm/include/llvm/Transforms/Scalar/GVN.h: 15 lines
llvm/lib/Passes/PassRegistry.def: 1 line
llvm/lib/Transforms/IPO/PassManagerBuilder.cpp: 1 line
llvm/lib/Transforms/Scalar/CMakeLists.txt: 1 line
llvm/lib/Transforms/Scalar/GVNHoist.cpp: 747 lines
llvm/lib/Transforms/Scalar/Scalar.cpp: 5 lines
llvm/lib/Transforms/Utils/MemorySSA.cpp: 4 lines
llvm/test/Transforms/GVN/hoist.ll: 651 lines

Diff 59290

llvm/include/llvm/InitializePasses.h

  void initializeMemorySanitizerPass(PassRegistry&);
  void initializeThreadSanitizerPass(PassRegistry&);
  void initializeSanitizerCoverageModulePass(PassRegistry&);
  void initializeDataFlowSanitizerPass(PassRegistry&);
  void initializeEfficiencySanitizerPass(PassRegistry&);
  void initializeScalarizerPass(PassRegistry&);
  void initializeEarlyCSELegacyPassPass(PassRegistry &);
  void initializeEliminateAvailableExternallyLegacyPassPass(PassRegistry &);
+ void initializeGVNHoistLegacyPassPass(PassRegistry &);
  void initializeExpandISelPseudosPass(PassRegistry&);
  void initializeForceFunctionAttrsLegacyPassPass(PassRegistry&);
  void initializeGCMachineCodeAnalysisPass(PassRegistry&);
  void initializeGCModuleInfoPass(PassRegistry&);
  void initializeGVNLegacyPassPass(PassRegistry&);
  void initializeGlobalDCELegacyPassPass(PassRegistry&);
  void initializeGlobalOptLegacyPassPass(PassRegistry&);
  void initializeGlobalsAAWrapperPassPass(PassRegistry&);

llvm/include/llvm/LinkAllPasses.h

  ForcePassLinking() {
  (void) llvm::createStripDeadPrototypesPass();
  (void) llvm::createTailCallEliminationPass();
  (void) llvm::createJumpThreadingPass();
  (void) llvm::createUnifyFunctionExitNodesPass();
  (void) llvm::createInstCountPass();
  (void) llvm::createConstantHoistingPass();
  (void) llvm::createCodeGenPreparePass();
  (void) llvm::createEarlyCSEPass();
+ (void) llvm::createGVNHoistPass();
  (void) llvm::createMergedLoadStoreMotionPass();
  (void) llvm::createGVNPass();
  (void) llvm::createMemCpyOptPass();
  (void) llvm::createLoopDeletionPass();
  (void) llvm::createPostDomTree();
  (void) llvm::createInstructionNamerPass();
  (void) llvm::createMetaRenamerPass();
  (void) llvm::createPostOrderFunctionAttrsLegacyPass();

llvm/include/llvm/Transforms/Scalar.h

  //
  // EarlyCSE - This pass performs a simple and fast CSE pass over the dominator
  // tree.
  //
  FunctionPass *createEarlyCSEPass();

  //===----------------------------------------------------------------------===//
  //
+ // GVNHoist - This pass performs a simple and fast GVN pass over the dominator
+ // tree to hoist common expressions from sibling branches.
+ //
+ FunctionPass *createGVNHoistPass();
+
+ //===----------------------------------------------------------------------===//
+ //
  // MergedLoadStoreMotion - This pass merges loads and stores in diamonds. Loads
  // are hoisted into the header, while stores sink into the footer.
  //
  FunctionPass *createMergedLoadStoreMotionPass();

  //===----------------------------------------------------------------------===//
  //
  // MemCpyOpt - This pass performs optimizations related to eliminating memcpy

llvm/include/llvm/Transforms/Scalar/GVN.h

Show First 20 Lines • Show All 52 Lines • ▼ Show 20 Lines	void markInstructionForDeletion(Instruction *I) {
VN.erase(I);		VN.erase(I);
InstrsToErase.push_back(I);		InstrsToErase.push_back(I);
}		}

DominatorTree &getDominatorTree() const { return *DT; }		DominatorTree &getDominatorTree() const { return *DT; }
AliasAnalysis *getAliasAnalysis() const { return VN.getAliasAnalysis(); }		AliasAnalysis *getAliasAnalysis() const { return VN.getAliasAnalysis(); }
MemoryDependenceResults &getMemDep() const { return *MD; }		MemoryDependenceResults &getMemDep() const { return *MD; }

private:
friend class gvn::GVNLegacyPass;

struct Expression;		struct Expression;
friend struct DenseMapInfo<Expression>;

/// This class holds the mapping between values and value numbers. It is used		/// This class holds the mapping between values and value numbers. It is used
/// as an efficient mechanism to determine the expression-wise equivalence of		/// as an efficient mechanism to determine the expression-wise equivalence of
/// two values.		/// two values.
class ValueTable {		class ValueTable {
DenseMap<Value *, uint32_t> valueNumbering;		DenseMap<Value *, uint32_t> valueNumbering;
DenseMap<Expression, uint32_t> expressionNumbering;		DenseMap<Expression, uint32_t> expressionNumbering;
AliasAnalysis *AA;		AliasAnalysis *AA;
public:
void setAliasAnalysis(AliasAnalysis *A) { AA = A; }		void setAliasAnalysis(AliasAnalysis *A) { AA = A; }
AliasAnalysis *getAliasAnalysis() const { return AA; }		AliasAnalysis *getAliasAnalysis() const { return AA; }
void setMemDep(MemoryDependenceResults *M) { MD = M; }		void setMemDep(MemoryDependenceResults *M) { MD = M; }
void setDomTree(DominatorTree *D) { DT = D; }		void setDomTree(DominatorTree *D) { DT = D; }
uint32_t getNextUnusedValueNumber() { return nextValueNumber; }		uint32_t getNextUnusedValueNumber() { return nextValueNumber; }
void verifyRemoved(const Value *) const;		void verifyRemoved(const Value *) const;
};		};

		private:
		friend class gvn::GVNLegacyPass;
		friend struct DenseMapInfo<Expression>;

MemoryDependenceResults *MD;		MemoryDependenceResults *MD;
DominatorTree *DT;		DominatorTree *DT;
const TargetLibraryInfo *TLI;		const TargetLibraryInfo *TLI;
AssumptionCache *AC;		AssumptionCache *AC;
SetVector<BasicBlock *> DeadBlocks;		SetVector<BasicBlock *> DeadBlocks;

ValueTable VN;		ValueTable VN;

private:
void addDeadBlock(BasicBlock *BB);		void addDeadBlock(BasicBlock *BB);
void assignValNumForDeadCode();		void assignValNumForDeadCode();
};		};

/// Create a legacy GVN pass. This also allows parameterizing whether or not		/// Create a legacy GVN pass. This also allows parameterizing whether or not
/// loads are eliminated by the pass.		/// loads are eliminated by the pass.
FunctionPass *createGVNPass(bool NoLoads = false);		FunctionPass *createGVNPass(bool NoLoads = false);

		/// \brief A simple and fast domtree-based GVN pass to hoist common expressions
		/// from sibling branches.
		struct GVNHoistPass : PassInfoMixin<GVNHoistPass> {
		/// \brief Run the pass over the function.
		PreservedAnalyses run(Function &F, AnalysisManager<Function> &AM);
		};

}		}

#endif		#endif

llvm/lib/Passes/PassRegistry.def

	#ifndef FUNCTION_PASS			#ifndef FUNCTION_PASS
	#define FUNCTION_PASS(NAME, CREATE_PASS)			#define FUNCTION_PASS(NAME, CREATE_PASS)
	#endif			#endif
	FUNCTION_PASS("aa-eval", AAEvaluator())			FUNCTION_PASS("aa-eval", AAEvaluator())
	FUNCTION_PASS("adce", ADCEPass())			FUNCTION_PASS("adce", ADCEPass())
	FUNCTION_PASS("dce", DCEPass())			FUNCTION_PASS("dce", DCEPass())
	FUNCTION_PASS("dse", DSEPass())			FUNCTION_PASS("dse", DSEPass())
	FUNCTION_PASS("early-cse", EarlyCSEPass())			FUNCTION_PASS("early-cse", EarlyCSEPass())
				FUNCTION_PASS("gvn-hoist", GVNHoistPass())
	FUNCTION_PASS("instcombine", InstCombinePass())			FUNCTION_PASS("instcombine", InstCombinePass())
	FUNCTION_PASS("invalidate<all>", InvalidateAllAnalysesPass())			FUNCTION_PASS("invalidate<all>", InvalidateAllAnalysesPass())
	FUNCTION_PASS("no-op-function", NoOpFunctionPass())			FUNCTION_PASS("no-op-function", NoOpFunctionPass())
	FUNCTION_PASS("loweratomic", LowerAtomicPass())			FUNCTION_PASS("loweratomic", LowerAtomicPass())
	FUNCTION_PASS("lower-expect", LowerExpectIntrinsicPass())			FUNCTION_PASS("lower-expect", LowerExpectIntrinsicPass())
	FUNCTION_PASS("gvn", GVN())			FUNCTION_PASS("gvn", GVN())
	FUNCTION_PASS("print", PrintFunctionPass(dbgs()))			FUNCTION_PASS("print", PrintFunctionPass(dbgs()))
	FUNCTION_PASS("print<assumptions>", AssumptionPrinterPass(dbgs()))			FUNCTION_PASS("print<assumptions>", AssumptionPrinterPass(dbgs()))

llvm/lib/Transforms/IPO/PassManagerBuilder.cpp

void PassManagerBuilder::populateFunctionPassManager(
addInitialAliasAnalysisPasses(FPM);		addInitialAliasAnalysisPasses(FPM);

FPM.add(createCFGSimplificationPass());		FPM.add(createCFGSimplificationPass());
if (UseNewSROA)		if (UseNewSROA)
FPM.add(createSROAPass());		FPM.add(createSROAPass());
else		else
FPM.add(createScalarReplAggregatesPass());		FPM.add(createScalarReplAggregatesPass());
FPM.add(createEarlyCSEPass());		FPM.add(createEarlyCSEPass());
		FPM.add(createGVNHoistPass());
FPM.add(createLowerExpectIntrinsicPass());		FPM.add(createLowerExpectIntrinsicPass());
}		}

// Do PGO instrumentation generation or use pass as the option specified.		// Do PGO instrumentation generation or use pass as the option specified.
void PassManagerBuilder::addPGOInstrPasses(legacy::PassManagerBase &MPM) {		void PassManagerBuilder::addPGOInstrPasses(legacy::PassManagerBase &MPM) {
if (!PGOInstrGen.empty()) {		if (!PGOInstrGen.empty()) {
MPM.add(createPGOInstrumentationGenLegacyPass());		MPM.add(createPGOInstrumentationGenLegacyPass());
// Add the profile lowering pass.		// Add the profile lowering pass.

llvm/lib/Transforms/Scalar/CMakeLists.txt

	add_llvm_library(LLVMScalarOpts			add_llvm_library(LLVMScalarOpts
	ADCE.cpp			ADCE.cpp
	AlignmentFromAssumptions.cpp			AlignmentFromAssumptions.cpp
	BDCE.cpp			BDCE.cpp
	ConstantHoisting.cpp			ConstantHoisting.cpp
	ConstantProp.cpp			ConstantProp.cpp
	CorrelatedValuePropagation.cpp			CorrelatedValuePropagation.cpp
	DCE.cpp			DCE.cpp
	DeadStoreElimination.cpp			DeadStoreElimination.cpp
	EarlyCSE.cpp			EarlyCSE.cpp
	FlattenCFGPass.cpp			FlattenCFGPass.cpp
	Float2Int.cpp			Float2Int.cpp
	GVN.cpp			GVN.cpp
				GVNHoist.cpp
	InductiveRangeCheckElimination.cpp			InductiveRangeCheckElimination.cpp
	IndVarSimplify.cpp			IndVarSimplify.cpp
	JumpThreading.cpp			JumpThreading.cpp
	LICM.cpp			LICM.cpp
	LoadCombine.cpp			LoadCombine.cpp
	LoopDeletion.cpp			LoopDeletion.cpp
	LoopDataPrefetch.cpp			LoopDataPrefetch.cpp
	LoopDistribute.cpp			LoopDistribute.cpp

llvm/lib/Transforms/Scalar/GVNHoist.cpp

This file was added.

				//===- GVNHoist.cpp - Hoist scalar and load expressions -------------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// This pass hoists expressions from branches to a common dominator. It uses
				// GVN (global value numbering) to discover expressions computing the same
				// values. The primary goal is to reduce the code size, and in some
				// cases reduce critical path (by exposing more ILP).
				// Hoisting may affect the performance in some cases. To mitigate that, hoisting
				// is disabled in the following cases.
				// 1. Scalars across calls.
				// 2. geps when corresponding load/store cannot be hoisted.
				//===----------------------------------------------------------------------===//

				#include "llvm/ADT/SmallPtrSet.h"
				#include "llvm/ADT/Statistic.h"
				#include "llvm/Analysis/ValueTracking.h"
				#include "llvm/Transforms/Scalar.h"
				#include "llvm/Transforms/Scalar/GVN.h"
				#include "llvm/Transforms/Utils/MemorySSA.h"
				#include <functional>
				#include <unordered_map>
				#include <vector>

				using namespace llvm;

				#define DEBUG_TYPE "gvn-hoist"

				STATISTIC(NumHoisted, "Number of instructions hoisted");
				STATISTIC(NumRemoved, "Number of instructions removed");
				STATISTIC(NumLoadsHoisted, "Number of loads hoisted");
				STATISTIC(NumLoadsRemoved, "Number of loads removed");
				STATISTIC(NumStoresHoisted, "Number of stores hoisted");
				STATISTIC(NumStoresRemoved, "Number of stores removed");
				STATISTIC(NumCallsHoisted, "Number of calls hoisted");
				STATISTIC(NumCallsRemoved, "Number of calls removed");

				static cl::opt<int>
				MaxHoistedThreshold("gvn-max-hoisted", cl::Hidden, cl::init(-1),
				cl::desc("Max number of instructions to hoist "
				"(default unlimited = -1)"));
				static cl::opt<int> MaxNumberOfBBSInPath(
				"gvn-hoist-max-bbs", cl::Hidden, cl::init(4),
				cl::desc("Max number of basic blocks on the path between "
				"hoisting locations (default = 4, unlimited = -1)"));

				static int HoistedCtr = 0;

				namespace {

				struct SortByDFSIn {
				private:
				DenseMap<const BasicBlock *, unsigned> &DFSNumber;

				public:
				SortByDFSIn(DenseMap<const BasicBlock *, unsigned> &D) : DFSNumber(D) {}

				bool operator()(const Instruction *A, const Instruction *B) const {
				assert(A != B);
				const BasicBlock *BA = A->getParent();
				const BasicBlock *BB = B->getParent();
				unsigned NA = DFSNumber[BA];
				bcahoon: ID isn't used anywhere.
				unsigned NB = DFSNumber[BB];
				if (NA < NB)
				return true;
				if (NA == NB) {
				// Sort them in the order they occur in the same basic block.
				BasicBlock::const_iterator AI(A), BI(B);
				return std::distance(AI, BI) < 0;
				}
				return false;
				}
				};

				// A multimap from a VN (value number) to all the instructions with that VN.
				typedef std::multimap<unsigned, Instruction *> VNtoInsns;

				dberlin: Do you actually use the ordering guarantee of multimap?
				majnemer: I'd recommend a DenseMap from your key space to a vector (or SmallVector) of Instruction *.
				sebpop: No, we do not use the ordering in the multimap.
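
The container suggested in this thread can be sketched outside of LLVM. The block below is only an illustration under stated assumptions: `Insn` is a hypothetical stand-in for `llvm::Instruction`, and `std::unordered_map` with `std::vector` stands in for `DenseMap` and `SmallVector`. It shows that grouping by value number needs no ordering guarantee.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for llvm::Instruction; only identity matters here.
struct Insn {
  std::string Name;
};

// Unordered map from a value number to every instruction carrying it.
// In LLVM this role would likely be played by a DenseMap to SmallVector.
using VNGroups = std::unordered_map<unsigned, std::vector<const Insn *>>;

// Record that instruction I has value number VN.
void insertInsn(VNGroups &Map, unsigned VN, const Insn *I) {
  Map[VN].push_back(I);
}

// Collect the value numbers shared by at least two instructions: these are
// the only groups a hoisting pass has to look at.
std::vector<unsigned> hoistableVNs(const VNGroups &Map) {
  std::vector<unsigned> Result;
  for (const auto &Entry : Map)
    if (Entry.second.size() >= 2)
      Result.push_back(Entry.first);
  return Result;
}
```

With this layout, all instructions for one value number are contiguous, so the `count`/`equal_range` lookups that a multimap requires are replaced by a single vector access.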
				class InsnInfo {
				VNtoInsns VNtoScalars;
				dberlin: Please document this class per the LLVM style guidelines.

				public:
				void insert(Instruction *I, GVN::ValueTable &VN) {
				// Scalar instruction.
				unsigned V = VN.lookupOrAdd(I);
				VNtoScalars.insert(std::make_pair(V, I));
				}

				const VNtoInsns &getVNTable() const { return VNtoScalars; }
				};

				class LoadInfo {
				VNtoInsns VNtoLoads;
				dberlin: Please document this class per the LLVM style guidelines.

				public:
				void insert(LoadInst *Load, GVN::ValueTable &VN) {
				if (Load->isSimple()) {
				Value *Ptr = Load->getPointerOperand();
				unsigned V = VN.lookupOrAdd(Ptr);
				VNtoLoads.insert(std::make_pair(V, Load));
				}
				}

				const VNtoInsns &getVNTable() const { return VNtoLoads; }
				};

				class StoreInfo {
				VNtoInsns VNtoStores;
				dberlin: Please document this class per the LLVM style guidelines.

				public:
				void insert(StoreInst *Store, GVN::ValueTable &VN) {
				if (!Store->isSimple())
				return;
				// Hash the store address and the stored value.
				std::string VNS;
				Value *Ptr = Store->getPointerOperand();
				VNS += std::to_string(VN.lookupOrAdd(Ptr));
				VNS += ",";
				Value *Val = Store->getValueOperand();
				VNS += std::to_string(VN.lookupOrAdd(Val));
				VNtoStores.insert(std::make_pair(std::hash<std::string>()(VNS), Store));
				}
				dberlin: Please use std::pair, or define an appropriate key, rather than converting VN numbers to strings :)
				sebpop: hash_combine works.
				dberlin: (I.e., use hash_combine and make that work, or whatever.)
				majnemer: I'd recommend hash_combine utilized in a DenseMapInfo instead of this logic.
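
The pair-based key requested above can be sketched as follows. This is a minimal illustration, not the code that landed: `StoreKeyHash` and `recordStore` are hypothetical names, the mixing step is a generic Boost-style combine standing in for `llvm::hash_combine`, and stores are represented by plain integer ids.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// Key for a store: (value number of the address, value number of the value).
using StoreKey = std::pair<unsigned, unsigned>;

// Hash both components directly instead of formatting them into a string.
struct StoreKeyHash {
  std::size_t operator()(const StoreKey &K) const {
    std::size_t H = std::hash<unsigned>()(K.first);
    // Generic mixing step (assumption: stands in for llvm::hash_combine).
    H ^= std::hash<unsigned>()(K.second) + 0x9e3779b9 + (H << 6) + (H >> 2);
    return H;
  }
};

// Map from a store key to the ids of all stores sharing that key.
using StoreTable = std::unordered_map<StoreKey, std::vector<int>, StoreKeyHash>;

// Record a store with the given address and value numbers.
void recordStore(StoreTable &T, unsigned AddrVN, unsigned ValVN, int StoreId) {
  T[StoreKey(AddrVN, ValVN)].push_back(StoreId);
}
```

Because the map compares the full pair on lookup, two different (address, value) pairs are never merged even if their hashes collide, which the string-hash-as-key approach cannot guarantee.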

				const VNtoInsns &getVNTable() const { return VNtoStores; }
				};

				class CallInfo {
				VNtoInsns VNtoCallsScalars;
				VNtoInsns VNtoCallsLoads;
				VNtoInsns VNtoCallsStores;

				public:
				void insert(CallInst *Call, GVN::ValueTable &VN) {
				// A call that doesNotAccessMemory is handled as a Scalar,
				// onlyReadsMemory will be handled as a Load instruction,
				// all other calls will be handled as stores.
				george.burgess.iv: Can we use a `SmallPtrSet` here? (If you're not familiar with LLVM containers, `SmallPtrSet<T, N>` can be passed without the size (e.g. for parameters) as a `SmallPtrSetImpl<T>`, and it has a `count(Foo)` function, which is equivalent to `set.find(Foo) != set.end()`. :) )
				unsigned V = VN.lookupOrAdd(Call);

				if (Call->doesNotAccessMemory())
				VNtoCallsScalars.insert(std::make_pair(V, Call));
				else if (Call->onlyReadsMemory())
				VNtoCallsLoads.insert(std::make_pair(V, Call));
				else
				VNtoCallsStores.insert(std::make_pair(V, Call));
				}

				const VNtoInsns &getScalarVNTable() const { return VNtoCallsScalars; }

				const VNtoInsns &getLoadVNTable() const { return VNtoCallsLoads; }
				majnemer: What if the call is to a function which divides by zero? What would stop you from hoisting the call to it?
				sebpop: We do check for no EH and no BB address taken in between the original place of the expressions and the place we hoist it to. After hoisting, if the call throws an exception it would happen a bit earlier than it used to, although there shouldn't be any other exceptions or side effects happening in between the new and old place of the call. We also check that all paths from the new location to the end of the function do have the exact call that we hoist. So the call with the exception has to happen on all paths.

				const VNtoInsns &getStoreVNTable() const { return VNtoCallsStores; }
				};

				typedef DenseMap<const BasicBlock *, bool> BBSideEffectsSet;
				typedef SmallVector<Instruction *, 4> SmallVecInsn;
				typedef SmallVectorImpl<Instruction *> SmallVecImplInsn;

				// This pass hoists common computations across branches sharing common
				// dominator. The primary goal is to reduce the code size, and in some
				// cases reduce critical path (by exposing more ILP).
				class GVNHoistLegacyPassImpl {
				public:
				GVN::ValueTable VN;
				DominatorTree *DT;
				AliasAnalysis *AA;
				MemoryDependenceResults *MD;
				DenseMap<const BasicBlock *, unsigned> DFSNumber;
				BBSideEffectsSet BBSideEffects;
				MemorySSA *MSSA;
				MemorySSAWalker *MSSAW;
				enum InsKind { Unknown, Scalar, Load, Store };

				GVNHoistLegacyPassImpl(DominatorTree *Dt, AliasAnalysis *Aa,
				MemoryDependenceResults *Md)
				: DT(Dt), AA(Aa), MD(Md), MSSAW(nullptr) {}

				// Return true when there is exception handling in BB.
				bool hasEH(const BasicBlock *BB) {
				auto It = BBSideEffects.find(BB);
				if (It != BBSideEffects.end())
				return It->second;

				if (BB->isEHPad() || BB->hasAddressTaken()) {
				george.burgess.iv: `SmallPtrSet` (x2)?
				BBSideEffects[BB] = true;
				return true;
				}

				if (BB->getTerminator()->mayThrow() || !BB->getTerminator()->mayReturn()) {
				BBSideEffects[BB] = true;
				return true;
				}

				BBSideEffects[BB] = false;
				return false;
				}

				// Return true when there are exception handling blocks on the execution path.
				bool hasEH(SmallPtrSetImpl<const BasicBlock *> &Paths) {
				for (const BasicBlock *BB : Paths)
				if (hasEH(BB))
				return true;

				return false;
				}

				// Return true when all paths from A to the end of the function pass through
				george.burgess.iv: `SmallVector`?
				// either B or C.
				bool hoistingFromAllPaths(const BasicBlock *A, const BasicBlock *B,
				const BasicBlock *C) {
				// We fully copy the WL in order to be able to remove items from it.
				dberlin: Errr, isn't this the definition of A post-dominating B or C? A post-dominates B if all paths from A to end of function pass through B. Same with (A, C). If that's right, I would just use post-dominance here :)
				sebpop: Post-dominance is computed on the reversed CFG (turning each edge in the opposite direction): A post-dominates B if all paths from the end of the function to B on the reversed CFG pass through A.
				hiraditya: It is more like B and C combined post-dominating A.
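
The property being debated here, "B and C combined post-dominate A", can be sketched on a toy graph. The block below is an assumption-laden illustration and not the patch's implementation: blocks are integer ids, `CFG` is a plain successor list, a block without successors counts as a function exit, and `allPathsReach` is a hypothetical name.

```cpp
#include <vector>

// Successor lists indexed by block id; a block with no successors is an exit.
using CFG = std::vector<std::vector<int>>;

// Return true when every path from A to a function exit passes through B or C.
bool allPathsReach(const CFG &G, int A, int B, int C) {
  std::vector<bool> Seen(G.size(), false);
  std::vector<int> Stack{A};
  while (!Stack.empty()) {
    int BB = Stack.back();
    Stack.pop_back();
    if (BB == B || BB == C)
      continue; // This path is covered: do not walk past B or C.
    if (Seen[BB])
      continue;
    Seen[BB] = true;
    if (G[BB].empty())
      return false; // Reached an exit without crossing B or C.
    for (int Succ : G[BB])
      Stack.push_back(Succ);
  }
  return true;
}
```

On a diamond `0 -> {1, 2} -> 3`, the pair (1, 2) covers every path from 0, while neither block does so alone. That is why a single post-dominance query on B or on C is not equivalent to this check.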
				SmallPtrSet<const BasicBlock *, 2> WL;
				WL.insert(B);
				WL.insert(C);

				for (auto It = df_begin(A), E = df_end(A); It != E;) {
				// There exists a path from A to the exit of the function if we are still
				// iterating in DF traversal and we removed all instructions from the work
				// list.
				if (WL.empty())
				return false;

				const BasicBlock *BB = *It;
				if (WL.erase(BB)) {
				// Stop DFS traversal when BB is in the work list.
				It.skipChildren();
				continue;
				}

				// Check for end of function, calls that do not return, etc.
				george.burgess.iv: `SmallPtrSet`?
				if (!isGuaranteedToTransferExecutionToSuccessor(BB->getTerminator()))
				return false;

				// Increment DFS traversal when not skipping children.
				++It;
				}

				return true;
				}

				// Each element of a hoisting list contains the basic block where to hoist and
				// a list of instructions to be hoisted.
				typedef std::pair<BasicBlock *, SmallVecInsn> HoistingPointInfo;
				typedef SmallVector<HoistingPointInfo, 4> HoistingPointList;

				// Initialize Paths with all the basic blocks executed in between A and B.
				void gatherAllBlocks(SmallPtrSetImpl<const BasicBlock *> &Paths,
				dberlin: How expensive is this in practice (because it's another thing that, computed in the right order, will share most of the subcomputation work)?
				const BasicBlock *A, const BasicBlock *B) {
				assert(DT->dominates(A, B) && "Invalid path");

				// We may need to keep B in the Paths set if we have already added it
				// to Paths for another expression.
				bool Keep = Paths.count(B);

				// Record in Paths all basic blocks reachable in depth-first iteration on
				// the inverse CFG from B to A. These blocks are all the blocks that may be
				// executed between the execution of A and B. Hoisting an expression from B
				// into A has to be safe on all execution paths.
				for (auto I = idf_ext_begin(B, Paths), E = idf_ext_end(B, Paths); I != E;) {
				if (*I == A)
				// Stop traversal when reaching A.
				I.skipChildren();
				else
				++I;
				}

				// Safety check for B will be handled separately.
				if (!Keep)
				Paths.erase(B);

				// Safety check for A will be handled separately.
				Paths.erase(A);
				}

				// Return true when there are users of A in one of the BBs of Paths.
				bool hasMemoryUseOnPaths(MemoryAccess *A,
				dberlin: I feel like this + gatherAllBlocks is really an up-safe/down-safe computation (depending on whether it's a load or store), and can be done in a much saner way than you are doing it. In particular, this seems a really complicated and expensive way to compute this property, compared to how most PRE papers do it.
				SmallPtrSetImpl<const BasicBlock *> &Paths) {
				Value::user_iterator UI = A->user_begin();
				Value::user_iterator UE = A->user_end();
				const BasicBlock *BBA = A->getBlock();
				for (; UI != UE; ++UI)
				if (MemoryAccess *UM = dyn_cast<MemoryAccess>(*UI))
				for (const BasicBlock *PBB : Paths) {
				if (PBB == BBA) {
				if (MSSA->locallyDominates(UM, A))
				return true;
				continue;
				}
				if (PBB == UM->getBlock())
				return true;
				george.burgess.iv: `std::move(InstructionsToHoist)`?
				}
				return false;
				}

				// Return true when it is safe to hoist an instruction Insn to NewHoistPt and
				// move the insertion point from HoistPt to NewHoistPt.
				bool safeToHoist(const BasicBlock *NewHoistPt, const BasicBlock *HoistPt,
				const Instruction *Insn, const Instruction *First, InsKind K,
				int &BBsOnAllPaths) {
				if (hasEH(HoistPt))
				george.burgess.iv: `std::move(InstructionsToHoist)`?
				return false;

				const BasicBlock *BBInsn = Insn->getParent();
				// When HoistPt already contains an instruction to be hoisted, the
				// expression is needed on all paths.

				// Check that the hoisted expression is needed on all paths: it is unsafe
				// to hoist loads to a place where there may be a path not loading from
				// the same address: for instance there may be a branch on which the
				// address of the load may not be initialized. FIXME: at -Oz we may want
				// to hoist scalars to a place where they are partially needed.
				if (BBInsn != NewHoistPt &&
				!hoistingFromAllPaths(NewHoistPt, HoistPt, BBInsn))
				return false;

				// Check for unsafe hoistings due to side effects.
				SmallPtrSet<const BasicBlock *, 4> Paths;
				gatherAllBlocks(Paths, NewHoistPt, HoistPt);
				gatherAllBlocks(Paths, NewHoistPt, BBInsn);

				// Check whether there are too many blocks on the hoisting path.
				BBsOnAllPaths += Paths.size();
				if (MaxNumberOfBBSInPath != -1 && BBsOnAllPaths >= MaxNumberOfBBSInPath)
				return false;
				george.burgess.iv: Would `for (const auto &HP : HPL)` work instead?

				if (hasEH(Paths))
				return false;

				george.burgess.iv: Can we take this by `const&` to avoid a copy?
				// Safe to hoist scalars.
				if (K == InsKind::Scalar)
				return true;

				// For loads and stores, we check for dependences on the Memory SSA.
				MemoryAccess *MemdefInsn =
				cast<MemoryUseOrDef>(MSSA->getMemoryAccess(Insn))->getDefiningAccess();
				BasicBlock *BBMemdefInsn = MemdefInsn->getBlock();

				if (DT->properlyDominates(NewHoistPt, BBMemdefInsn))
				// Cannot move Insn past BBMemdefInsn to NewHoistPt.
				return false;

				MemoryAccess *MemdefFirst =
				cast<MemoryUseOrDef>(MSSA->getMemoryAccess(First))->getDefiningAccess();
				BasicBlock *BBMemdefFirst = MemdefFirst->getBlock();

				if (DT->properlyDominates(NewHoistPt, BBMemdefFirst))
				// Cannot move First past BBMemdefFirst to NewHoistPt.
				return false;

				if (K == InsKind::Store) {
				// Check that we do not move a store past loads.
				if (DT->dominates(BBMemdefInsn, NewHoistPt))
				if (hasMemoryUseOnPaths(MemdefInsn, Paths))
				return false;

				if (DT->dominates(BBMemdefFirst, NewHoistPt))
				if (hasMemoryUseOnPaths(MemdefFirst, Paths))
				return false;
				}

				if (DT->properlyDominates(BBMemdefInsn, NewHoistPt) &&
				DT->properlyDominates(BBMemdefFirst, NewHoistPt))
				return true;

				const BasicBlock *BBFirst = First->getParent();
				if (BBInsn == BBFirst)
				return false;

				assert(BBMemdefInsn == NewHoistPt || BBMemdefFirst == NewHoistPt);

				if (BBInsn != NewHoistPt && BBFirst != NewHoistPt)
				return true;

				if (BBInsn == NewHoistPt) {
				if (DT->properlyDominates(BBMemdefFirst, NewHoistPt))
				return true;
				assert(BBInsn == BBMemdefFirst);
				if (MSSA->locallyDominates(MSSA->getMemoryAccess(Insn), MemdefFirst))
				return false;
				return true;
				}

				if (BBFirst == NewHoistPt) {
				if (DT->properlyDominates(BBMemdefInsn, NewHoistPt))
				return true;
				assert(BBFirst == BBMemdefInsn);
				if (MSSA->locallyDominates(MSSA->getMemoryAccess(First), MemdefInsn))
				return false;
				return true;
				}

				// No side effects: it is safe to hoist.
				return true;
				}
				sebpop: Replacing the iteration over all the BBs of the function with a depth_first iteration fixes the infinite recursion problem reported by Brendon: `for (BasicBlock *BB : depth_first(&F.getEntryBlock()))`

				// Partition InstructionsToHoist into a set of candidates which can share a
				// common hoisting point. The partitions are collected in HPL. IsScalar is
				// true when the instructions in InstructionsToHoist are scalars. IsLoad is
				// true when the InstructionsToHoist are loads, false when they are stores.
				void partitionCandidates(SmallVecImplInsn &InstructionsToHoist,
				HoistingPointList &HPL, InsKind K) {
				// No need to sort for two instructions.
				if (InstructionsToHoist.size() > 2) {
				SortByDFSIn Pred(DFSNumber);
				std::sort(InstructionsToHoist.begin(), InstructionsToHoist.end(), Pred);
				}

				// Create a work list of all the BB of the Insns to be hoisted.
				SmallPtrSet<BasicBlock *, 4> WL;
				SmallVecImplInsn::iterator II = InstructionsToHoist.begin();
				SmallVecImplInsn::iterator Start = II;
				BasicBlock *HoistPt = (*II)->getParent();
				WL.insert((*II)->getParent());
				int BBsOnAllPaths = 0;

				for (++II; II != InstructionsToHoist.end(); ++II) {
				Instruction *Insn = *II;
				BasicBlock *BB = Insn->getParent();
				BasicBlock *NewHoistPt = DT->findNearestCommonDominator(HoistPt, BB);
				WL.insert(BB);
				if (safeToHoist(NewHoistPt, HoistPt, Insn, *Start, K, BBsOnAllPaths)) {
				// Extend HoistPt to NewHoistPt.
				HoistPt = NewHoistPt;
				dberlin: I wonder how expensive this computation ends up being. It never changes per-iteration unless something is messing with the CFG out from under you. (This is why GCC uses et-splay trees.)
				sebpop: The good thing is that the hoist pass does not change the CFG. Overall compilation time is also 0.35% increase on x86_64 compiling all the test-suite through valgrind (see some of the previous experiments).
				hiraditya: For each Instruction in the InstructionsToHoist, the nearest common dominator (w.r.t. HoistPt) could change. E.g., A -> B -> C (has I1), B -> D (has I2), A -> E (has I3), where I1, I2 and I3 have the same GVN. In this case nearestCommonDominator(C, D) = B, and nearestCommonDominator(B, E) = A.
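
The example in the last comment can be checked with a small sketch. It is illustrative only and does not use LLVM's `DominatorTree`: the dominator tree is encoded as a hypothetical immediate-dominator array `IDom`, with blocks A..E mapped to ids 0..4 and -1 marking the root.

```cpp
#include <vector>

// Depth of block B in the dominator tree described by IDom, where IDom[b] is
// the immediate dominator of b and -1 marks the root.
int domDepth(const std::vector<int> &IDom, int B) {
  int Depth = 0;
  for (; IDom[B] != -1; B = IDom[B])
    ++Depth;
  return Depth;
}

// Walk both blocks up the dominator tree until they meet.
int nearestCommonDominator(const std::vector<int> &IDom, int X, int Y) {
  int DX = domDepth(IDom, X), DY = domDepth(IDom, Y);
  while (DX > DY) {
    X = IDom[X];
    --DX;
  }
  while (DY > DX) {
    Y = IDom[Y];
    --DY;
  }
  while (X != Y) {
    X = IDom[X];
    Y = IDom[Y];
  }
  return X;
}
```

For the CFG A -> B -> C, B -> D, A -> E, the array is `{-1, 0, 1, 1, 0}`: the query for (C, D) yields B, and folding in E afterwards moves the hoisting point up to A, which is the per-instruction change described above.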
				continue;
				}
				// Not safe to hoist: save the previous work list and start over from BB.
				if (std::distance(Start, II) > 1)
				HPL.push_back(std::make_pair(HoistPt, SmallVecInsn(Start, II)));
				else
				WL.clear();

				// We start over to compute HoistPt from BB.
				Start = II;
				HoistPt = BB;
				BBsOnAllPaths = 0;
				}

				// Save the last partition.
				if (std::distance(Start, II) > 1)
				HPL.push_back(std::make_pair(HoistPt, SmallVecInsn(Start, II)));
				}

				// Initialize HPL from Map.
				void computeInsertionPoints(const VNtoInsns &Map, HoistingPointList &HPL,
				InsKind K) {
				for (VNtoInsns::const_iterator It = Map.begin(); It != Map.end();
				It = Map.upper_bound(It->first)) {
				if (MaxHoistedThreshold != -1 && ++HoistedCtr > MaxHoistedThreshold)
				return;

				unsigned V = It->first;
				if (Map.count(V) < 2)
				continue;

				// Compute the insertion point and the list of expressions to be hoisted.
				auto R = Map.equal_range(V);
				auto First = R.first;
				auto Last = R.second;
				SmallVecInsn InstructionsToHoist;
				for (; First != Last; ++First) {
				Instruction *I = First->second;
				BasicBlock *BB = I->getParent();
				if (!hasEH(BB))
				InstructionsToHoist.push_back(I);
				}

				if (InstructionsToHoist.size())
				partitionCandidates(InstructionsToHoist, HPL, K);
				}
				}

				// Return true when all operands of Instr are available at insertion point
				// HoistPt. When limiting the number of hoisted expressions, one could hoist
				// a load without hoisting its access function. So before hoisting any
				// expression, make sure that all its operands are available at insert point.
				bool allOperandsAvailable(const Instruction *I,
				const BasicBlock *HoistPt) const {
				for (unsigned i = 0, e = I->getNumOperands(); i != e; ++i) {
				const Value *Op = I->getOperand(i);
				const Instruction *Inst = dyn_cast<Instruction>(Op);
				if (Inst && !DT->dominates(Inst->getParent(), HoistPt))
				return false;
				}

				return true;
				}

				Instruction *firstOfTwo(Instruction *I, Instruction *J) const {
				for (Instruction &I1 : *I->getParent())
				if (&I1 == I || &I1 == J)
				return &I1;
				llvm_unreachable("Both I and J must be from same BB");
				}

				// Replace the use of From with To in Insn.
				void replaceUseWith(Instruction *Insn, Value *From, Value *To) const {
				for (Value::use_iterator UI = From->use_begin(), UE = From->use_end();
				UI != UE;) {
				Use &U = *UI++;
				if (U.getUser() == Insn) {
				U.set(To);
				return;
				}
				}
				llvm_unreachable("should replace exactly once");
				}

				bool makeOperandsAvailable(Instruction *Repl, BasicBlock *HoistPt) const {
				// Check whether the GEP of a ld/st can be synthesized at HoistPt.
				Instruction *Gep = nullptr;
				Instruction *Val = nullptr;
				if (LoadInst *Ld = dyn_cast<LoadInst>(Repl))
				Gep = dyn_cast<Instruction>(Ld->getPointerOperand());
				if (StoreInst *St = dyn_cast<StoreInst>(Repl)) {
				Gep = dyn_cast<Instruction>(St->getPointerOperand());
				Val = dyn_cast<Instruction>(St->getValueOperand());
				}

				if (!Gep || !isa<GetElementPtrInst>(Gep))
				return false;

				// Check whether we can compute the Gep at HoistPt.
				if (!allOperandsAvailable(Gep, HoistPt))
				return false;

				// Also check that the stored value is available.
				if (Val && !allOperandsAvailable(Val, HoistPt))
				return false;

				// Copy the gep before moving the ld/st.
				Instruction *ClonedGep = Gep->clone();
				ClonedGep->insertBefore(HoistPt->getTerminator());
				replaceUseWith(Repl, Gep, ClonedGep);

				// Also copy Val when it is a gep: geps are not hoisted by default.
				if (Val && isa<GetElementPtrInst>(Val)) {
				Instruction *ClonedVal = Val->clone();
				ClonedVal->insertBefore(HoistPt->getTerminator());
				replaceUseWith(Repl, Val, ClonedVal);
				}

				return true;
				}

				std::pair<unsigned, unsigned> hoist(HoistingPointList &HPL) {
				unsigned NI = 0, NL = 0, NS = 0, NC = 0, NR = 0;
				for (const HoistingPointInfo &HP : HPL) {
				// Find out whether we already have one of the instructions in HoistPt,
				// in which case we do not have to move it.
				BasicBlock *HoistPt = HP.first;
				const SmallVecInsn &InstructionsToHoist = HP.second;
				Instruction *Repl = nullptr;
				for (Instruction *I : InstructionsToHoist)
				if (I->getParent() == HoistPt) {
				// If there are two instructions in HoistPt to be hoisted in place:
				// update Repl to be the first one, such that we can rename the uses
				// of the second based on the first.
				Repl = !Repl ? I : firstOfTwo(Repl, I);
				}

				if (Repl) {
				// Repl is already in HoistPt: it remains in place.
				assert(allOperandsAvailable(Repl, HoistPt) &&
				"instruction depends on operands that are not available");
				} else {
				// When we do not find Repl in HoistPt, select the first in the list
				// and move it to HoistPt.
				Repl = InstructionsToHoist.front();

				// We can move Repl in HoistPt only when all operands are available.
				// The order in which hoistings are done may influence the availability
				// of operands.
				if (!allOperandsAvailable(Repl, HoistPt) &&
				!makeOperandsAvailable(Repl, HoistPt))
				continue;
				Repl->moveBefore(HoistPt->getTerminator());
				}

				if (isa<LoadInst>(Repl))
				++NL;
				else if (isa<StoreInst>(Repl))
				++NS;
				else if (isa<CallInst>(Repl))
				++NC;
				else // Scalar
				++NI;

				// Remove and rename all other instructions.
				for (Instruction *I : InstructionsToHoist)
				if (I != Repl) {
				++NR;
				if (isa<LoadInst>(Repl))
				++NumLoadsRemoved;
				else if (isa<StoreInst>(Repl))
				++NumStoresRemoved;
				else if (isa<CallInst>(Repl))
				++NumCallsRemoved;
				I->replaceAllUsesWith(Repl);
				I->eraseFromParent();
				}
				}

				NumHoisted += NL + NS + NC + NI;
				NumRemoved += NR;
				NumLoadsHoisted += NL;
				NumStoresHoisted += NS;
				NumCallsHoisted += NC;
				return {NI, NL + NC + NS};
				}

				// Hoist all expressions. Returns the number of scalars hoisted
				// and the number of non-scalars hoisted.
				std::pair<unsigned, unsigned> hoistExpressions(Function &F) {
				InsnInfo II;
				LoadInfo LI;
				StoreInfo SI;
				CallInfo CI;
				const bool OptForMinSize = F.optForMinSize();
				for (BasicBlock *BB : depth_first(&F.getEntryBlock())) {
				for (Instruction &I1 : *BB) {
				if (LoadInst *Load = dyn_cast<LoadInst>(&I1))
				LI.insert(Load, VN);
				else if (StoreInst *Store = dyn_cast<StoreInst>(&I1))
				SI.insert(Store, VN);
				else if (CallInst *Call = dyn_cast<CallInst>(&I1)) {
				if (IntrinsicInst *II = dyn_cast<IntrinsicInst>(Call)) {
				if (isa<DbgInfoIntrinsic>(II) ||
				II->getIntrinsicID() == Intrinsic::assume)
				continue;
				}
				if (Call->mayHaveSideEffects()) {
				if (!OptForMinSize)
				break;
				// We may continue hoisting across calls which write to memory.
				if (Call->mayThrow() || !Call->mayReturn())
				majnemer: Likewise, you need to make sure that the call has no side effects, which is different from it not mutating memory.
				dberlin: +1. Pure and const (in gcc parlance) are pretty much the only thing you can safely move.
				sebpop: I will add a check for that. Thanks for catching this.
				hiraditya: Will do that. Thanks.
				break;
				}
				CI.insert(Call, VN);
				} else if (OptForMinSize || !isa<GetElementPtrInst>(&I1))
				// Do not hoist scalars past calls that may write to memory because
				// that could result in spills later. geps are handled separately.
				// TODO: We can relax this for targets like AArch64 as they have more
				// registers than X86.
				II.insert(&I1, VN);
				}
				}

				HoistingPointList HPL;
				computeInsertionPoints(II.getVNTable(), HPL, InsKind::Scalar);
				computeInsertionPoints(LI.getVNTable(), HPL, InsKind::Load);
				computeInsertionPoints(SI.getVNTable(), HPL, InsKind::Store);
				computeInsertionPoints(CI.getScalarVNTable(), HPL, InsKind::Scalar);
				computeInsertionPoints(CI.getLoadVNTable(), HPL, InsKind::Load);
				computeInsertionPoints(CI.getStoreVNTable(), HPL, InsKind::Store);
				return hoist(HPL);
				}

				bool run(Function &F) {
				VN.setDomTree(DT);
				VN.setAliasAnalysis(AA);
				VN.setMemDep(MD);
				bool Res = false;

				dberlin: Note that you can get the DFS in/out numbers from the dominator tree if that is an acceptable ordering (i.e. DFS on DT).
				sebpop: I will look at that. I remember Aditya had a patch doing that: he needed to expose that numbering from the DT construction through a new function. From what I remember we reverted back to this numbering after some problem with that numbering. I will try to see if I can remove this loop from the patch.
				hiraditya: It seems DFS number is not always available in the dominator tree. DFS numbers are updated only if there are too many (> 32) slow queries in GenericDomTree.h:468
				unsigned I = 0;
				for (const BasicBlock *BB : depth_first(&F.getEntryBlock()))
				DFSNumber.insert(std::make_pair(BB, ++I));

				// FIXME: use lazy evaluation of VN to avoid the fix-point computation.
				while (1) {
				MemorySSA M(F);
				dberlin: So, this is going to be super-expensive to do, and should be marked with a FIXME. You should only compute it once, and I have the APIs you need to do updates being worked on in a different review. You should also consider whether this should simply be a pass dependency.
				sebpop: I added a FIXME note. I agree that we should not throw away an expensive analysis. I do not expect this loop to execute more than 2 or 3 times following the number of dependences between loading the address of another load/store.
				MSSA = &M;
				MSSAW = MSSA->buildMemorySSA(AA, DT);

				auto HoistStat = hoistExpressions(F);
				if (HoistStat.first + HoistStat.second == 0) {
				delete MSSAW;
				return Res;
				}
				if (HoistStat.second > 0) {
				// Memory SSA is not updated by our code generator: recompute it.
				delete MSSAW;
				// To address a limitation of the current GVN, we need to rerun the
				// hoisting after we hoisted loads in order to be able to hoist all
				// scalars dependent on the hoisted loads. Same for stores.
				VN.clear();
				}
				Res = true;
				}

				return Res;
				}
				};

				class GVNHoistLegacyPass : public FunctionPass {
				public:
				static char ID;

				GVNHoistLegacyPass() : FunctionPass(ID) {
				initializeGVNHoistLegacyPassPass(*PassRegistry::getPassRegistry());
				}

				bool runOnFunction(Function &F) override {
				auto &DT = getAnalysis<DominatorTreeWrapperPass>().getDomTree();
				auto &AA = getAnalysis<AAResultsWrapperPass>().getAAResults();
				auto &MD = getAnalysis<MemoryDependenceWrapperPass>().getMemDep();

				GVNHoistLegacyPassImpl G(&DT, &AA, &MD);
				return G.run(F);
				}

				void getAnalysisUsage(AnalysisUsage &AU) const override {
				AU.addRequired<DominatorTreeWrapperPass>();
				AU.addRequired<AAResultsWrapperPass>();
				AU.addRequired<MemoryDependenceWrapperPass>();
				AU.addPreserved<DominatorTreeWrapperPass>();
				}
				};
				} // namespace

				PreservedAnalyses GVNHoistPass::run(Function &F,
				AnalysisManager<Function> &AM) {
				DominatorTree &DT = AM.getResult<DominatorTreeAnalysis>(F);
				AliasAnalysis &AA = AM.getResult<AAManager>(F);
				MemoryDependenceResults &MD = AM.getResult<MemoryDependenceAnalysis>(F);

				GVNHoistLegacyPassImpl G(&DT, &AA, &MD);
				if (!G.run(F))
				return PreservedAnalyses::all();

				PreservedAnalyses PA;
				PA.preserve<DominatorTreeAnalysis>();
				return PA;
				}

				char GVNHoistLegacyPass::ID = 0;
				INITIALIZE_PASS_BEGIN(GVNHoistLegacyPass, "gvn-hoist",
				"Early GVN Hoisting of Expressions", false, false)
				INITIALIZE_PASS_DEPENDENCY(MemoryDependenceWrapperPass)
				INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
				INITIALIZE_PASS_DEPENDENCY(AAResultsWrapperPass)
				INITIALIZE_PASS_END(GVNHoistLegacyPass, "gvn-hoist",
				"Early GVN Hoisting of Expressions", false, false)

				FunctionPass *llvm::createGVNHoistPass() { return new GVNHoistLegacyPass(); }
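
The inline discussion in `run` raises the option of reading DFS in/out numbers off the dominator tree rather than numbering blocks in a separate loop. As a rough standalone illustration (not code from this patch, and not LLVM's `DominatorTree` API), the sketch below numbers a toy dominator tree and uses the interval test to answer the question `allOperandsAvailable` asks: does every operand's defining block dominate the hoisting point? `ToyDomTree`, the integer block ids, and `operandsAvailableAt` are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Toy dominator tree (hypothetical, for illustration only).
// Children[N] lists the blocks whose immediate dominator is N; block 0 is the root.
struct ToyDomTree {
  std::vector<std::vector<int>> Children;
  std::vector<unsigned> DFSIn, DFSOut;

  explicit ToyDomTree(std::vector<std::vector<int>> C)
      : Children(std::move(C)), DFSIn(Children.size()),
        DFSOut(Children.size()) {
    unsigned Clock = 0;
    number(0, Clock);
  }

  // Assign entry/exit timestamps with a depth-first walk of the tree.
  void number(int N, unsigned &Clock) {
    DFSIn[N] = Clock++;
    for (int Child : Children[N])
      number(Child, Clock);
    DFSOut[N] = Clock++;
  }

  // A dominates B exactly when B's interval is nested inside A's interval.
  bool dominates(int A, int B) const {
    return DFSIn[A] <= DFSIn[B] && DFSOut[B] <= DFSOut[A];
  }
};

// Every operand must be defined in a block that dominates HoistPt.
// OperandDefBlocks holds the defining block id of each instruction operand.
bool operandsAvailableAt(const ToyDomTree &DT,
                         const std::vector<int> &OperandDefBlocks,
                         int HoistPt) {
  for (int Def : OperandDefBlocks)
    if (!DT.dominates(Def, HoistPt))
      return false;
  return true;
}
```

For a diamond in which block 0 immediately dominates blocks 1, 2, and 3, an operand defined in block 0 is available at block 3, while an operand defined in block 1 is not available at block 0. The sketch assumes the numbers are up to date; as hiraditya notes in the thread, the real dominator tree only refreshes them lazily.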

llvm/lib/Transforms/Scalar/Scalar.cpp

  void llvm::initializeScalarOpts(PassRegistry &Registry) {
    initializeConstantPropagationPass(Registry);
    initializeCorrelatedValuePropagationPass(Registry);
    initializeDCELegacyPassPass(Registry);
    initializeDeadInstEliminationPass(Registry);
    initializeScalarizerPass(Registry);
    initializeDSELegacyPassPass(Registry);
    initializeGVNLegacyPassPass(Registry);
    initializeEarlyCSELegacyPassPass(Registry);
+   initializeGVNHoistLegacyPassPass(Registry);
    initializeFlattenCFGPassPass(Registry);
    initializeInductiveRangeCheckEliminationPass(Registry);
    initializeIndVarSimplifyPass(Registry);
    initializeJumpThreadingPass(Registry);
    initializeLICMPass(Registry);
    initializeLoopDataPrefetchPass(Registry);
    initializeLoopDeletionPass(Registry);
    initializeLoopAccessAnalysisPass(Registry);
  [...]
  void LLVMAddCorrelatedValuePropagationPass(LLVMPassManagerRef PM) {
    unwrap(PM)->add(createCorrelatedValuePropagationPass());
  }

  void LLVMAddEarlyCSEPass(LLVMPassManagerRef PM) {
    unwrap(PM)->add(createEarlyCSEPass());
  }

+ void LLVMAddGVNHoistLegacyPass(LLVMPassManagerRef PM) {
+   unwrap(PM)->add(createGVNHoistPass());
+ }
+
  void LLVMAddTypeBasedAliasAnalysisPass(LLVMPassManagerRef PM) {
    unwrap(PM)->add(createTypeBasedAAWrapperPass());
  }

  void LLVMAddScopedNoAliasAAPass(LLVMPassManagerRef PM) {
    unwrap(PM)->add(createScopedNoAliasAAWrapperPass());
  }

  void LLVMAddBasicAliasAnalysisPass(LLVMPassManagerRef PM) {
    unwrap(PM)->add(createBasicAAWrapperPass());
  }

  void LLVMAddLowerExpectIntrinsicPass(LLVMPassManagerRef PM) {
    unwrap(PM)->add(createLowerExpectIntrinsicPass());
  }

llvm/lib/Transforms/Utils/MemorySSA.cpp

  /// \brief Determine, for two memory accesses in the same block,
  /// whether \p Dominator dominates \p Dominatee.
  /// \returns True if \p Dominator dominates \p Dominatee.
  bool MemorySSA::locallyDominates(const MemoryAccess *Dominator,
                                   const MemoryAccess *Dominatee) const {

    assert((Dominator->getBlock() == Dominatee->getBlock()) &&
           "Asking for local domination when accesses are in different blocks!");

+   if (isLiveOnEntryDef(Dominatee))
+     return false;
+
  dberlin: This change needs to be explained and submitted as a separate patch :)
    // Get the access list for the block
    const AccessListType *AccessList = getBlockAccesses(Dominator->getBlock());
    AccessListType::const_reverse_iterator It(Dominator->getIterator());

    // If we hit the beginning of the access list before we hit dominatee, we must
    // dominate it
    return std::none_of(It, AccessList->rend(),
                        [&](const MemoryAccess &MA) { return &MA == Dominatee; });
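
To make the local-dominance rule in `locallyDominates` concrete, here is a minimal standalone sketch, assuming accesses are plain integer ids stored in program order. It is not the MemorySSA implementation: `AccessList`, the `LiveOnEntry` sentinel, and the three-argument signature are hypothetical stand-ins. The idea it illustrates is the one in the comment above: scan the accesses that precede the dominator, and if the dominatee is not among them, the dominator comes first. The early return mirrors the live-on-entry check that this patch adds.

```cpp
#include <algorithm>
#include <cassert>
#include <list>

// Hypothetical stand-in for a block's access list: ids in program order.
using AccessList = std::list<int>;

// Hypothetical sentinel modelling the live-on-entry definition.
constexpr int LiveOnEntry = -1;

// Returns true when Dominator precedes (or is) Dominatee within one block.
bool locallyDominates(const AccessList &Accesses, int Dominator,
                      int Dominatee) {
  // Mirrors the check added by the patch: nothing in the block dominates
  // the live-on-entry definition.
  if (Dominatee == LiveOnEntry)
    return false;

  auto DomIt = std::find(Accesses.begin(), Accesses.end(), Dominator);
  assert(DomIt != Accesses.end() && "Dominator must be in the access list");

  // Search only the accesses strictly before Dominator. If Dominatee is not
  // found there, Dominator comes first and therefore dominates it.
  return std::find(Accesses.begin(), DomIt, Dominatee) == DomIt;
}
```

With accesses `{10, 20, 30}`, access 10 locally dominates access 30 but not the other way around, and an access is treated as dominating itself, which matches the behavior of the reverse scan in the original function.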

llvm/test/Transforms/GVN/hoist.ll

This file was added.

				; RUN: opt -gvn-hoist -S < %s | FileCheck %s

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				@GlobalVar = internal global float 1.000000e+00

				; Check that all scalar expressions are hoisted.
				;
				; CHECK-LABEL: @scalarsHoisting
				; CHECK: fsub
				; CHECK: fmul
				; CHECK: fsub
				; CHECK: fmul
				; CHECK-NOT: fmul
				; CHECK-NOT: fsub
				define float @scalarsHoisting(float %d, float %min, float %max, float %a) {
				entry:
				%div = fdiv float 1.000000e+00, %d
				%cmp = fcmp oge float %div, 0.000000e+00
				br i1 %cmp, label %if.then, label %if.else

				if.then: ; preds = %entry
				%sub = fsub float %min, %a
				%mul = fmul float %sub, %div
				%sub1 = fsub float %max, %a
				%mul2 = fmul float %sub1, %div
				br label %if.end

				if.else: ; preds = %entry
				%sub3 = fsub float %max, %a
				%mul4 = fmul float %sub3, %div
				%sub5 = fsub float %min, %a
				%mul6 = fmul float %sub5, %div
				br label %if.end

				if.end: ; preds = %if.else, %if.then
				%tmax.0 = phi float [ %mul2, %if.then ], [ %mul6, %if.else ]
				%tmin.0 = phi float [ %mul, %if.then ], [ %mul4, %if.else ]
				%add = fadd float %tmax.0, %tmin.0
				ret float %add
				}

				; Check that all loads and scalars depending on the loads are hoisted.
				; Check that getelementptr computation gets hoisted before the load.
				;
				; CHECK-LABEL: @readsAndScalarsHoisting
				; CHECK: load
				; CHECK: load
				; CHECK: load
				; CHECK: fsub
				; CHECK: fmul
				; CHECK: fsub
				; CHECK: fmul
				; CHECK-NOT: load
				; CHECK-NOT: fmul
				; CHECK-NOT: fsub
				define float @readsAndScalarsHoisting(float %d, float* %min, float* %max, float* %a) {
				entry:
				%div = fdiv float 1.000000e+00, %d
				%cmp = fcmp oge float %div, 0.000000e+00
				br i1 %cmp, label %if.then, label %if.else

				if.then: ; preds = %entry
				%A = getelementptr float, float* %min, i32 1
				%0 = load float, float* %A, align 4
				%1 = load float, float* %a, align 4
				%sub = fsub float %0, %1
				%mul = fmul float %sub, %div
				%2 = load float, float* %max, align 4
				%sub1 = fsub float %2, %1
				%mul2 = fmul float %sub1, %div
				br label %if.end

				if.else: ; preds = %entry
				%3 = load float, float* %max, align 4
				%4 = load float, float* %a, align 4
				%sub3 = fsub float %3, %4
				%mul4 = fmul float %sub3, %div
				%B = getelementptr float, float* %min, i32 1
				%5 = load float, float* %B, align 4
				%sub5 = fsub float %5, %4
				%mul6 = fmul float %sub5, %div
				br label %if.end

				if.end: ; preds = %if.else, %if.then
				%tmax.0 = phi float [ %mul2, %if.then ], [ %mul6, %if.else ]
				%tmin.0 = phi float [ %mul, %if.then ], [ %mul4, %if.else ]
				%add = fadd float %tmax.0, %tmin.0
				ret float %add
				}

				; Check that we do not hoist loads after a store: the first two loads will be
				; hoisted, and then the third load will not be hoisted.
				;
				; CHECK-LABEL: @readsAndWrites
				; CHECK: load
				; CHECK: load
				; CHECK: fsub
				; CHECK: fmul
				; CHECK: store
				; CHECK: load
				; CHECK: fsub
				; CHECK: fmul
				; CHECK: load
				; CHECK: fsub
				; CHECK: fmul
				; CHECK-NOT: load
				; CHECK-NOT: fmul
				; CHECK-NOT: fsub
				define float @readsAndWrites(float %d, float* %min, float* %max, float* %a) {
				entry:
				%div = fdiv float 1.000000e+00, %d
				%cmp = fcmp oge float %div, 0.000000e+00
				br i1 %cmp, label %if.then, label %if.else

				if.then: ; preds = %entry
				%0 = load float, float* %min, align 4
				%1 = load float, float* %a, align 4
				store float %0, float* @GlobalVar
				%sub = fsub float %0, %1
				%mul = fmul float %sub, %div
				%2 = load float, float* %max, align 4
				%sub1 = fsub float %2, %1
				%mul2 = fmul float %sub1, %div
				br label %if.end

				if.else: ; preds = %entry
				%3 = load float, float* %max, align 4
				%4 = load float, float* %a, align 4
				%sub3 = fsub float %3, %4
				%mul4 = fmul float %sub3, %div
				%5 = load float, float* %min, align 4
				%sub5 = fsub float %5, %4
				%mul6 = fmul float %sub5, %div
				br label %if.end

				if.end: ; preds = %if.else, %if.then
				%tmax.0 = phi float [ %mul2, %if.then ], [ %mul6, %if.else ]
				%tmin.0 = phi float [ %mul, %if.then ], [ %mul4, %if.else ]
				%add = fadd float %tmax.0, %tmin.0
				ret float %add
				}

				; Check that we do hoist loads when the store is above the insertion point.
				;
				; CHECK-LABEL: @readsAndWriteAboveInsertPt
				; CHECK: load
				; CHECK: load
				; CHECK: load
				; CHECK: fsub
				; CHECK: fmul
				; CHECK: fsub
				; CHECK: fmul
				; CHECK-NOT: load
				; CHECK-NOT: fmul
				; CHECK-NOT: fsub
				define float @readsAndWriteAboveInsertPt(float %d, float* %min, float* %max, float* %a) {
				entry:
				%div = fdiv float 1.000000e+00, %d
				store float 0.000000e+00, float* @GlobalVar
				%cmp = fcmp oge float %div, 0.000000e+00
				br i1 %cmp, label %if.then, label %if.else

				if.then: ; preds = %entry
				%0 = load float, float* %min, align 4
				%1 = load float, float* %a, align 4
				%sub = fsub float %0, %1
				%mul = fmul float %sub, %div
				%2 = load float, float* %max, align 4
				%sub1 = fsub float %2, %1
				%mul2 = fmul float %sub1, %div
				br label %if.end

				if.else: ; preds = %entry
				%3 = load float, float* %max, align 4
				%4 = load float, float* %a, align 4
				%sub3 = fsub float %3, %4
				%mul4 = fmul float %sub3, %div
				%5 = load float, float* %min, align 4
				%sub5 = fsub float %5, %4
				%mul6 = fmul float %sub5, %div
				br label %if.end

				if.end: ; preds = %if.else, %if.then
				%tmax.0 = phi float [ %mul2, %if.then ], [ %mul6, %if.else ]
				%tmin.0 = phi float [ %mul, %if.then ], [ %mul4, %if.else ]
				%add = fadd float %tmax.0, %tmin.0
				ret float %add
				}

				; Check that dependent expressions are hoisted.
				; CHECK-LABEL: @dependentScalarsHoisting
				; CHECK: fsub
				; CHECK: fadd
				; CHECK: fdiv
				; CHECK: fmul
				; CHECK-NOT: fsub
				; CHECK-NOT: fadd
				; CHECK-NOT: fdiv
				; CHECK-NOT: fmul
				define float @dependentScalarsHoisting(float %a, float %b, i1 %c) {
				entry:
				br i1 %c, label %if.then, label %if.else

				if.then:
				%d = fsub float %b, %a
				%e = fadd float %d, %a
				%f = fdiv float %e, %a
				%g = fmul float %f, %a
				br label %if.end

				if.else:
				%h = fsub float %b, %a
				%i = fadd float %h, %a
				%j = fdiv float %i, %a
				%k = fmul float %j, %a
				br label %if.end

				if.end:
				%r = phi float [ %g, %if.then ], [ %k, %if.else ]
				ret float %r
				}

				; Check that all independent expressions are hoisted.
				; CHECK-LABEL: @independentScalarsHoisting
				; CHECK: fadd
				; CHECK: fsub
				; CHECK: fdiv
				; CHECK: fmul
				; CHECK-NOT: fsub
				; CHECK-NOT: fdiv
				; CHECK-NOT: fmul
				define float @independentScalarsHoisting(float %a, float %b, i1 %c) {
				entry:
				br i1 %c, label %if.then, label %if.else

				if.then:
				%d = fadd float %b, %a
				%e = fsub float %b, %a
				%f = fdiv float %b, %a
				%g = fmul float %b, %a
				br label %if.end

				if.else:
				%i = fadd float %b, %a
				%h = fsub float %b, %a
				%j = fdiv float %b, %a
				%k = fmul float %b, %a
				br label %if.end

				if.end:
				%p = phi float [ %d, %if.then ], [ %i, %if.else ]
				%q = phi float [ %e, %if.then ], [ %h, %if.else ]
				%r = phi float [ %f, %if.then ], [ %j, %if.else ]
				%s = phi float [ %g, %if.then ], [ %k, %if.else ]
				%t = fadd float %p, %q
				%u = fadd float %r, %s
				%v = fadd float %t, %u
				ret float %v
				}

				; Check that we hoist load and scalar expressions in triangles.
				; CHECK-LABEL: @triangleHoisting
				; CHECK: load
				; CHECK: load
				; CHECK: load
				; CHECK: fsub
				; CHECK: fmul
				; CHECK: fsub
				; CHECK: fmul
				; CHECK-NOT: load
				; CHECK-NOT: fmul
				; CHECK-NOT: fsub
				define float @triangleHoisting(float %d, float* %min, float* %max, float* %a) {
				entry:
				%div = fdiv float 1.000000e+00, %d
				%cmp = fcmp oge float %div, 0.000000e+00
				br i1 %cmp, label %if.then, label %if.end

				if.then: ; preds = %entry
				%0 = load float, float* %min, align 4
				%1 = load float, float* %a, align 4
				%sub = fsub float %0, %1
				%mul = fmul float %sub, %div
				%2 = load float, float* %max, align 4
				%sub1 = fsub float %2, %1
				%mul2 = fmul float %sub1, %div
				br label %if.end

				if.end: ; preds = %entry
				%p1 = phi float [ %mul2, %if.then ], [ 0.000000e+00, %entry ]
				%p2 = phi float [ %mul, %if.then ], [ 0.000000e+00, %entry ]
				%3 = load float, float* %max, align 4
				%4 = load float, float* %a, align 4
				%sub3 = fsub float %3, %4
				%mul4 = fmul float %sub3, %div
				%5 = load float, float* %min, align 4
				%sub5 = fsub float %5, %4
				%mul6 = fmul float %sub5, %div

				%x = fadd float %p1, %mul6
				%y = fadd float %p2, %mul4
				%z = fadd float %x, %y
				ret float %z
				}

				; Check that we hoist load and scalar expressions in dominator.
				; CHECK-LABEL: @dominatorHoisting
				; CHECK: load
				; CHECK: load
				; CHECK: fsub
				; CHECK: fmul
				; CHECK: load
				; CHECK: fsub
				; CHECK: fmul
				; CHECK-NOT: load
				; CHECK-NOT: fmul
				; CHECK-NOT: fsub
				define float @dominatorHoisting(float %d, float* %min, float* %max, float* %a) {
				entry:
				%div = fdiv float 1.000000e+00, %d
				%0 = load float, float* %min, align 4
				%1 = load float, float* %a, align 4
				%sub = fsub float %0, %1
				%mul = fmul float %sub, %div
				%2 = load float, float* %max, align 4
				%sub1 = fsub float %2, %1
				%mul2 = fmul float %sub1, %div
				%cmp = fcmp oge float %div, 0.000000e+00
				br i1 %cmp, label %if.then, label %if.end

				if.then: ; preds = %entry
				%3 = load float, float* %max, align 4
				%4 = load float, float* %a, align 4
				%sub3 = fsub float %3, %4
				%mul4 = fmul float %sub3, %div
				%5 = load float, float* %min, align 4
				%sub5 = fsub float %5, %4
				%mul6 = fmul float %sub5, %div
				br label %if.end

				if.end: ; preds = %entry
				%p1 = phi float [ %mul4, %if.then ], [ 0.000000e+00, %entry ]
				%p2 = phi float [ %mul6, %if.then ], [ 0.000000e+00, %entry ]

				%x = fadd float %p1, %mul2
				%y = fadd float %p2, %mul
				%z = fadd float %x, %y
				ret float %z
				}

				; Check that we hoist load and scalar expressions in dominator.
				; CHECK-LABEL: @domHoisting
				; CHECK: load
				; CHECK: load
				; CHECK: fsub
				; CHECK: fmul
				; CHECK: load
				; CHECK: fsub
				; CHECK: fmul
				; CHECK-NOT: load
				; CHECK-NOT: fmul
				; CHECK-NOT: fsub
				define float @domHoisting(float %d, float* %min, float* %max, float* %a) {
				entry:
				%div = fdiv float 1.000000e+00, %d
				%0 = load float, float* %min, align 4
				%1 = load float, float* %a, align 4
				%sub = fsub float %0, %1
				%mul = fmul float %sub, %div
				%2 = load float, float* %max, align 4
				%sub1 = fsub float %2, %1
				%mul2 = fmul float %sub1, %div
				%cmp = fcmp oge float %div, 0.000000e+00
				br i1 %cmp, label %if.then, label %if.else

				if.then:
				%3 = load float, float* %max, align 4
				%4 = load float, float* %a, align 4
				%sub3 = fsub float %3, %4
				%mul4 = fmul float %sub3, %div
				%5 = load float, float* %min, align 4
				%sub5 = fsub float %5, %4
				%mul6 = fmul float %sub5, %div
				br label %if.end

				if.else:
				%6 = load float, float* %max, align 4
				%7 = load float, float* %a, align 4
				%sub9 = fsub float %6, %7
				%mul10 = fmul float %sub9, %div
				%8 = load float, float* %min, align 4
				%sub12 = fsub float %8, %7
				%mul13 = fmul float %sub12, %div
				br label %if.end

				if.end:
				%p1 = phi float [ %mul4, %if.then ], [ %mul10, %if.else ]
				%p2 = phi float [ %mul6, %if.then ], [ %mul13, %if.else ]

				%x = fadd float %p1, %mul2
				%y = fadd float %p2, %mul
				%z = fadd float %x, %y
				ret float %z
				}

				; Check that we do not hoist loads past stores within the same basic block.
				; CHECK-LABEL: @noHoistInSingleBBWithStore
				; CHECK: load
				; CHECK: store
				; CHECK: load
				; CHECK: store
				define i32 @noHoistInSingleBBWithStore() {
				entry:
				%D = alloca i32, align 4
				%0 = bitcast i32* %D to i8*
				%bf = load i8, i8* %0, align 4
				%bf.clear = and i8 %bf, -3
				store i8 %bf.clear, i8* %0, align 4
				%bf1 = load i8, i8* %0, align 4
				%bf.clear1 = and i8 %bf1, 1
				store i8 %bf.clear1, i8* %0, align 4
				ret i32 0
				}

				; Check that we do not hoist loads past calls within the same basic block.
				; CHECK-LABEL: @noHoistInSingleBBWithCall
				; CHECK: load
				; CHECK: call
				; CHECK: load
				declare void @foo()
				define i32 @noHoistInSingleBBWithCall() {
				entry:
				%D = alloca i32, align 4
				%0 = bitcast i32* %D to i8*
				%bf = load i8, i8* %0, align 4
				%bf.clear = and i8 %bf, -3
				call void @foo()
				%bf1 = load i8, i8* %0, align 4
				%bf.clear1 = and i8 %bf1, 1
				ret i32 0
				}

				; Check that we do not hoist loads past stores in any branch of a diamond.
				; CHECK-LABEL: @noHoistInDiamondWithOneStore1
				; CHECK: fdiv
				; CHECK: fcmp
				; CHECK: br
				define float @noHoistInDiamondWithOneStore1(float %d, float* %min, float* %max, float* %a) {
				entry:
				%div = fdiv float 1.000000e+00, %d
				%cmp = fcmp oge float %div, 0.000000e+00
				br i1 %cmp, label %if.then, label %if.else

				if.then: ; preds = %entry
				store float 0.000000e+00, float* @GlobalVar
				%0 = load float, float* %min, align 4
				%1 = load float, float* %a, align 4
				%sub = fsub float %0, %1
				%mul = fmul float %sub, %div
				%2 = load float, float* %max, align 4
				%sub1 = fsub float %2, %1
				%mul2 = fmul float %sub1, %div
				br label %if.end

				if.else: ; preds = %entry
				; There are no side effects on the if.else branch.
				%3 = load float, float* %max, align 4
				%4 = load float, float* %a, align 4
				%sub3 = fsub float %3, %4
				%mul4 = fmul float %sub3, %div
				%5 = load float, float* %min, align 4
				%sub5 = fsub float %5, %4
				%mul6 = fmul float %sub5, %div
				br label %if.end

				if.end: ; preds = %if.else, %if.then
				%tmax.0 = phi float [ %mul2, %if.then ], [ %mul6, %if.else ]
				%tmin.0 = phi float [ %mul, %if.then ], [ %mul4, %if.else ]

				%6 = load float, float* %max, align 4
				%7 = load float, float* %a, align 4
				%sub6 = fsub float %6, %7
				%mul7 = fmul float %sub6, %div
				%8 = load float, float* %min, align 4
				%sub8 = fsub float %8, %7
				%mul9 = fmul float %sub8, %div

				%add = fadd float %tmax.0, %tmin.0
				ret float %add
				}

				; Check that we do not hoist loads past a store in any branch of a diamond.
				; CHECK-LABEL: @noHoistInDiamondWithOneStore2
				; CHECK: fdiv
				; CHECK: fcmp
				; CHECK: br
				define float @noHoistInDiamondWithOneStore2(float %d, float* %min, float* %max, float* %a) {
				entry:
				%div = fdiv float 1.000000e+00, %d
				%cmp = fcmp oge float %div, 0.000000e+00
				br i1 %cmp, label %if.then, label %if.else

				if.then: ; preds = %entry
				; There are no side effects on the if.then branch.
				%0 = load float, float* %min, align 4
				%1 = load float, float* %a, align 4
				%sub = fsub float %0, %1
				%mul = fmul float %sub, %div
				%2 = load float, float* %max, align 4
				%sub1 = fsub float %2, %1
				%mul2 = fmul float %sub1, %div
				br label %if.end

				if.else: ; preds = %entry
				store float 0.000000e+00, float* @GlobalVar
				%3 = load float, float* %max, align 4
				%4 = load float, float* %a, align 4
				%sub3 = fsub float %3, %4
				%mul4 = fmul float %sub3, %div
				%5 = load float, float* %min, align 4
				%sub5 = fsub float %5, %4
				%mul6 = fmul float %sub5, %div
				br label %if.end

				if.end: ; preds = %if.else, %if.then
				%tmax.0 = phi float [ %mul2, %if.then ], [ %mul6, %if.else ]
				%tmin.0 = phi float [ %mul, %if.then ], [ %mul4, %if.else ]

				%6 = load float, float* %max, align 4
				%7 = load float, float* %a, align 4
				%sub6 = fsub float %6, %7
				%mul7 = fmul float %sub6, %div
				%8 = load float, float* %min, align 4
				%sub8 = fsub float %8, %7
				%mul9 = fmul float %sub8, %div

				%add = fadd float %tmax.0, %tmin.0
				ret float %add
				}

				; Check that we do not hoist loads outside a loop containing stores.
				; CHECK-LABEL: @noHoistInLoopsWithStores
				; CHECK: fdiv
				; CHECK: fcmp
				; CHECK: br
				define float @noHoistInLoopsWithStores(float %d, float* %min, float* %max, float* %a) {
				entry:
				%div = fdiv float 1.000000e+00, %d
				%cmp = fcmp oge float %div, 0.000000e+00
				br i1 %cmp, label %do.body, label %if.else

				do.body:
				%0 = load float, float* %min, align 4
				%1 = load float, float* %a, align 4

				; It is unsafe to hoist the loads outside the loop because of the store.
				store float 0.000000e+00, float* @GlobalVar

				%sub = fsub float %0, %1
				%mul = fmul float %sub, %div
				%2 = load float, float* %max, align 4
				%sub1 = fsub float %2, %1
				%mul2 = fmul float %sub1, %div
				br label %while.cond

				while.cond:
				%cmp1 = fcmp oge float %mul2, 0.000000e+00
				br i1 %cmp1, label %if.end, label %do.body

				if.else:
				%3 = load float, float* %max, align 4
				%4 = load float, float* %a, align 4
				%sub3 = fsub float %3, %4
				%mul4 = fmul float %sub3, %div
				%5 = load float, float* %min, align 4
				%sub5 = fsub float %5, %4
				%mul6 = fmul float %sub5, %div
				br label %if.end

				if.end:
				%tmax.0 = phi float [ %mul2, %while.cond ], [ %mul6, %if.else ]
				%tmin.0 = phi float [ %mul, %while.cond ], [ %mul4, %if.else ]

				%add = fadd float %tmax.0, %tmin.0
				ret float %add
				}

				; Check that we hoist stores: all the instructions from the then branch
				; should be hoisted.
				; CHECK-LABEL: @hoistStores
				; CHECK: zext
				; CHECK: trunc
				; CHECK: getelementptr
				; CHECK: load
				; CHECK: getelementptr
				; CHECK: store
				; CHECK: load
				; CHECK: load
				; CHECK: zext
				; CHECK: add
				; CHECK: store
				; CHECK: br
				; CHECK: if.then
				; CHECK: br

				%struct.foo = type { i16* }

				define void @hoistStores(%struct.foo* %s, i32* %coord, i1 zeroext %delta) {
				entry:
				%frombool = zext i1 %delta to i8
				%tobool = trunc i8 %frombool to i1
				br i1 %tobool, label %if.then, label %if.else

				if.then: ; preds = %entry
				%p = getelementptr inbounds %struct.foo, %struct.foo* %s, i32 0, i32 0
				%0 = load i16*, i16** %p, align 8
				%incdec.ptr = getelementptr inbounds i16, i16* %0, i32 1
				store i16* %incdec.ptr, i16** %p, align 8
				%1 = load i16, i16* %0, align 2
				%conv = zext i16 %1 to i32
				%2 = load i32, i32* %coord, align 4
				%add = add i32 %2, %conv
				store i32 %add, i32* %coord, align 4
				br label %if.end

				if.else: ; preds = %entry
				%p1 = getelementptr inbounds %struct.foo, %struct.foo* %s, i32 0, i32 0
				%3 = load i16*, i16** %p1, align 8
				%incdec.ptr2 = getelementptr inbounds i16, i16* %3, i32 1
				store i16* %incdec.ptr2, i16** %p1, align 8
				%4 = load i16, i16* %3, align 2
				%conv3 = zext i16 %4 to i32
				%5 = load i32, i32* %coord, align 4
				%add4 = add i32 %5, %conv3
				store i32 %add4, i32* %coord, align 4
				%6 = load i16*, i16** %p1, align 8
				%incdec.ptr6 = getelementptr inbounds i16, i16* %6, i32 1
				store i16* %incdec.ptr6, i16** %p1, align 8
				%7 = load i16, i16* %6, align 2
				%conv7 = zext i16 %7 to i32
				%shl = shl i32 %conv7, 8
				%8 = load i32, i32* %coord, align 4
				%add8 = add i32 %8, %shl
				store i32 %add8, i32* %coord, align 4
				br label %if.end

				if.end: ; preds = %if.else, %if.then
				ret void
				}


Diff 59290
