This is an archive of the discontinued LLVM Phabricator instance.

Paths

- include/llvm/LinkAllPasses.h
- include/llvm/Transforms/Utils/MemorySSA.h
- lib/Transforms/IPO/PassManagerBuilder.cpp
- lib/Transforms/Scalar/EarlyCSE.cpp
- lib/Transforms/Utils/MemorySSA.cpp
- test/Transforms/EarlyCSE/AArch64/intrinsics.ll
- test/Transforms/EarlyCSE/AArch64/ldstN.ll
- test/Transforms/EarlyCSE/atomics.ll
- test/Transforms/EarlyCSE/basic.ll
- test/Transforms/EarlyCSE/commute.ll
- test/Transforms/EarlyCSE/fence.ll
- test/Transforms/EarlyCSE/guards.ll
- test/Transforms/EarlyCSE/memoryssa.ll

Differential D19821

[EarlyCSE] Optionally use MemorySSA. NFC.
Closed, Public

Authored by gberry on May 2 2016, 12:18 PM.


Details

Reviewers

reames
majnemer
gberry
dberlin
sanjoy
deadalnix

Commits

rG8d84605f25d9: [EarlyCSE] Optionally use MemorySSA. NFC.
rL280279: [EarlyCSE] Optionally use MemorySSA. NFC.

Summary

Use MemorySSA, if requested, to do less conservative memory dependency checking.
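To illustrate what "less conservative" means here, the toy C++ sketch below compares two checks. The first is generation-style: any intervening store invalidates earlier memory values. The second is alias-aware: it only counts stores to the same location. The types and function names are hypothetical, the sketch does not use LLVM's API, and it assumes that distinct location names never alias.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy straight-line code: every instruction loads or stores one named
// location. Hypothetical model, not LLVM's API; distinct names are assumed
// never to alias.
struct Inst {
  bool IsStore;
  std::string Loc;
};

// Generation-style check: any store between Earlier and Later is treated as
// a possible clobber, no matter which location it writes.
bool noInterveningStore(const std::vector<Inst> &Code, std::size_t Earlier,
                        std::size_t Later) {
  assert(Earlier < Later && Later < Code.size() && "bad instruction indices");
  for (std::size_t I = Earlier + 1; I < Later; ++I)
    if (Code[I].IsStore)
      return false;
  return true;
}

// Alias-aware check: only a store to Later's location counts as a clobber.
bool noInterveningClobber(const std::vector<Inst> &Code, std::size_t Earlier,
                          std::size_t Later) {
  assert(Earlier < Later && Later < Code.size() && "bad instruction indices");
  for (std::size_t I = Earlier + 1; I < Later; ++I)
    if (Code[I].IsStore && Code[I].Loc == Code[Later].Loc)
      return false;
  return true;
}
```

For `load p; store q; load p`, the generation-style check returns false and the alias-aware check returns true, so only the latter would let the second load be CSE'd.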

This change doesn't enable the MemorySSA enhanced EarlyCSE in the default pipelines, so should be NFC.


Event Timeline

gberry updated this revision to Diff 55861. (May 2 2016, 12:18 PM)

gberry retitled this revision to [EarlyCSE] Port to use MemorySSA (disabled by default). NFC.

gberry updated this object.

gberry added reviewers: dberlin, sanjoy, reames, majnemer.

gberry added a subscriber: llvm-commits.

Herald added a subscriber: mcrosier. (May 2 2016, 12:18 PM)

gberry added a parent revision: D19664: [MemorySSA] Port to new pass manager. (May 2 2016, 12:19 PM)

A few nits in passing.

lib/Transforms/Scalar/EarlyCSE.cpp
306–307	Can this be committed as a separate change?
563	Please capitalize handle and add a period.
574	Please add a descriptive && "Error message!".
760–762	Separate commit.
776	Separate commit. Maybe typedef?
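The `&& "Error message!"` suggestion refers to the usual LLVM assertion idiom. The minimal sketch below shows it with a hypothetical helper that is not part of the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: remove the access at Index from a list of access IDs.
void removeAccessAt(std::vector<int> &Accesses, std::size_t Index) {
  // A string literal is a non-null pointer and therefore always true, so
  // `Cond && "msg"` has the same truth value as Cond. The message text is
  // printed as part of the assertion failure.
  assert(Index < Accesses.size() && "removeAccessAt: index out of range");
  Accesses.erase(Accesses.begin() + static_cast<std::ptrdiff_t>(Index));
}
```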

george.burgess.iv added a subscriber: george.burgess.iv. (May 4 2016, 12:45 PM)

At a meta level, I'm not convinced that updating EarlyCSE to work with MemorySSA is the right approach. EarlyCSE is focused on being really, really fast at cleaning up stupidly redundant IR so that the rest of the pass pipeline doesn't need to worry about it. MemorySSA is relatively expensive to construct. Given that, I'm really not sure putting it at the very beginning of the pipeline is a good design choice.

Now, having said that, it might make sense to have a dominance-order CSE pass based on MemorySSA for use later in the pass pipeline. Currently we use EarlyCSE for two distinct purposes; it's possible that it might be time to split them.

Can you justify why this is the right approach?

lib/Transforms/Scalar/EarlyCSE.cpp
504	Huh? This should be handled entirely inside MemorySSA?
555	Having two sets of variables, one integers, one pointers with similar names is highly confusing. I'd suggest pulling out a MemorySSA specific impl function and calling it from here to wrap the desired asserts.
563	This comment doesn't make sense where placed?
674	Code like this strongly hints that MemorySSA should be using ValueHandles.
test/Transforms/EarlyCSE/memoryssa.ll
8	If we do go this way, you'll need far far more tests.

This revision now requires changes to proceed. (May 5 2016, 6:31 PM)

So, if it's not actually slower in practice, would that address your objection?

In particular, I'm trying to understand if your concerns are *mostly* "we want to keep this fast", or broader than that. If they are broader than that, I'd like to understand the objection, because the speed one is simply "either we can make it fast enough or we can't" (and I agree that if we can't, we shouldn't do it :P).

junbuml added a subscriber: junbuml. (May 9 2016, 10:47 AM)

Update based on reames's review feedback

@reames I've attempted to resolve most of your individual concerns (or at least made them explicit in the change). The bigger question of whether this is worth the compile time remains to be determined. Do you think more tests need to be added in addition to the already existing EarlyCSE tests? Adding additional run lines to those tests to enable -early-cse-use-memoryssa seems like overkill to me, but I don't feel too strongly about it. Or are you more concerned about adding new tests for cases that are only caught by MemorySSA (both positive and negative)?

@dberlin, @george.burgess.iv There are a couple of FIXME comments in this change that identify cases where MemorySSA may be too conservative (e.g. when dealing with fence release instructions and load atomic instructions). Do you think it is reasonable to refine these cases in MemorySSA, or is the conservatism restricted to EarlyCSE's usage, in which case we should deal with it in EarlyCSE? Similarly, what do you think of Philip's suggestion to look at using ValueHandles in MemorySSA to make invalidation on removal more automatic?

@reames @dberlin Regarding the compile time impact, do you think it would be worth pursuing a change to make EarlyCSE's use of MemorySSA optional? That way we could avoid using it for the early EarlyCSE passes and only use it for later ones, perhaps even influenced by optimization level. A related aspect of the plan for MemorySSA that I'd like to understand is how well we think we'll be able to amortize the cost of building it by preserving/maintaining it across passes. Daniel, can you share your thoughts on that?

So, I would like to see real numbers that say whether this is going to slow anything down (or speed it up). As I said, if the objection is speed, yes, we should look into that, and if something needs to be done, we should do it.

We can amortize the cost quite well. It should essentially cost nearly nothing past the initial setup cost (it's not harder than the SSA updates we do today, which are not expensive).

The entire plan is actually to amortize the cost. Right now, the default pass schedules put the passes that use memdep mostly in a row.

At the outset, with a little work, we should only have to compute MemorySSA twice (once before MLSM/GVN/MemCpyOpt, once before DSE). Getting all the way to DSE is harder in the sense that it's a longer way to go to preserve passes.

But it's also interesting to note that none of these passes preserve memdep today, and the cost of doing memdep queries on every store (as DSE would) with no cache should be more than the cost of building and using MemorySSA.

It definitely can be made to be so. So that part doesn't worry me.

Shoving this in EarlyCSE, if it's fast enough, seems reasonable at a glance. In a perfect world, it would be good to preserve it everywhere. I'm not sure, at the beginning, that it makes sense to try to preserve it across tons and tons of passes that won't ever use it but do touch memory heavily. So I would expect EarlyCSE to end up as another computation point for quite a while.

That needs to be traded off against how much better/easier/etc. it makes EarlyCSE.

Update to use MemorySSA in EarlyCSE if it is available, and make it available at -O3 or higher in the first EarlyCSE pass added by addFunctionSimplificationPasses().

Herald added a subscriber: mehdi_amini. (Jun 14 2016, 3:01 PM)

gberry retitled this revision from [EarlyCSE] Port to use MemorySSA (disabled by default). NFC. to [EarlyCSE] Use MemorySSA if available. (Jun 14 2016, 3:03 PM)

gberry updated this object.

Compile-time tests for the LLVM test suite on AArch64 (at -O3) were mostly a wash: some faster, some slower, no big outliers. The net change was slightly better compile times.

Notable performance improvements (no significant regressions):
MultiSource/Benchmarks/Trimaran/enc-md5/enc-md5 -6%
MultiSource/Benchmarks/Trimaran/enc-pc1/enc-pc1 -5%
MultiSource/Benchmarks/sim/sim -3%
SingleSource/Benchmarks/McGill/chomp -4%
MultiSource/Benchmarks/Ptrdist/anagram/anagram -2%
spec2000/bzip2 -10%

@reames I've added some additional lit test coverage, is there more lit test coverage you'd like to see?

Hi Geoff, what are the numbers for the top slower ones?

I'm also interested. We already know of some performance issues related to caching and use optimization on weird testcases (many, many nested blocks) that we are fixing. If we have significant perf regressions, it would be useful to know, if for no other reason than to inform the work George is looking at.

Here are the worst llvm test-suite compile time regressions. I've filtered out the very small test cases. The data shown are the percent diffs of compile times from 5 different runs with and without the above change.

llvm-test-suite/SingleSource/Benchmarks/Misc/ffbench:normal PASS +1.079%, +2.837%, +17.951%, +21.615%, +33.206%
llvm-test-suite/SingleSource/Benchmarks/Shootout-C++/shootout-cxx-moments:normal PASS +1.148%, +9.801%, +14.607%, +15.686%, +19.707%
llvm-test-suite/MultiSource/Applications/spiff/spiff:normal PASS -0.773%, +2.938%, +13.511%, +16.846%, +21.552%
llvm-test-suite/SingleSource/Benchmarks/Shootout-C++/shootout-cxx-nestedloop:normal PASS -1.623%, +1.635%, +12.996%, +13.561%, +14.354%
llvm-test-suite/MultiSource/Benchmarks/Prolangs-C++/garage/garage:normal PASS +1.092%, +8.865%, +12.830%, +14.206%, +15.722%
llvm-test-suite/MultiSource/Benchmarks/TSVC/IndirectAddressing-dbl/IndirectAddressing-dbl:normal PASS +4.676%, +5.196%, +10.725%, +12.248%, +12.967%
llvm-test-suite/MultiSource/Benchmarks/MallocBench/espresso/espresso:normal PASS -2.227%, +4.387%, +9.187%, +11.195%, +13.738%
llvm-test-suite/MultiSource/Benchmarks/BitBench/uuencode/uuencode:normal PASS -0.223%, +3.068%, +7.235%, +9.723%, +18.255%
llvm-test-suite/SingleSource/Benchmarks/Misc-C++/stepanov_container:normal PASS +1.185%, +2.054%, +4.369%, +8.627%, +11.474%
llvm-test-suite/MultiSource/Benchmarks/Prolangs-C++/family/family:normal PASS +1.386%, +1.767%, +4.300%, +10.017%, +19.309%
llvm-test-suite/MultiSource/Benchmarks/mediabench/adpcm/rawcaudio/rawcaudio:normal PASS +1.995%, +2.907%, +3.697%, +12.124%, +12.976%
llvm-test-suite/SingleSource/Benchmarks/Dhrystone/fldry:normal PASS +1.920%, +2.027%, +2.843%, +14.746%, +14.803%
llvm-test-suite/MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4:normal PASS +1.305%, +1.674%, +2.269%, +2.961%, +9.660%

This looks super noisy even at 5 runs :)
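One way to make the noise concrete is to summarize each row of the table by its minimum, median, and maximum. The sketch below does this using the five per-run deltas reported for ffbench and spiff. The `Spread`/`summarize` names are hypothetical. For ffbench, the runs range from +1.079% to +33.206% around a median of +17.951%. For spiff, the deltas even change sign, from -0.773% to +21.552%.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Min / median / max of per-run percent deltas for one benchmark.
struct Spread {
  double Min;
  double Median;
  double Max;
};

// Takes Deltas by value so it can sort its own copy.
Spread summarize(std::vector<double> Deltas) {
  assert(!Deltas.empty() && Deltas.size() % 2 == 1 &&
         "expects an odd, non-empty number of runs");
  std::sort(Deltas.begin(), Deltas.end());
  return {Deltas.front(), Deltas[Deltas.size() / 2], Deltas.back()};
}
```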

@dberlin Yeah, I'm in the process of double checking some of these to make sure my testing methodology was sound.

Updated llvm-test-suite compile time regressions using LNT methodology:
llvm-test-suite/SingleSource/Benchmarks/Misc/flops-6 +4.019%
llvm-test-suite/SingleSource/Benchmarks/Adobe-C++/simple_types_constant_folding +4.967%
llvm-test-suite/SingleSource/Benchmarks/Polybench/medley/reg_detect/reg_detect +5.268%
llvm-test-suite/SingleSource/UnitTests/Vectorizer/gcc-loops +5.538%

Looks like the affected benchmarks changed in the new measurements, does that hold for performance improvements as well?

George, can you stare at these quickly and see if any of your caching changes/etc. will help?

(It's fine if not, just trying to avoid duplicating work.)

In D19821#462310, @bruno wrote:

Looks like the affected benchmarks changed in the new measurements, does that hold for performance improvements as well?

I don't have that data yet, I'll update when I do.

George, can you stare at these quickly and see if any of your caching changes/etc will help?

That depends on what exactly is slowing the benchmarks down so much. If our usage pattern is query -> remove -> query -> remove, then our cache may become useless, since (worst case) we drop the entire thing on each removal. If we primarily query defs, then this pattern gives us the same effectively-n^2 behavior of MemDep. One of the big goals of the new cache/walker is to allow us to drop as little as possible.
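The toy model below makes the worst case concrete. The names are hypothetical, and this is not MemorySSA's actual cache or walker. Accesses form a single chain. A query walks down the chain until it reaches a cached answer, and "removal" is modeled only as dropping the whole cache. With a persistent cache, N bottom-up queries cost N steps in total. If the cache is dropped between queries, the same queries cost N(N+1)/2 steps.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Toy cached clobber walker (hypothetical; not MemorySSA's code). Accesses
// 1..N form a chain N -> N-1 -> ... -> 1 -> 0, where 0 plays the role of
// live-on-entry. A query walks down until it reaches 0 or an access whose
// answer is already cached, then caches every access it visited.
class ToyCachingWalker {
public:
  std::size_t query(std::size_t Start) {
    std::vector<std::size_t> Visited;
    std::size_t Cur = Start;
    while (Cur != 0 && Cache.count(Cur) == 0) {
      Visited.push_back(Cur);
      --Cur;
    }
    Cache.insert(Visited.begin(), Visited.end());
    return Visited.size(); // number of walk steps this query took
  }

  // Worst case for removals: drop the entire cache.
  void clearCache() { Cache.clear(); }

private:
  std::unordered_set<std::size_t> Cache;
};

// Total walk steps for N bottom-up queries, optionally dropping the cache
// between queries as a query -> remove -> query pattern would.
std::size_t totalSteps(std::size_t N, bool DropCacheBetweenQueries) {
  ToyCachingWalker W;
  std::size_t Total = 0;
  for (std::size_t I = 1; I <= N; ++I) {
    if (DropCacheBetweenQueries)
      W.clearCache();
    Total += W.query(I);
  }
  return Total;
}
```

In this toy, the chain itself never changes. Only the cost of losing cached answers is modeled, which is the part a finer-grained invalidation scheme would avoid.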

In terms of pure walker/cache speed, the current walker is happy to do a lot of potentially useless work walking phis we can't optimize; the one I'm working on will do as little work as possible in that case. Also, the current walker potentially does a lot of domtree queries when caching results, whereas the one I'm working on does none (except in asserts). Glancing at some of the benchmarks, I'm not sure if any of that is what's slowing us down here, though.

If you'd like, I'm happy to profile/poke around and give you a more definitive answer.

lib/Transforms/Scalar/EarlyCSE.cpp
500	Nit: Please use ///
571	Nit: `return EarlierHeapGen == LaterHeapGen`?

In D19821#462327, @gberry wrote:

In D19821#462310, @bruno wrote:

Looks like the affected benchmarks changed in the new measurements, does that hold for performance improvements as well?

I don't have that data yet, I'll update when I do.

The performance changes are mostly the same. Updated perf data (this is on an A57-like OoO aarch64 core, time deltas (negative is better)):
llvm-test-suite/SingleSource/Benchmarks/CoyoteBench/fftbench -21.811%, -21.633%, -21.488%, -21.074%, -20.747%, -20.168%, -20.084%, -18.987%, -18.615%, -18.487%
llvm-test-suite/MultiSource/Benchmarks/Trimaran/enc-md5/enc-md5 -6.557%, -6.557%, -6.557%, -6.077%, -6.077%, -5.525%, -5.525%, -5.525%, -5.525%, -5.000%
llvm-test-suite/MultiSource/Benchmarks/Trimaran/enc-pc1/enc-pc1 -3.306%, -3.306%, -3.306%, -2.479%, -2.479%, -2.479%, -2.479%, -2.479%, -1.681%, -1.667%
llvm-test-suite/MultiSource/Benchmarks/Prolangs-C/gnugo/gnugo -25.000%, -12.500%, -12.500%, -12.500%, -12.500%, -12.500%, +0.000%, +0.000%, +0.000%, +0.000%
llvm-test-suite/MultiSource/Benchmarks/sim/sim -12.815%, -2.667%, -2.667%, -2.400%, -1.872%, -1.114%, -1.111%, -1.070%, -1.067%, -1.067%
llvm-test-suite/MultiSource/Benchmarks/MallocBench/cfrac/cfrac -2.426%, -2.162%, -1.729%, -1.445%, -1.124%, -1.124%, -1.124%, -1.124%, -1.124%, -0.845%

llvm-test-suite/MultiSource/Benchmarks/mafft/pairlocalalign +0.399%, +0.430%, +0.458%, +0.507%, +0.593%, +0.626%, +0.778%, +1.097%, +1.195%, +1.309%

Update to address George's comments

I re-ran the llvm test-suite compile-time numbers with more samples and found no significant changes (improvements or regressions) in compile time.

I've put this on hold until I can re-run compile-time numbers after George's changes to the MemorySSA caching code go in (http://reviews.llvm.org/D21777)

gberry added a parent revision: D21777: [MemorySSA] Switch to a different walker. (Jul 5 2016, 11:00 AM)

Sorry for not responding to this for so long.

My objection is primarily from a compile time concern. Right now, EarlyCSE is a *very* cheap pass to run. If you can keep it fast (even when we have to reconstruct MemorySSA) I don't object to having EarlyCSE MemorySSA based. I think that is a very hard bar to pass in practice. In particular, the bar is not total O3 time. It's EarlyCSE time. I fully expect that the more precise analysis may speed up other passes, but we can't assume that happens for all inputs. (As I write this, I'm recognizing that this might be too high a bar to set. If you think I'm being unreasonable, argue why and what a better line should be.)

Given that I'm not going to have time to be actively involved in this thread, I'm going to defer to the other reviewers. If they think this is a good idea, I will not actively block it.

P.S. The newer structure, which uses the original fast-path check with MemorySSA as an optional backup when it is available, is much cleaner than the original code. I'm okay with something like this landing (once other reviewers have signed off), even if the compile time question isn't fully resolved, provided that forcing the MemorySSA pass into the pipeline is under an off-by-default option. As structured, this doesn't complicate the existing code much at all.

New version that adds a pass parameter to control whether MemorySSA is used.

Also changed the memory generation check to do a simpler MemorySSA
dominance check.
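The toy C++ sketch below shows the check ordering discussed in this review. The cheap generation comparison runs first, and MemorySSA serves only as an optional backup. The backup is phrased as a dominance test: the later instruction's clobbering access must dominate the earlier instruction. Everything here (`Inst`, `ToyMemorySSA`, `generationAt`) is hypothetical and models a single basic block, where dominance reduces to program order. It is not the patch's code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy single-block model (hypothetical; not LLVM's API). Distinct location
// names are assumed never to alias.
struct Inst {
  bool IsStore;
  std::string Loc;
};

// Stand-in for MemorySSA over one block: access IDs are instruction
// indices, and -1 stands for live-on-entry.
class ToyMemorySSA {
public:
  explicit ToyMemorySSA(const std::vector<Inst> &Code) : Code(Code) {}

  // Nearest preceding store that writes Later's location, or -1.
  long clobberingAccess(std::size_t Later) const {
    for (std::size_t I = Later; I-- > 0;)
      if (Code[I].IsStore && Code[I].Loc == Code[Later].Loc)
        return static_cast<long>(I);
    return -1;
  }

  // In one block, A dominates B iff A is not after B; live-on-entry (-1)
  // dominates everything.
  static bool dominates(long A, std::size_t B) {
    return A <= static_cast<long>(B);
  }

private:
  const std::vector<Inst> &Code;
};

// Fast-path generation: number of stores at or before Index.
unsigned generationAt(const std::vector<Inst> &Code, std::size_t Index) {
  unsigned Gen = 0;
  for (std::size_t I = 0; I <= Index; ++I)
    if (Code[I].IsStore)
      ++Gen;
  return Gen;
}

bool isSameMemGeneration(const std::vector<Inst> &Code,
                         const ToyMemorySSA *MSSA, std::size_t Earlier,
                         std::size_t Later) {
  assert(Earlier < Later && Later < Code.size() && "bad instruction indices");
  // 1. Cheap generation comparison first.
  if (generationAt(Code, Earlier) == generationAt(Code, Later))
    return true;
  // 2. MemorySSA is optional; without it, stay conservative.
  if (!MSSA)
    return false;
  // 3. If Later's clobber dominates Earlier, no write that may clobber Later
  //    can sit between the two instructions.
  return ToyMemorySSA::dominates(MSSA->clobberingAccess(Later), Earlier);
}
```

Passing `nullptr` for the MemorySSA pointer reproduces the original, purely generation-based behavior. This is what keeps the default pipeline NFC in this sketch's terms.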

Herald added a reviewer: deadalnix. (Aug 22 2016, 2:17 PM)

gberry retitled this revision from [EarlyCSE] Use MemorySSA if available. to [EarlyCSE] Optionally use MemorySSA. NFC. (Aug 22 2016, 2:18 PM)

gberry updated this object.

gberry edited edge metadata.

I've collected some compile time stats when enabling MemorySSA EarlyCSE just for the EarlyCSE pass added at the beginning of addFunctionSimplificationPasses at O2 and higher.
There were 8 benchmarks in the llvm test-suite whose compile time increased by more than 1%. The biggest increase was in consumer-typeset. Drilling down a bit, the MemorySSA construction time for compiling the z44.c input to this benchmark is reported as 2% of runtime.

dberlin added inline comments. (Aug 22 2016, 2:27 PM)

lib/Transforms/Scalar/EarlyCSE.cpp
532	For loads, you don't have to ask for the clobbering access. It's already optimized such that getDefiningAccess == the clobbering access.

For stores, not sure if you realize this, but given

  store q      (let's call this a)
  x = load p
  store q      (let's call this b)

if you call getClobberingMemoryAccess on b, it will return a.

gberry added inline comments. (Aug 22 2016, 2:57 PM)

lib/Transforms/Scalar/EarlyCSE.cpp
532	For 1., I was not clear on whether this holds true after store removal. For 2., yeah I get this, I'm not sure what you're getting at though. The removal of this second store by EarlyCSE doesn't use MemorySSA to check for intervening loads in this change. It uses the 'LastStore' tracking to know when a store made redundant by a second store can be removed.

dberlin added inline comments. (Aug 22 2016, 3:04 PM)

lib/Transforms/Scalar/EarlyCSE.cpp
532	Updates have to make it hold after store removal :) The problem is that if we don't keep this invariant up to date, everyone has to use getClobberingAccess, which does a bunch of work to discover that the load already points to the same thing. The cost of everyone doing that is much higher than the cost of one person updating the dominating def. (There is one case where getClobberingAccess will give you a better answer: cases where we gave up during use optimization. I only have one testcase this occurs on. We only give up on optimizing a load if it's going to be super expensive, and you probably do not want to try to get better answers in that case.)

As for updating when you remove stores, you should simply be able to RAUW the store's access in the loads that use it with getClobberingAccess(store). Under the covers, removeMemoryAccess calls RAUW with the DefiningAccess. We could change it to use getClobberingMemoryAccess for loads, and DefiningAccess for stores.

Ah, okay.

gberry added inline comments. (Aug 23 2016, 10:17 AM)

lib/Transforms/Scalar/EarlyCSE.cpp
532	Okay, I get why just checking the defining access for loads is better (we get to skip the AA check).

For stores, we may be able to do something faster than calling getClobberingAccess(store). We could instead walk up the store's defining-access chain and stop if we reach a point that dominates the earlier load or a clobbering write. I'll have to time this to see if it makes a difference. I guess it will depend on what percentage of the time the clobber cache has been thrown away.

As for updating when removing stores: it seems like doing RAUW getClobberingAccess(store) is not optimal in some cases. For example:

  store @G1   ; 1 = MD(entry)
  store @G2   ; 2 = MD(1)
  store %p    ; 3 = MD(2)
  load @G1    ; MU(3)
  load @G2    ; MU(3)

If we remove 3 and RAUW getClobberingAccess(3) (= 2), we get:

  store @G1   ; 1 = MD(entry)
  store @G2   ; 2 = MD(1)
  load @G1    ; MU(2)
  load @G2    ; MU(2)

but the load @G1 would be more precise if it were MU(1) (and the invariant that defining access == clobbering access would be broken). Is this just a compile-time/precision trade-off? Maybe for that reason it makes more sense to let the client decide whether to do the simple RAUW getClobberingAccess(Store) or to optimize each use separately?
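gberry's example can be checked with a small toy model. It uses hypothetical names, not MemorySSA's API, and assumes that "%p" may alias anything while @G1 and @G2 do not alias each other. The sketch contrasts two strategies. The first does a single RAUW to the removed store's clobbering access. The second re-walks from the removed store's defining access separately for each load.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy MemoryDef chain (hypothetical; not MemorySSA's API). Index 0 is
// live-on-entry; "%p" may alias anything, distinct globals never alias.
struct Def {
  std::string Loc;    // location written ("" for live-on-entry)
  int DefiningAccess; // previous def in the chain, -1 for live-on-entry
};

bool mayAlias(const std::string &A, const std::string &B) {
  return A == B || A == "%p" || B == "%p";
}

// Walk up from Start until a def that may write Loc, or live-on-entry (0).
int clobberFrom(const std::vector<Def> &Defs, int Start,
                const std::string &Loc) {
  int Cur = Start;
  while (Cur != 0 && !mayAlias(Defs[static_cast<std::size_t>(Cur)].Loc, Loc))
    Cur = Defs[static_cast<std::size_t>(Cur)].DefiningAccess;
  return Cur;
}

// Strategy 1: one walk for the removed store; every use gets this target.
int rauwTarget(const std::vector<Def> &Defs, int Removed) {
  const Def &R = Defs[static_cast<std::size_t>(Removed)];
  return clobberFrom(Defs, R.DefiningAccess, R.Loc);
}

// Strategy 2: one walk per use, using the location that use loads.
int reoptimizedUseTarget(const std::vector<Def> &Defs, int Removed,
                         const std::string &UseLoc) {
  const Def &R = Defs[static_cast<std::size_t>(Removed)];
  return clobberFrom(Defs, R.DefiningAccess, UseLoc);
}
```

With defs 1 = store @G1, 2 = store @G2, and 3 = store %p, removing 3 gives a RAUW target of 2 for both loads. Per-use re-optimization gives 1 for load @G1 and 2 for load @G2. The first strategy costs one walk per removed store. The second costs one walk per use but keeps each load's defining access equal to its clobbering access.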

Approved by @dberlin over email

Closed by commit rL280279: [EarlyCSE] Optionally use MemorySSA. NFC. (authored by gberry). (Aug 31 2016, 12:32 PM)

This revision was automatically updated to reflect the committed changes.

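Revision Contents

Path                                              Size
include/llvm/LinkAllPasses.h                      2 lines
include/llvm/Transforms/Utils/MemorySSA.h         7 lines
lib/Transforms/IPO/PassManagerBuilder.cpp         4 lines
lib/Transforms/Scalar/EarlyCSE.cpp                136 lines
lib/Transforms/Utils/MemorySSA.cpp                2 lines
test/Transforms/EarlyCSE/AArch64/intrinsics.ll    1 line
test/Transforms/EarlyCSE/AArch64/ldstN.ll         1 line
test/Transforms/EarlyCSE/atomics.ll               1 line
test/Transforms/EarlyCSE/basic.ll                 1 line
test/Transforms/EarlyCSE/commute.ll               1 line
test/Transforms/EarlyCSE/fence.ll                 1 line
test/Transforms/EarlyCSE/guards.ll                1 line
test/Transforms/EarlyCSE/memoryssa.ll             33 lines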

Diff 60768

include/llvm/LinkAllPasses.h

(37 unchanged lines not shown)
 #include "llvm/IR/Function.h"
 #include "llvm/IR/IRPrintingPasses.h"
 #include "llvm/Transforms/IPO.h"
 #include "llvm/Transforms/IPO/FunctionAttrs.h"
 #include "llvm/Transforms/Instrumentation.h"
 #include "llvm/Transforms/ObjCARC.h"
 #include "llvm/Transforms/Scalar.h"
 #include "llvm/Transforms/Scalar/GVN.h"
+#include "llvm/Transforms/Utils/MemorySSA.h"
 #include "llvm/Transforms/Utils/SymbolRewriter.h"
 #include "llvm/Transforms/Utils/UnifyFunctionExitNodes.h"
 #include "llvm/Transforms/Vectorize.h"
 #include "llvm/Support/Valgrind.h"
 #include <cstdlib>

 namespace {
   struct ForcePassLinking {
(64 unchanged lines not shown; hunk context: ForcePassLinking() {)
       (void) llvm::createLoopUnrollPass();
       (void) llvm::createLoopUnswitchPass();
       (void) llvm::createLoopVersioningLICMPass();
       (void) llvm::createLoopIdiomPass();
       (void) llvm::createLoopRotatePass();
       (void) llvm::createLowerExpectIntrinsicPass();
       (void) llvm::createLowerInvokePass();
       (void) llvm::createLowerSwitchPass();
+      (void) llvm::createMemorySSAPass();
       (void) llvm::createNaryReassociatePass();
       (void) llvm::createObjCARCAAWrapperPass();
       (void) llvm::createObjCARCAPElimPass();
       (void) llvm::createObjCARCExpandPass();
       (void) llvm::createObjCARCContractPass();
       (void) llvm::createObjCARCOptPass();
       (void) llvm::createPAEvalPass();
       (void) llvm::createPromoteMemoryToRegisterPass();
(80 unchanged lines not shown)

include/llvm/Transforms/Utils/MemorySSA.h

(633 unchanged lines not shown; hunk context: public:)

   void verifyAnalysis() const override;
   void print(raw_ostream &OS, const Module *M = nullptr) const override;

 private:
   std::unique_ptr<MemorySSA> MSSA;
 };

+//===--------------------------------------------------------------------===//
+//
+// createMemorySSAPass - This pass builds memory SSA to allow walking memory
+// instructions using a use/def graph.
+//
+FunctionPass *createMemorySSAPass();

 /// \brief This is the generic walker interface for walkers of MemorySSA.
 /// Walkers are used to be able to further disambiguate the def-use chains
 /// MemorySSA gives you, or otherwise produce better info than MemorySSA gives
 /// you.
 /// In particular, while the def-use chains provide basic information, and are
 /// guaranteed to give, for example, the nearest may-aliasing MemoryDef for a
 /// MemoryUse as AliasAnalysis considers it, a user mant want better or other
 /// information. In particular, they may want to use SCEV info to further
(324 unchanged lines not shown)

lib/Transforms/IPO/PassManagerBuilder.cpp

(30 unchanged lines not shown)
 #include "llvm/Target/TargetMachine.h"
 #include "llvm/Transforms/IPO.h"
 #include "llvm/Transforms/IPO/ForceFunctionAttrs.h"
 #include "llvm/Transforms/IPO/FunctionAttrs.h"
 #include "llvm/Transforms/IPO/InferFunctionAttrs.h"
 #include "llvm/Transforms/Instrumentation.h"
 #include "llvm/Transforms/Scalar.h"
 #include "llvm/Transforms/Scalar/GVN.h"
+#include "llvm/Transforms/Utils/MemorySSA.h"
 #include "llvm/Transforms/Vectorize.h"

 using namespace llvm;

 static cl::opt<bool>
 RunLoopVectorization("vectorize-loops", cl::Hidden,
                      cl::desc("Run the Loop vectorization passes"));

(177 unchanged lines not shown)
 void PassManagerBuilder::addFunctionSimplificationPasses(
     legacy::PassManagerBase &MPM) {
   // Start of function pass.
   // Break up aggregate allocas, using SSAUpdater.
   if (UseNewSROA)
     MPM.add(createSROAPass());
   else
     MPM.add(createScalarReplAggregatesPass(-1, false));
+  if (OptLevel > 2)
+    // Add MemorySSA to enhance memory dependency analysis in EarlyCSE.
+    MPM.add(createMemorySSAPass());
   MPM.add(createEarlyCSEPass()); // Catch trivial redundancies
   // Speculative execution if the target has divergent branches; otherwise nop.
   MPM.add(createSpeculativeExecutionIfHasBranchDivergencePass());
   MPM.add(createJumpThreadingPass()); // Thread jumps.
   MPM.add(createCorrelatedValuePropagationPass()); // Propagate conditionals
   MPM.add(createCFGSimplificationPass()); // Merge & remove BBs
   // Combine silly seq's
   addInstructionCombiningPass(MPM);
(625 unchanged lines not shown)

lib/Transforms/Scalar/EarlyCSE.cpp

(26 unchanged lines not shown)
 #include "llvm/IR/IntrinsicInst.h"
 #include "llvm/IR/PatternMatch.h"
 #include "llvm/Pass.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/RecyclingAllocator.h"
 #include "llvm/Support/raw_ostream.h"
 #include "llvm/Transforms/Scalar.h"
 #include "llvm/Transforms/Utils/Local.h"
+#include "llvm/Transforms/Utils/MemorySSA.h"
 #include <deque>
 using namespace llvm;
 using namespace llvm::PatternMatch;

 #define DEBUG_TYPE "early-cse"

 STATISTIC(NumSimplify, "Number of instructions simplified or DCE'd");
 STATISTIC(NumCSE, "Number of instructions CSE'd");
(203 unchanged lines not shown)
 /// cases so that instcombine and other passes are more effective. It is
 /// expected that a later pass of GVN will catch the interesting/hard cases.
 class EarlyCSE {
 public:
   const TargetLibraryInfo &TLI;
   const TargetTransformInfo &TTI;
   DominatorTree &DT;
   AssumptionCache &AC;
+  MemorySSA *MSSA;
   typedef RecyclingAllocator<
       BumpPtrAllocator, ScopedHashTableVal<SimpleValue, Value *>> AllocatorTy;
   typedef ScopedHashTable<SimpleValue, Value *, DenseMapInfo<SimpleValue>,
                           AllocatorTy> ScopedHTType;

   /// \brief A scoped hash table of the current values of all of our simple
   /// scalar expressions.
   ///
(34 unchanged lines not shown; hunk context: public:)
   typedef ScopedHashTable<Value *, LoadValue, DenseMapInfo<Value *>,
                           LoadMapAllocator> LoadHTType;
   LoadHTType AvailableLoads;

   /// \brief A scoped hash table of the current values of read-only call
   /// values.
   ///
   /// It uses the same generation count as loads.
   typedef ScopedHashTable<CallValue, std::pair<Instruction *, unsigned>>
       CallHTType;
       [inline comment by mcrosier, not done: Can this be committed as a separate change?]
   CallHTType AvailableCalls;

   /// \brief This is the current generation of the memory value.
   unsigned CurrentGeneration;

   /// \brief Set up the EarlyCSE runner for a particular function.
   EarlyCSE(const TargetLibraryInfo &TLI, const TargetTransformInfo &TTI,
-           DominatorTree &DT, AssumptionCache &AC)
-      : TLI(TLI), TTI(TTI), DT(DT), AC(AC), CurrentGeneration(0) {}
+           DominatorTree &DT, AssumptionCache &AC, MemorySSA *MSSA)
+      : TLI(TLI), TTI(TTI), DT(DT), AC(AC), MSSA(MSSA), CurrentGeneration(0) {}

   bool run();

 private:
   // Almost a POD, but needs to call the constructors for the scoped hash
   // tables so that a new scope gets pushed on. These are RAII so that the
   // scope gets popped when the NodeScope is destroyed.
   class NodeScope {
(152 lines not shown)	Value *getOrCreateResult(Value *Inst, Type *ExpectedType) const {
if (LoadInst *LI = dyn_cast<LoadInst>(Inst))		if (LoadInst *LI = dyn_cast<LoadInst>(Inst))
return LI;		return LI;
else if (StoreInst *SI = dyn_cast<StoreInst>(Inst))		else if (StoreInst *SI = dyn_cast<StoreInst>(Inst))
return SI->getValueOperand();		return SI->getValueOperand();
assert(isa<IntrinsicInst>(Inst) && "Instruction not supported");		assert(isa<IntrinsicInst>(Inst) && "Instruction not supported");
return TTI.getOrCreateResultFromMemIntrinsic(cast<IntrinsicInst>(Inst),		return TTI.getOrCreateResultFromMemIntrinsic(cast<IntrinsicInst>(Inst),
ExpectedType);		ExpectedType);
}		}

		bool isSameMemGeneration(unsigned EarlierGeneration, unsigned LaterGeneration,
		Instruction *EarlierInst, Instruction *LaterInst);
		bool isSameMemGenerationMemSSA(Instruction *EarlierInst,
		Instruction *LaterInst);

		void removeMSSA(Instruction *Inst) {
		if (!MSSA)
		return;
		if (MemoryAccess *MA = MSSA->getMemoryAccess(Inst))
		MSSA->removeMemoryAccess(MA);
		}
};		};
}		}

		// Determine if the memory referenced by LaterInst is from the same heap version
		george.burgess.iv: Nit: Please use ///
		// as EarlierInst.
		// This is currently called in two scenarios:
		//
		// load p
		reames: Huh? This should be handled entirely inside MemorySSA?
		// ...
		// load p
		//
		// and
		//
		// x = load p
		// ...
		// store x, p
		//
		// in both cases we want to verify that there are no possible writes to the
		// memory referenced by p between the earlier and later instruction.
		bool EarlyCSE::isSameMemGeneration(unsigned EarlierGeneration,
		unsigned LaterGeneration,
		Instruction *EarlierInst,
		Instruction *LaterInst) {
		// Check the simple memory generation tracking first.
		// FIXME: There are cases that the current implementation of MemorySSA/BasicAA
		// won't catch but the simple memory generation tracking will. One issue is
		// that due to the decomposed GEP limit in BasicAA, some non-aliasing memory
		// operations will show up as clobbers in MemorySSA due to BasicAA giving
		// conservative results.
		// FIXME: Another issue is the conservative way MemorySSA treats fence release
		// instructions. These will appear as MemoryClobbers for unordered stores,
		// even though there is no ordering relationship between these two operations.

		DEBUG(if (MSSA &&
		!isSameMemGenerationMemSSA(EarlierInst, LaterInst) &&
		EarlierGeneration == LaterGeneration)
		dberlin:
		1. For loads, you don't have to ask for the clobbering access. It's already optimized such that getDefiningAccess == the clobbering access.
		2. For stores, not sure if you realize this, but given
		       store q      (let's call this a)
		       x = load p
		       store q      (let's call this b)
		   if you call getClobberingMemoryAccess on b, it will return a.
		gberry (author): For 1., I was not clear on whether this holds true after store removal. For 2., yeah I get this, I'm not sure what you're getting at though. The removal of this second store by EarlyCSE doesn't use MemorySSA to check for intervening loads in this change. It uses the 'LastStore' tracking to know when a store made redundant by a second store can be removed.
		dberlin:
		1. Updates have to make it hold after store removal :) The problem is that if we don't keep this invariant up to date, it means everyone uses getClobberingAccess, which does a bunch of work to discover the load already points to the same thing. The cost of everyone doing that is much higher than the cost of one person updating the dominating def. (There is one case where getClobberingAccess will give you a better answer, and that is on cases where we gave up during use optimization. I only have one testcase this occurs on. We only give up on optimizing a load if it's going to be super expensive, and you probably do not want to try to get better answers in that case.) As for updating when you remove stores, you should simply be able to replace any loads that use the store with getClobberingAccess(store) using RAUW. Under the covers, removeMemoryAccess calls RAUW with the DefiningAccess. We could change it to use getClobberingMemoryAccess for loads, and DefiningAccess for stores.
		2. Ah, okay.
		gberry (author): Okay, I get why just checking the defining access for loads is better (we get to skip the AA check). For stores, we may be able to do something faster than calling getClobberingAccess(store). We could instead walk up the store defining access chain and stop if we get to a point that dominates the earlier load or a clobbering write. I'll have to time this to see if it makes a difference. I guess it will depend on what percentage of the time the clobber cache has been thrown away.
		As for updating when removing stores: it seems like doing RAUW getClobberingAccess(store) is not optimal in some cases. For example:
		    store @G1   ; 1 = MD(entry)
		    store @G2   ; 2 = MD(1)
		    store %p    ; 3 = MD(2)
		    load @G1    ; MU(3)
		    load @G2    ; MU(3)
		If we remove 3 and RAUW getClobberingAccess(3) (= 2) we get:
		    store @G1   ; 1 = MD(entry)
		    store @G2   ; 2 = MD(1)
		    load @G1    ; MU(2)
		    load @G2    ; MU(2)
		but the load @G1 would be more precise if it was MU(1) (and the invariant that defining access == clobbering access would be broken). Is this just a compile-time/precision trade-off? Maybe for that reason it makes more sense to let the client decide if they want to do the simple RAUW getClobberingAccess(Store) or optimize each use separately?
		dbgs() << "EarlyCSE: MemorySSA heap generation too conservative: "
		<< EarlierInst->getFunction()->getName() << "\n"
		<< " " << *EarlierInst << "\n"
		<< " " << *LaterInst << "\n";);

		if (EarlierGeneration == LaterGeneration)
		return true;

		return isSameMemGenerationMemSSA(EarlierInst, LaterInst);
		}

		bool EarlyCSE::isSameMemGenerationMemSSA(Instruction *EarlierInst,
		Instruction *LaterInst) {
		if (!MSSA)
		return false;

		MemorySSAWalker *MSSAWalker = MSSA->getWalker();

		MemoryAccess *LaterHeapGen = MSSAWalker->getClobberingMemoryAccess(LaterInst);

		if (MSSA->isLiveOnEntryDef(LaterHeapGen))
		return true;

		reames: Having two sets of variables, one integers, one pointers with similar names is highly confusing. I'd suggest pulling out a MemorySSA specific impl function and calling it from here to wrap the desired asserts.
		MemoryAccess *EarlierMA = MSSA->getMemoryAccess(EarlierInst);

		// Handle cases like this:
		// x = load atomic, p
		// ... (no aliasing memory ops)
		// store unordered x, p
		// In this case LaterHeapGen (i.e. the clobbering memory access for the store)
		// is the atomic load (i.e. the EarlierInst MemoryAccess itself).
		mcrosier: Please capitalize handle and add a period.
		reames: This comment doesn't make sense where placed?
		// FIXME: This may be better handled by having MemorySSA be less conservative
		// when deciding if atomic loads should be clobbers or not.
		if (LaterHeapGen == EarlierMA)
		return true;

		MemoryAccess *EarlierHeapGen =
		MSSAWalker->getClobberingMemoryAccess(EarlierInst);
		if (EarlierHeapGen == LaterHeapGen)
		george.burgess.iv: Nit: `return EarlierHeapGen == LaterHeapGen`?
		return true;

		return false;
		mcrosier: Please add a descriptive && "Error message!".
		}

bool EarlyCSE::processNode(DomTreeNode *Node) {		bool EarlyCSE::processNode(DomTreeNode *Node) {
bool Changed = false;		bool Changed = false;
BasicBlock *BB = Node->getBlock();		BasicBlock *BB = Node->getBlock();

// If this block has a single predecessor, then the predecessor is the parent		// If this block has a single predecessor, then the predecessor is the parent
// of the domtree node and all of the live out memory values are still current		// of the domtree node and all of the live out memory values are still current
// in this block. If this block has multiple predecessors, then they could		// in this block. If this block has multiple predecessors, then they could
// have invalidated the live-out memory values of our parent value. For now,		// have invalidated the live-out memory values of our parent value. For now,
(41 lines not shown)	bool EarlyCSE::processNode(DomTreeNode *Node) {
// See if any instructions in the block can be eliminated. If so, do it. If		// See if any instructions in the block can be eliminated. If so, do it. If
// not, add them to AvailableValues.		// not, add them to AvailableValues.
for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E;) {		for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E;) {
Instruction *Inst = &*I++;		Instruction *Inst = &*I++;

// Dead instructions should just be removed.		// Dead instructions should just be removed.
if (isInstructionTriviallyDead(Inst, &TLI)) {		if (isInstructionTriviallyDead(Inst, &TLI)) {
DEBUG(dbgs() << "EarlyCSE DCE: " << *Inst << '\n');		DEBUG(dbgs() << "EarlyCSE DCE: " << *Inst << '\n');
		removeMSSA(Inst);
Inst->eraseFromParent();		Inst->eraseFromParent();
Changed = true;		Changed = true;
++NumSimplify;		++NumSimplify;
continue;		continue;
}		}

// Skip assume intrinsics, they don't really have side effects (although		// Skip assume intrinsics, they don't really have side effects (although
// they're marked as such to ensure preservation of control dependencies),		// they're marked as such to ensure preservation of control dependencies),
(20 lines not shown)	if (match(Inst, m_Intrinsic<Intrinsic::experimental_guard>())) {
continue;		continue;
}		}

// If the instruction can be simplified (e.g. X+0 = X) then replace it with		// If the instruction can be simplified (e.g. X+0 = X) then replace it with
// its simpler value.		// its simpler value.
if (Value *V = SimplifyInstruction(Inst, DL, &TLI, &DT, &AC)) {		if (Value *V = SimplifyInstruction(Inst, DL, &TLI, &DT, &AC)) {
DEBUG(dbgs() << "EarlyCSE Simplify: " << *Inst << " to: " << *V << '\n');		DEBUG(dbgs() << "EarlyCSE Simplify: " << *Inst << " to: " << *V << '\n');
Inst->replaceAllUsesWith(V);		Inst->replaceAllUsesWith(V);
		// This relies on SimplifyInstruction not removing any instructions that
		// have MemoryAccesses. It may make them dead, in which case they will
		// get removed in the code above and MemorySSA updated correctly.
		// FIXME: Perhaps MemorySSA should use ValueHandles?
		reames: Code like this strongly hints that MemorySSA should be using ValueHandles.
		removeMSSA(Inst);
Inst->eraseFromParent();		Inst->eraseFromParent();
Changed = true;		Changed = true;
++NumSimplify;		++NumSimplify;
continue;		continue;
}		}

// If this is a simple instruction that we can value number, process it.		// If this is a simple instruction that we can value number, process it.
if (SimpleValue::canHandle(Inst)) {		if (SimpleValue::canHandle(Inst)) {
// See if the instruction has an available value. If so, use it.		// See if the instruction has an available value. If so, use it.
if (Value *V = AvailableValues.lookup(Inst)) {		if (Value *V = AvailableValues.lookup(Inst)) {
DEBUG(dbgs() << "EarlyCSE CSE: " << *Inst << " to: " << *V << '\n');		DEBUG(dbgs() << "EarlyCSE CSE: " << *Inst << " to: " << *V << '\n');
if (auto *I = dyn_cast<Instruction>(V))		if (auto *I = dyn_cast<Instruction>(V))
I->andIRFlags(Inst);		I->andIRFlags(Inst);
Inst->replaceAllUsesWith(V);		Inst->replaceAllUsesWith(V);
		removeMSSA(Inst);
Inst->eraseFromParent();		Inst->eraseFromParent();
Changed = true;		Changed = true;
++NumCSE;		++NumCSE;
continue;		continue;
}		}

// Otherwise, just remember that this value is available.		// Otherwise, just remember that this value is available.
AvailableValues.insert(Inst, Inst);		AvailableValues.insert(Inst, Inst);
continue;		continue;
}		}

ParseMemoryInst MemInst(Inst, TTI);		ParseMemoryInst MemInst(Inst, TTI);
// If this is a non-volatile load, process it.		// If this is a non-volatile load, process it.
if (MemInst.isValid() && MemInst.isLoad()) {		if (MemInst.isValid() && MemInst.isLoad()) {
// (conservatively) we can't peek past the ordering implied by this		// (conservatively) we can't peek past the ordering implied by this
// operation, but we can add this load to our set of available values		// operation, but we can add this load to our set of available values
if (MemInst.isVolatile() || !MemInst.isUnordered()) {		if (MemInst.isVolatile() || !MemInst.isUnordered()) {
LastStore = nullptr;		LastStore = nullptr;
++CurrentGeneration;		++CurrentGeneration;
}		}

// If we have an available version of this load, and if it is the right		// If we have an available version of this load, and if it is the right
// generation, replace this instruction.		// generation, replace this instruction.
LoadValue InVal = AvailableLoads.lookup(MemInst.getPointerOperand());		LoadValue InVal = AvailableLoads.lookup(MemInst.getPointerOperand());
if (InVal.DefInst != nullptr && InVal.Generation == CurrentGeneration &&		if (InVal.DefInst != nullptr &&
InVal.MatchingId == MemInst.getMatchingId() &&		InVal.MatchingId == MemInst.getMatchingId() &&
// We don't yet handle removing loads with ordering of any kind.		// We don't yet handle removing loads with ordering of any kind.
!MemInst.isVolatile() && MemInst.isUnordered() &&		!MemInst.isVolatile() && MemInst.isUnordered() &&
// We can't replace an atomic load with one which isn't also atomic.		// We can't replace an atomic load with one which isn't also atomic.
InVal.IsAtomic >= MemInst.isAtomic()) {		InVal.IsAtomic >= MemInst.isAtomic() &&
		isSameMemGeneration(InVal.Generation, CurrentGeneration,
		InVal.DefInst, Inst)) {
Value *Op = getOrCreateResult(InVal.DefInst, Inst->getType());		Value *Op = getOrCreateResult(InVal.DefInst, Inst->getType());
if (Op != nullptr) {		if (Op != nullptr) {
DEBUG(dbgs() << "EarlyCSE CSE LOAD: " << *Inst		DEBUG(dbgs() << "EarlyCSE CSE LOAD: " << *Inst
<< " to: " << *InVal.DefInst << '\n');		<< " to: " << *InVal.DefInst << '\n');
if (!Inst->use_empty())		if (!Inst->use_empty())
Inst->replaceAllUsesWith(Op);		Inst->replaceAllUsesWith(Op);
		removeMSSA(Inst);
Inst->eraseFromParent();		Inst->eraseFromParent();
Changed = true;		Changed = true;
++NumCSELoad;		++NumCSELoad;
continue;		continue;
}		}
}		}

// Otherwise, remember that we have this instruction.		// Otherwise, remember that we have this instruction.
(14 lines not shown)	if (Inst->mayReadFromMemory() &&
!(MemInst.isValid() && !MemInst.mayReadFromMemory()))		!(MemInst.isValid() && !MemInst.mayReadFromMemory()))
LastStore = nullptr;		LastStore = nullptr;

// If this is a read-only call, process it.		// If this is a read-only call, process it.
if (CallValue::canHandle(Inst)) {		if (CallValue::canHandle(Inst)) {
// If we have an available version of this call, and if it is the right		// If we have an available version of this call, and if it is the right
// generation, replace this instruction.		// generation, replace this instruction.
std::pair<Instruction *, unsigned> InVal = AvailableCalls.lookup(Inst);		std::pair<Instruction *, unsigned> InVal = AvailableCalls.lookup(Inst);
if (InVal.first != nullptr && InVal.second == CurrentGeneration) {		if (InVal.first != nullptr &&
		isSameMemGeneration(InVal.second, CurrentGeneration, InVal.first,
		Inst)) {
		mcrosier: Separate commit.
DEBUG(dbgs() << "EarlyCSE CSE CALL: " << *Inst		DEBUG(dbgs() << "EarlyCSE CSE CALL: " << *Inst
<< " to: " << *InVal.first << '\n');		<< " to: " << *InVal.first << '\n');
if (!Inst->use_empty())		if (!Inst->use_empty())
Inst->replaceAllUsesWith(InVal.first);		Inst->replaceAllUsesWith(InVal.first);
		removeMSSA(Inst);
Inst->eraseFromParent();		Inst->eraseFromParent();
Changed = true;		Changed = true;
++NumCSECall;		++NumCSECall;
continue;		continue;
}		}

// Otherwise, remember that we have this instruction.		// Otherwise, remember that we have this instruction.
AvailableCalls.insert(		AvailableCalls.insert(
Inst, std::pair<Instruction *, unsigned>(Inst, CurrentGeneration));		Inst, std::pair<Instruction *, unsigned>(Inst, CurrentGeneration));
		mcrosier: Separate commit. Maybe typedef?
continue;		continue;
}		}

// A release fence requires that all stores complete before it, but does		// A release fence requires that all stores complete before it, but does
// not prevent the reordering of following loads 'before' the fence. As a		// not prevent the reordering of following loads 'before' the fence. As a
// result, we don't need to consider it as writing to memory and don't need		// result, we don't need to consider it as writing to memory and don't need
// to advance the generation. We do need to prevent DSE across the fence,		// to advance the generation. We do need to prevent DSE across the fence,
// but that's handled above.		// but that's handled above.
if (FenceInst *FI = dyn_cast<FenceInst>(Inst))		if (FenceInst *FI = dyn_cast<FenceInst>(Inst))
if (FI->getOrdering() == AtomicOrdering::Release) {		if (FI->getOrdering() == AtomicOrdering::Release) {
assert(Inst->mayReadFromMemory() && "relied on to prevent DSE above");		assert(Inst->mayReadFromMemory() && "relied on to prevent DSE above");
continue;		continue;
}		}

// write back DSE - If we write back the same value we just loaded from		// write back DSE - If we write back the same value we just loaded from
// the same location and haven't passed any intervening writes or ordering		// the same location and haven't passed any intervening writes or ordering
// operations, we can remove the write. The primary benefit is in allowing		// operations, we can remove the write. The primary benefit is in allowing
// the available load table to remain valid and value forward past where		// the available load table to remain valid and value forward past where
// the store originally was.		// the store originally was.
if (MemInst.isValid() && MemInst.isStore()) {		if (MemInst.isValid() && MemInst.isStore()) {
LoadValue InVal = AvailableLoads.lookup(MemInst.getPointerOperand());		LoadValue InVal = AvailableLoads.lookup(MemInst.getPointerOperand());
if (InVal.DefInst &&		if (InVal.DefInst &&
InVal.DefInst == getOrCreateResult(Inst, InVal.DefInst->getType()) &&		InVal.DefInst == getOrCreateResult(Inst, InVal.DefInst->getType()) &&
InVal.Generation == CurrentGeneration &&
InVal.MatchingId == MemInst.getMatchingId() &&		InVal.MatchingId == MemInst.getMatchingId() &&
// We don't yet handle removing stores with ordering of any kind.		// We don't yet handle removing stores with ordering of any kind.
!MemInst.isVolatile() && MemInst.isUnordered()) {		!MemInst.isVolatile() && MemInst.isUnordered() &&
		isSameMemGeneration(InVal.Generation, CurrentGeneration,
		InVal.DefInst, Inst)) {
assert((!LastStore ||		assert((!LastStore ||
ParseMemoryInst(LastStore, TTI).getPointerOperand() ==		ParseMemoryInst(LastStore, TTI).getPointerOperand() ==
MemInst.getPointerOperand()) &&		MemInst.getPointerOperand() ||
"can't have an intervening store!");		MSSA) &&
		"can't have an intervening store if not using MemorySSA!");
DEBUG(dbgs() << "EarlyCSE DSE (writeback): " << *Inst << '\n');		DEBUG(dbgs() << "EarlyCSE DSE (writeback): " << *Inst << '\n');
		removeMSSA(Inst);
Inst->eraseFromParent();		Inst->eraseFromParent();
Changed = true;		Changed = true;
++NumDSE;		++NumDSE;
// We can avoid incrementing the generation count since we were able		// We can avoid incrementing the generation count since we were able
// to eliminate this store.		// to eliminate this store.
continue;		continue;
}		}
}		}
(15 lines not shown)	if (Inst->mayWriteToMemory()) {
if (LastStore) {		if (LastStore) {
ParseMemoryInst LastStoreMemInst(LastStore, TTI);		ParseMemoryInst LastStoreMemInst(LastStore, TTI);
assert(LastStoreMemInst.isUnordered() &&		assert(LastStoreMemInst.isUnordered() &&
!LastStoreMemInst.isVolatile() &&		!LastStoreMemInst.isVolatile() &&
"Violated invariant");		"Violated invariant");
if (LastStoreMemInst.isMatchingMemLoc(MemInst)) {		if (LastStoreMemInst.isMatchingMemLoc(MemInst)) {
DEBUG(dbgs() << "EarlyCSE DEAD STORE: " << *LastStore		DEBUG(dbgs() << "EarlyCSE DEAD STORE: " << *LastStore
<< " due to: " << *Inst << '\n');		<< " due to: " << *Inst << '\n');
		removeMSSA(LastStore);
LastStore->eraseFromParent();		LastStore->eraseFromParent();
Changed = true;		Changed = true;
++NumDSE;		++NumDSE;
LastStore = nullptr;		LastStore = nullptr;
}		}
// fallthrough - we can exploit information about this store		// fallthrough - we can exploit information about this store
}		}

(80 lines not shown)
}		}

PreservedAnalyses EarlyCSEPass::run(Function &F,		PreservedAnalyses EarlyCSEPass::run(Function &F,
AnalysisManager<Function> &AM) {		AnalysisManager<Function> &AM) {
auto &TLI = AM.getResult<TargetLibraryAnalysis>(F);		auto &TLI = AM.getResult<TargetLibraryAnalysis>(F);
auto &TTI = AM.getResult<TargetIRAnalysis>(F);		auto &TTI = AM.getResult<TargetIRAnalysis>(F);
auto &DT = AM.getResult<DominatorTreeAnalysis>(F);		auto &DT = AM.getResult<DominatorTreeAnalysis>(F);
auto &AC = AM.getResult<AssumptionAnalysis>(F);		auto &AC = AM.getResult<AssumptionAnalysis>(F);
		auto *MSSA = AM.getCachedResult<MemorySSAAnalysis>(F);

EarlyCSE CSE(TLI, TTI, DT, AC);		EarlyCSE CSE(TLI, TTI, DT, AC, MSSA);

if (!CSE.run())		if (!CSE.run())
return PreservedAnalyses::all();		return PreservedAnalyses::all();

// CSE preserves the dominator tree because it doesn't mutate the CFG.		// CSE preserves the dominator tree because it doesn't mutate the CFG.
// FIXME: Bundle this with other CFG-preservation.		// FIXME: Bundle this with other CFG-preservation.
PreservedAnalyses PA;		PreservedAnalyses PA;
PA.preserve<DominatorTreeAnalysis>();		PA.preserve<DominatorTreeAnalysis>();
PA.preserve<GlobalsAA>();		PA.preserve<GlobalsAA>();
		PA.preserve<MemorySSAAnalysis>();
return PA;		return PA;
}		}

namespace {		namespace {
/// \brief A simple and fast domtree-based CSE pass.		/// \brief A simple and fast domtree-based CSE pass.
///		///
/// This pass does a simple depth-first walk over the dominator tree,		/// This pass does a simple depth-first walk over the dominator tree,
/// eliminating trivially redundant instructions and using instsimplify to		/// eliminating trivially redundant instructions and using instsimplify to
(11 lines not shown)	public:
bool runOnFunction(Function &F) override {		bool runOnFunction(Function &F) override {
if (skipFunction(F))		if (skipFunction(F))
return false;		return false;

auto &TLI = getAnalysis<TargetLibraryInfoWrapperPass>().getTLI();		auto &TLI = getAnalysis<TargetLibraryInfoWrapperPass>().getTLI();
auto &TTI = getAnalysis<TargetTransformInfoWrapperPass>().getTTI(F);		auto &TTI = getAnalysis<TargetTransformInfoWrapperPass>().getTTI(F);
auto &DT = getAnalysis<DominatorTreeWrapperPass>().getDomTree();		auto &DT = getAnalysis<DominatorTreeWrapperPass>().getDomTree();
auto &AC = getAnalysis<AssumptionCacheTracker>().getAssumptionCache(F);		auto &AC = getAnalysis<AssumptionCacheTracker>().getAssumptionCache(F);
		auto *MSSAPass = getAnalysisIfAvailable<MemorySSAWrapperPass>();
		auto *MSSA = MSSAPass ? &MSSAPass->getMSSA() : nullptr;

EarlyCSE CSE(TLI, TTI, DT, AC);		EarlyCSE CSE(TLI, TTI, DT, AC, MSSA);

return CSE.run();		return CSE.run();
}		}

void getAnalysisUsage(AnalysisUsage &AU) const override {		void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.addRequired<AssumptionCacheTracker>();		AU.addRequired<AssumptionCacheTracker>();
AU.addRequired<DominatorTreeWrapperPass>();		AU.addRequired<DominatorTreeWrapperPass>();
AU.addRequired<TargetLibraryInfoWrapperPass>();		AU.addRequired<TargetLibraryInfoWrapperPass>();
AU.addRequired<TargetTransformInfoWrapperPass>();		AU.addRequired<TargetTransformInfoWrapperPass>();
		AU.addUsedIfAvailable<MemorySSAWrapperPass>();
AU.addPreserved<GlobalsAAWrapperPass>();		AU.addPreserved<GlobalsAAWrapperPass>();
		AU.addPreserved<MemorySSAWrapperPass>();
AU.setPreservesCFG();		AU.setPreservesCFG();
}		}
};		};
}		}

char EarlyCSELegacyPass::ID = 0;		char EarlyCSELegacyPass::ID = 0;

FunctionPass *llvm::createEarlyCSEPass() { return new EarlyCSELegacyPass(); }		FunctionPass *llvm::createEarlyCSEPass() { return new EarlyCSELegacyPass(); }

INITIALIZE_PASS_BEGIN(EarlyCSELegacyPass, "early-cse", "Early CSE", false,		INITIALIZE_PASS_BEGIN(EarlyCSELegacyPass, "early-cse", "Early CSE", false,
false)		false)
INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)		INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)
INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)		INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)
INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)		INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
INITIALIZE_PASS_DEPENDENCY(TargetLibraryInfoWrapperPass)		INITIALIZE_PASS_DEPENDENCY(TargetLibraryInfoWrapperPass)
INITIALIZE_PASS_END(EarlyCSELegacyPass, "early-cse", "Early CSE", false, false)		INITIALIZE_PASS_END(EarlyCSELegacyPass, "early-cse", "Early CSE", false, false)
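The core of this patch is `isSameMemGeneration`: the existing generation counter is checked first, and MemorySSA is consulted only when the generations differ. The toy model below is a sketch, not LLVM code, that replays that decision on straight-line code. `Op`, `mayAlias`, `generations`, `clobber`, and `sameMemGeneration` are hypothetical names; the alias rule (`%p` may alias anything, named globals alias only themselves) is an assumption standing in for BasicAA, and the model only covers the case where the later instruction is a load.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical straight-line IR: each op is a load or a store to a named location.
enum class OpKind { Load, Store };
struct Op {
  OpKind Kind;
  std::string Loc;
};

// Assumed alias oracle: "%p" may alias anything; named globals alias only themselves.
bool mayAlias(const std::string &A, const std::string &B) {
  return A == B || A == "%p" || B == "%p";
}

// Mimics EarlyCSE's CurrentGeneration: every store bumps the counter, and
// each op is tagged with the counter value after it has been processed.
std::vector<unsigned> generations(const std::vector<Op> &Ops) {
  std::vector<unsigned> Gen;
  unsigned Cur = 0;
  for (const Op &O : Ops) {
    if (O.Kind == OpKind::Store)
      ++Cur;
    Gen.push_back(Cur);
  }
  return Gen;
}

// Stand-in for a clobbering-access query: index of the nearest earlier store
// that may alias Loc, or -1 to mean liveOnEntry.
int clobber(const std::vector<Op> &Ops, int Idx, const std::string &Loc) {
  for (int I = Idx - 1; I >= 0; --I)
    if (Ops[I].Kind == OpKind::Store && mayAlias(Ops[I].Loc, Loc))
      return I;
  return -1;
}

// Same shape as the patch: cheap generation check first, then the
// MemorySSA-style fallback (liveOnEntry, clobber is the earlier access
// itself, or both instructions share a clobber). Later must be a load.
bool sameMemGeneration(const std::vector<Op> &Ops, int Earlier, int Later,
                       bool UseMSSA) {
  std::vector<unsigned> Gen = generations(Ops);
  if (Gen[Earlier] == Gen[Later])
    return true;
  if (!UseMSSA)
    return false;
  int LaterClobber = clobber(Ops, Later, Ops[Later].Loc);
  if (LaterClobber == -1 || LaterClobber == Earlier)
    return true;
  return clobber(Ops, Earlier, Ops[Earlier].Loc) == LaterClobber;
}
```

With `UseMSSA` false the sketch reduces to the generation-only check, which is why leaving MemorySSA out of the default pipelines keeps the change NFC. For `load @A; store @B; load @A` the generations differ, but the fallback sees no store that may alias `@A` and allows the CSE.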

lib/Transforms/Utils/MemorySSA.cpp

	(749 lines not shown)
	}			}

	void MemorySSAWrapperPass::verifyAnalysis() const { MSSA->verifyMemorySSA(); }			void MemorySSAWrapperPass::verifyAnalysis() const { MSSA->verifyMemorySSA(); }

	void MemorySSAWrapperPass::print(raw_ostream &OS, const Module *M) const {			void MemorySSAWrapperPass::print(raw_ostream &OS, const Module *M) const {
	MSSA->print(OS);			MSSA->print(OS);
	}			}

				FunctionPass *createMemorySSAPass() { return new MemorySSAWrapperPass(); }

	MemorySSAWalker::MemorySSAWalker(MemorySSA *M) : MSSA(M) {}			MemorySSAWalker::MemorySSAWalker(MemorySSA *M) : MSSA(M) {}

	CachingMemorySSAWalker::CachingMemorySSAWalker(MemorySSA *M, AliasAnalysis *A,			CachingMemorySSAWalker::CachingMemorySSAWalker(MemorySSA *M, AliasAnalysis *A,
	DominatorTree *D)			DominatorTree *D)
	: MemorySSAWalker(M), AA(A), DT(D) {}			: MemorySSAWalker(M), AA(A), DT(D) {}

	CachingMemorySSAWalker::~CachingMemorySSAWalker() {}			CachingMemorySSAWalker::~CachingMemorySSAWalker() {}

	(377 lines not shown)
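In the review thread on `isSameMemGeneration`, gberry points out that when a store's MemoryDef is removed, RAUW-ing its uses with the removed store's own clobbering access can leave a load less precise than re-optimizing each use separately. The toy model below reproduces his `@G1`/`@G2`/`%p` example. It is a sketch under stated assumptions, not the MemorySSA API: `DefChain`, `mayAlias`, `clobberFrom`, `removeAndRAUWTarget`, and `reoptimizeUse` are hypothetical, and the alias rule (`%p` may alias anything, globals alias only themselves) stands in for real alias analysis.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy straight-line chain of MemoryDefs (hypothetical, not LLVM's API).
// Def 0 plays the role of liveOnEntry; Def I (I >= 1) stores to Loc[I],
// and its defining access is the nearest live def preceding it.
struct DefChain {
  std::vector<std::string> Loc; // Loc[0] is unused (liveOnEntry)
  std::vector<bool> Removed;
};

// Assumed alias oracle: "%p" may alias anything; globals alias only themselves.
bool mayAlias(const std::string &A, const std::string &B) {
  return A == B || A == "%p" || B == "%p";
}

// Walk from def From (inclusive) toward the entry and return the first live
// def that may clobber L, or 0 (liveOnEntry) if there is none.
int clobberFrom(const DefChain &C, int From, const std::string &L) {
  for (int I = From; I > 0; --I)
    if (!C.Removed[I] && mayAlias(C.Loc[I], L))
      return I;
  return 0;
}

// Strategy 1: remove def D and give every former use the clobbering access
// of the removed store itself (one query shared by all uses).
int removeAndRAUWTarget(DefChain &C, int D) {
  C.Removed[D] = true;
  return clobberFrom(C, D - 1, C.Loc[D]);
}

// Strategy 2: re-optimize one former use of def D (a load of LoadLoc) on its own.
int reoptimizeUse(const DefChain &C, int D, const std::string &LoadLoc) {
  return clobberFrom(C, D - 1, LoadLoc);
}
```

The trade-off raised in the thread shows up directly: strategy 1 costs one clobber query per removed store but turns `load @G1` into MU(2), while strategy 2 costs one query per use and keeps each load's defining access equal to its clobbering access (MU(1) for `@G1`, MU(2) for `@G2`).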

test/Transforms/EarlyCSE/AArch64/intrinsics.ll

	; RUN: opt < %s -S -mtriple=aarch64-none-linux-gnu -mattr=+neon -early-cse | FileCheck %s			; RUN: opt < %s -S -mtriple=aarch64-none-linux-gnu -mattr=+neon -early-cse | FileCheck %s
				; RUN: opt < %s -S -mtriple=aarch64-none-linux-gnu -mattr=+neon -basicaa -memoryssa -early-cse | FileCheck %s
	; RUN: opt < %s -S -mtriple=aarch64-none-linux-gnu -mattr=+neon -passes=early-cse | FileCheck %s			; RUN: opt < %s -S -mtriple=aarch64-none-linux-gnu -mattr=+neon -passes=early-cse | FileCheck %s

	define <4 x i32> @test_cse(i32* %a, [2 x <4 x i32>] %s.coerce, i32 %n) {			define <4 x i32> @test_cse(i32* %a, [2 x <4 x i32>] %s.coerce, i32 %n) {
	entry:			entry:
	; Check that @llvm.aarch64.neon.ld2 is optimized away by Early CSE.			; Check that @llvm.aarch64.neon.ld2 is optimized away by Early CSE.
	; CHECK-LABEL: @test_cse			; CHECK-LABEL: @test_cse
	; CHECK-NOT: call { <4 x i32>, <4 x i32> } @llvm.aarch64.neon.ld2.v4i32.p0i8			; CHECK-NOT: call { <4 x i32>, <4 x i32> } @llvm.aarch64.neon.ld2.v4i32.p0i8
	%s.coerce.fca.0.extract = extractvalue [2 x <4 x i32>] %s.coerce, 0			%s.coerce.fca.0.extract = extractvalue [2 x <4 x i32>] %s.coerce, 0
	(223 lines not shown)

test/Transforms/EarlyCSE/AArch64/ldstN.ll

	; RUN: opt -S -early-cse < %s | FileCheck %s			; RUN: opt -S -early-cse < %s | FileCheck %s
				; RUN: opt -S -basicaa -memoryssa -early-cse < %s | FileCheck %s
	target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"			target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"
	target triple = "aarch64--linux-gnu"			target triple = "aarch64--linux-gnu"

	declare { <4 x i16>, <4 x i16>, <4 x i16>, <4 x i16> } @llvm.aarch64.neon.ld4.v4i16.p0v4i16(<4 x i16>*)			declare { <4 x i16>, <4 x i16>, <4 x i16>, <4 x i16> } @llvm.aarch64.neon.ld4.v4i16.p0v4i16(<4 x i16>*)

	; Although the store and the ld4 are using the same pointer, the			; Although the store and the ld4 are using the same pointer, the
	; data can not be reused because ld4 accesses multiple elements.			; data can not be reused because ld4 accesses multiple elements.
	define { <4 x i16>, <4 x i16>, <4 x i16>, <4 x i16> } @foo() {			define { <4 x i16>, <4 x i16>, <4 x i16>, <4 x i16> } @foo() {
	(9 lines not shown)

test/Transforms/EarlyCSE/atomics.ll

	; RUN: opt < %s -S -early-cse | FileCheck %s			; RUN: opt < %s -S -early-cse | FileCheck %s
				; RUN: opt < %s -S -basicaa -memoryssa -early-cse | FileCheck %s

	; CHECK-LABEL: @test12(			; CHECK-LABEL: @test12(
	define i32 @test12(i1 %B, i32* %P1, i32* %P2) {			define i32 @test12(i1 %B, i32* %P1, i32* %P2) {
	%load0 = load i32, i32* %P1			%load0 = load i32, i32* %P1
	%1 = load atomic i32, i32* %P2 seq_cst, align 4			%1 = load atomic i32, i32* %P2 seq_cst, align 4
	%load1 = load i32, i32* %P1			%load1 = load i32, i32* %P1
	%sel = select i1 %B, i32 %load0, i32 %load1			%sel = select i1 %B, i32 %load0, i32 %load1
	ret i32 %sel			ret i32 %sel
	(250 lines not shown)

test/Transforms/EarlyCSE/basic.ll

	; RUN: opt < %s -S -early-cse | FileCheck %s			; RUN: opt < %s -S -early-cse | FileCheck %s
				; RUN: opt < %s -S -basicaa -memoryssa -early-cse | FileCheck %s
	; RUN: opt < %s -S -passes=early-cse | FileCheck %s			; RUN: opt < %s -S -passes=early-cse | FileCheck %s

	declare void @llvm.assume(i1) nounwind			declare void @llvm.assume(i1) nounwind

	; CHECK-LABEL: @test1(			; CHECK-LABEL: @test1(
	define void @test1(i8 %V, i32 *%P) {			define void @test1(i8 %V, i32 *%P) {
	%A = bitcast i64 42 to double ;; dead			%A = bitcast i64 42 to double ;; dead
	%B = add i32 4, 19 ;; constant folds			%B = add i32 4, 19 ;; constant folds
	(269 lines not shown)

test/Transforms/EarlyCSE/commute.ll

	; RUN: opt < %s -S -early-cse | FileCheck %s			; RUN: opt < %s -S -early-cse | FileCheck %s
				; RUN: opt < %s -S -basicaa -memoryssa -early-cse | FileCheck %s

	; CHECK-LABEL: @test1(			; CHECK-LABEL: @test1(
	define void @test1(float %A, float %B, float* %PA, float* %PB) {			define void @test1(float %A, float %B, float* %PA, float* %PB) {
	; CHECK-NEXT: fadd			; CHECK-NEXT: fadd
	; CHECK-NEXT: store			; CHECK-NEXT: store
	; CHECK-NEXT: store			; CHECK-NEXT: store
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%C = fadd float %A, %B			%C = fadd float %A, %B
	(57 lines not shown)

test/Transforms/EarlyCSE/fence.ll

	; RUN: opt -S -early-cse < %s | FileCheck %s			; RUN: opt -S -early-cse < %s | FileCheck %s
				; RUN: opt < %s -S -basicaa -memoryssa -early-cse | FileCheck %s
	; NOTE: This file is testing the current implementation. Some of			; NOTE: This file is testing the current implementation. Some of
	; the transforms used as negative tests below would be legal, but			; the transforms used as negative tests below would be legal, but
	; only if reached through a chain of logic which EarlyCSE is incapable			; only if reached through a chain of logic which EarlyCSE is incapable
	; of performing. To say it differently, this file tests a conservative			; of performing. To say it differently, this file tests a conservative
	; version of the memory model. If we want to extend EarlyCSE to be more			; version of the memory model. If we want to extend EarlyCSE to be more
	; aggressive in the future, we may need to relax some of the negative tests.			; aggressive in the future, we may need to relax some of the negative tests.

	; We can value forward across the fence since we can (semantically)			; We can value forward across the fence since we can (semantically)

test/Transforms/EarlyCSE/guards.ll

 ; RUN: opt -S -early-cse < %s | FileCheck %s
+; RUN: opt < %s -S -basicaa -memoryssa -early-cse | FileCheck %s

 declare void @llvm.experimental.guard(i1,...)

 define i32 @test0(i32* %ptr, i1 %cond) {
 ; We can do store to load forwarding over a guard, since it does not
 ; clobber memory

 ; CHECK-LABEL: @test0(

test/Transforms/EarlyCSE/memoryssa.ll

This file was added.

+; RUN: opt < %s -S -early-cse | FileCheck %s --check-prefix=CHECK-NOMEMSSA
+; RUN: opt < %s -S -basicaa -memoryssa -early-cse | FileCheck %s
+; RUN: opt < %s -S -aa-pipeline=basic-aa -passes='require<memoryssa>,early-cse' | FileCheck %s
+
+@G1 = global i32 zeroinitializer
+@G2 = global i32 zeroinitializer
+
+;; Simple load value numbering across non-clobbering store.
[inline comment] reames: If we do go this way, you'll need far far more tests.
+; CHECK-LABEL: @test1(
+; CHECK-NOMEMSSA-LABEL: @test1(
+define i32 @test1() {
+%V1 = load i32, i32* @G1
+store i32 0, i32* @G2
+%V2 = load i32, i32* @G1
+; CHECK-NOMEMSSA: sub i32 %V1, %V2
+%Diff = sub i32 %V1, %V2
+ret i32 %Diff
+; CHECK: ret i32 0
+}
+
+;; Simple dead store elimination across non-clobbering store.
+; CHECK-LABEL: @test2(
+; CHECK-NOMEMSSA-LABEL: @test2(
+define void @test2() {
+entry:
+%V1 = load i32, i32* @G1
+; CHECK: store i32 0, i32* @G2
+store i32 0, i32* @G2
+; CHECK-NOT: store
+; CHECK-NOMEMSSA: store i32 %V1, i32* @G1
+store i32 %V1, i32* @G1
+ret void
+}
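The two tests in memoryssa.ll differ only in whether the intervening store to @G2 is treated as a clobber of @G1. The following is a minimal sketch, not EarlyCSE's actual code: a toy C++ model of a straight-line block in which `Op`, `Inst`, `Result` and `simulate` are hypothetical names, and distinct location names are assumed never to alias. With `Precise == false`, every store bumps a single generation counter and makes everything remembered so far stale (roughly modeled on EarlyCSE's generation-based scheme). With `Precise == true`, a store only updates what is known about its own location, standing in for a per-access clobber query such as the one MemorySSA provides.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy IR: a load or a store to a named memory location
// (think @G1 / @G2). Distinct names are assumed never to alias.
enum class Op { Load, Store };

struct Inst {
  Op Kind;
  std::string Loc;   // memory location read or written
  std::string Value; // load result name, or the stored operand
};

struct Result {
  int RedundantLoads = 0;  // loads that could reuse a known value
  int RedundantStores = 0; // stores writing back the value already there
};

// Precise == false: any store invalidates everything (one global generation).
// Precise == true: a store only changes what is known about its own location.
Result simulate(const std::vector<Inst> &Block, bool Precise) {
  struct Known {
    std::string Value; // value known to be in the location
    unsigned Gen;      // generation at which it was recorded
  };
  std::map<std::string, Known> Avail;
  unsigned Generation = 0;
  Result R;
  for (const Inst &I : Block) {
    assert(!I.Loc.empty() && "every access needs a memory location");
    auto It = Avail.find(I.Loc);
    bool Fresh = It != Avail.end() && It->second.Gen == Generation;
    if (I.Kind == Op::Load) {
      if (Fresh)
        ++R.RedundantLoads; // reuse the known value instead of reloading
      else
        Avail[I.Loc] = {I.Value, Generation};
      continue;
    }
    // A store of the value the location already holds is dead.
    if (Fresh && It->second.Value == I.Value) {
      ++R.RedundantStores;
      continue;
    }
    if (!Precise)
      ++Generation; // conservative: forget everything remembered so far
    // Either way, the stored value is now known for this location.
    Avail[I.Loc] = {I.Value, Generation};
  }
  return R;
}
```

Fed the two memoryssa.ll patterns, the conservative mode finds nothing, while the precise mode flags the second load of @G1 in test1 and the write-back store to @G1 in test2, which is the difference the CHECK and CHECK-NOMEMSSA lines encode. The toy gets its clobber information for free, so it says nothing about the construction-cost concern raised in review.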

Revision Contents

Diff 60768

include/llvm/LinkAllPasses.h

include/llvm/Transforms/Utils/MemorySSA.h

lib/Transforms/IPO/PassManagerBuilder.cpp

lib/Transforms/Scalar/EarlyCSE.cpp

lib/Transforms/Utils/MemorySSA.cpp

test/Transforms/EarlyCSE/AArch64/intrinsics.ll

test/Transforms/EarlyCSE/AArch64/ldstN.ll

test/Transforms/EarlyCSE/atomics.ll

test/Transforms/EarlyCSE/basic.ll

test/Transforms/EarlyCSE/commute.ll

test/Transforms/EarlyCSE/fence.ll

test/Transforms/EarlyCSE/guards.ll

test/Transforms/EarlyCSE/memoryssa.ll
