Differential D19821
[EarlyCSE] Optionally use MemorySSA. NFC.
Authored by gberry on May 2 2016, 12:18 PM.

Details

Use MemorySSA, if requested, to do less conservative memory dependency checking. This change doesn't enable the MemorySSA-enhanced EarlyCSE in the default pipelines, so it should be NFC.
Comment Actions

At a meta level, I'm not convinced that updating EarlyCSE to work with MemorySSA is the right approach. EarlyCSE is focused on being really, really fast at cleaning up stupidly redundant IR so that the rest of the pass pipeline doesn't need to worry about it. MemorySSA is relatively expensive to construct. Given that, I'm really not sure putting it at the very beginning of the pipeline is a good design choice.

Now, having said that, it might make sense to have a dominance-order CSE pass based on MemorySSA for use later in the pass pipeline. Currently we use EarlyCSE for two distinct purposes; it's possible that it might be time to split them. Can you justify why this is the right approach?
Comment Actions

In particular, I'm trying to understand if your concerns are *mostly* "we …

Comment Actions

@reames I've attempted to resolve most of your individual concerns (or at least made them explicit in the change). The bigger question of whether this is worth the compile time remains to be determined. Do you think more tests need to be added in addition to the already existing EarlyCSE tests? Adding additional RUN lines to those tests to enable -early-cse-use-memoryssa seems like overkill to me, but I don't feel too strongly about it. Or are you more concerned about adding new tests for cases that are only caught by MemorySSA (both positive and negative)?

@dberlin, @george.burgess.iv There are a couple of FIXME comments in this change that identify cases where MemorySSA may be too conservative (e.g. when dealing with release fences and atomic loads). Do you think it is reasonable to refine these cases in MemorySSA, or is the conservatism specific to EarlyCSE's usage, in which case we should deal with it in EarlyCSE? Similarly, what do you think of Philip's suggestion to look at using ValueHandles in MemorySSA to make invalidation on instruction removal more automatic?

Comment Actions

@reames @dberlin Regarding the compile-time impact, do you think it would be worth pursuing a change to make EarlyCSE's use of MemorySSA optional? That way we could avoid using it for the early EarlyCSE runs and only use it for later ones, perhaps even influenced by optimization level. A related aspect of the plan for MemorySSA that I'd like to understand is how well we think we'll be able to amortize the cost of building it by preserving/maintaining it across passes (see the preservation sketch below). Daniel, can you share your thoughts on that?

Comment Actions

So, I would like to see real numbers that say this is going to slow down …
We can amortize the cost quite well. It should essentially cost nearly …
The entire plan is actually to amortize the cost. At the outset, with a little work, we should have to compute MemorySSA …
But it's also interesting to note that none of these passes preserve MemDep …
It definitely can be made to be so. Shoving this in EarlyCSE, if it's fast enough, seems reasonable at a …
That needs to be traded off against how much better/easier/etc. it makes …

Comment Actions

Update to use MemorySSA in EarlyCSE if it is available, and make it …

Comment Actions

Compile-time tests for the llvm test-suite on aarch64 (at -O3) were mostly a wash: some faster, some slower, no big outliers. The net change was slightly better compile times. Notable performance improvements (no significant regressions): …

@reames I've added some additional lit test coverage; is there more lit test coverage you'd like to see?

Comment Actions

I'm also interested.

Comment Actions

Here are the worst llvm test-suite compile-time regressions. I've filtered out the very small test cases. The data shown are the percent diffs of compile times from 5 different runs with and without the above change.

llvm-test-suite/SingleSource/Benchmarks/Misc/ffbench:normal PASS +1.079%, +2.837%, +17.951%, +21.615%, +33.206%

Comment Actions

@dberlin Yeah, I'm in the process of double-checking some of these to make sure my testing methodology was sound.

Comment Actions

Updated llvm-test-suite compile-time regressions using LNT methodology: …

Comment Actions

Looks like the affected benchmarks changed in the new measurements; does that hold for the performance improvements as well?
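The "amortize the cost" plan discussed above boils down to passes both consuming and preserving MemorySSA, so it isn't rebuilt from scratch before every user. Below is a minimal sketch of the mechanics under the legacy pass manager; the pass name is hypothetical, while MemorySSAWrapperPass and the AnalysisUsage calls are the real LLVM API. Any IR changes the pass made would also have to update MemorySSA incrementally, which is elided here.

```cpp
#include "llvm/Analysis/MemorySSA.h"
#include "llvm/IR/Function.h"
#include "llvm/Pass.h"
using namespace llvm;

namespace {
// Hypothetical consumer pass, used only to illustrate preservation.
struct HypotheticalMSSAUser : public FunctionPass {
  static char ID;
  HypotheticalMSSAUser() : FunctionPass(ID) {}

  void getAnalysisUsage(AnalysisUsage &AU) const override {
    AU.addRequired<MemorySSAWrapperPass>();
    // The key line: declare MemorySSA preserved so the pass manager keeps
    // it alive for the next consumer instead of rebuilding it.
    AU.addPreserved<MemorySSAWrapperPass>();
  }

  bool runOnFunction(Function &F) override {
    MemorySSA &MSSA = getAnalysis<MemorySSAWrapperPass>().getMSSA();
    // ... query MSSA here, keeping it up to date across any IR changes ...
    (void)MSSA;
    return false;
  }
};
} // end anonymous namespace

char HypotheticalMSSAUser::ID = 0;
```

Under the new pass manager, the equivalent is returning a PreservedAnalyses set that marks MemorySSAAnalysis as preserved.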
Comment Actions

George, can you stare at these quickly and see if any of your caching … (It's fine if not, just trying to avoid duplicating work.)

Comment Actions
That depends on what exactly is slowing the benchmarks down so much. If our usage pattern is query -> remove -> query -> remove, then our cache may become useless, since (worst case) we drop the entire thing on each removal. If we primarily query defs, then this pattern gives us the same effectively-N^2 behavior as MemDep. One of the big goals of the new cache/walker is to allow us to drop as little as possible.

In terms of pure walker/cache speed, the current walker is happy to do a lot of potentially useless work walking phis it can't optimize; the one I'm working on will do as little work as possible in that case. Also, the current walker potentially does a lot of domtree queries when caching results, whereas the one I'm working on does none (except in asserts).

Glancing at some of the benchmarks, I'm not sure if any of that is what's slowing us down here, though. If you'd like, I'm happy to profile/poke around and give you a more definitive answer.
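To make the query/removal interleaving described above concrete, here is a minimal sketch of the pattern, assuming MemorySSA has already been built for the function. isRedundant() is a hypothetical stand-in for EarlyCSE's value-numbering lookup, and the removal goes through the later MemorySSAUpdater API (at the time of this review, removal lived on MemorySSA itself).

```cpp
#include "llvm/Analysis/MemorySSA.h"
#include "llvm/Analysis/MemorySSAUpdater.h"
#include "llvm/IR/InstIterator.h"
using namespace llvm;

// Hypothetical stand-in for EarlyCSE's hash-table lookup (always false here).
static bool isRedundant(Instruction *, MemoryAccess *) { return false; }

void queryRemoveLoop(Function &F, MemorySSA &MSSA, MemorySSAUpdater &MSSAU) {
  MemorySSAWalker *Walker = MSSA.getWalker();
  for (inst_iterator It = inst_begin(F), E = inst_end(F); It != E;) {
    Instruction &I = *It++; // advance before any erasure
    MemoryUseOrDef *MA = MSSA.getMemoryAccess(&I);
    if (!MA)
      continue; // instruction doesn't read or write memory
    // Query: walk to the nearest clobbering access; the walker caches this.
    MemoryAccess *Clobber = Walker->getClobberingMemoryAccess(MA);
    if (isRedundant(&I, Clobber)) {
      // Remove: in the worst case this invalidates the walker's cache, so
      // the next query repeats the walk -- the effectively-N^2 behavior
      // described above.
      MSSAU.removeMemoryAccess(MA);
      I.eraseFromParent();
    }
  }
}
```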
Comment Actions

The performance changes are mostly the same. Updated perf data (this is on an A57-like out-of-order aarch64 core; time deltas, negative is better):

llvm-test-suite/MultiSource/Benchmarks/mafft/pairlocalalign +0.399%, +0.430%, +0.458%, +0.507%, +0.593%, +0.626%, +0.778%, +1.097%, +1.195%, +1.309%

Comment Actions

I re-ran the llvm test-suite compile-time numbers with more samples and found no significant changes (improvements or regressions) in compile time.

Comment Actions

I've put this on hold until I can re-run the compile-time numbers after George's changes to the MemorySSA caching code go in (http://reviews.llvm.org/D21777).

Comment Actions

Sorry for not responding to this for so long. My objection is primarily a compile-time concern. Right now, EarlyCSE is a *very* cheap pass to run. If you can keep it fast (even when we have to reconstruct MemorySSA), I don't object to having EarlyCSE be MemorySSA-based. I think that is a very hard bar to pass in practice. In particular, the bar is not total O3 time; it's EarlyCSE time. I fully expect that the more precise analysis may speed up other passes, but we can't assume that happens for all inputs. (As I write this, I'm recognizing that this might be too high a bar to set. If you think I'm being unreasonable, argue why and what a better line should be.)

Given that I'm not going to have time to be actively involved in this thread, I'm going to defer to other reviewers. If they think this is a good idea, I will not actively block it.

Comment Actions

p.s. The newer structure of using the original fast-path check, with MemorySSA as an optional backup when available, is much cleaner than the original code. I'm okay with something like this landing (once other reviewers have signed off), even if the compile-time question isn't fully resolved, provided that forcing MemorySSA stays under an off-by-default option. As structured, this doesn't complicate the existing code much at all.

Comment Actions

New version that adds a pass parameter to control whether MemorySSA is used. Also changed the memory generation check to do a simpler MemorySSA … (see the sketch below).

Comment Actions

I've collected some compile-time stats when enabling the MemorySSA EarlyCSE just for the EarlyCSE pass added at the beginning of addFunctionSimplificationPasses at O2 and higher.
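For reference, here is one plausible shape for the simpler memory-generation check mentioned in the "New version" comment, under the assumption that it keys off a single clobber query: two accesses to the same location observe the same memory state if the later access's clobbering definition dominates the earlier access. The helper name echoes the discussion; the MemorySSA calls are real API, but the exact form in the patch may differ.

```cpp
#include "llvm/Analysis/MemorySSA.h"
using namespace llvm;

static bool isSameMemGeneration(MemorySSA &MSSA, MemoryAccess *EarlierMA,
                                MemoryUseOrDef *LaterMA) {
  // Walk from the later access to the definition that actually clobbers it.
  MemoryAccess *LaterDef =
      MSSA.getWalker()->getClobberingMemoryAccess(LaterMA);
  // If that clobbering definition dominates (or is) the earlier access, no
  // write touched the location between the two instructions, so both see
  // the same "memory generation".
  return MSSA.dominates(LaterDef, EarlierMA);
}
```

The appeal over EarlyCSE's existing scheme is precision: the current code bumps a single generation counter on any instruction that may write memory, whereas a clobber query is per-location, at the price of one walker query per candidate.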
Comment Actions

Can this be committed as a separate change?