
Use MemorySSA in LICM to do sinking and hoisting.
Closed, Public

Authored by asbirlea on Nov 22 2017, 2:54 PM.

Details

Summary

Step 2 in using MemorySSA in LICM:
Use MemorySSA in LICM to do sinking and hoisting, all under "EnableMSSALoopDependency" flag.
Promotion is disabled.

Enable the flag in LICM sink/hoist tests to check the correctness of this change. Moved one test that
relied on promotion, so that all sinking tests can be exercised.
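The flag-gated setup the summary describes can be sketched as follows. This is a toy model, not LLVM's actual classes: the structs are empty stand-ins, and only the "exactly one analysis is built, chosen by the flag" shape mirrors the patch.

```cpp
#include <cassert>
#include <memory>

// Empty stand-ins for the two dependency analyses; illustrative only.
struct AliasSetTracker {};
struct MemorySSA {};

struct LICMState {
  std::unique_ptr<AliasSetTracker> CurAST;
  std::unique_ptr<MemorySSA> MSSA;
};

// With the flag set, build MemorySSA; otherwise fall back to the
// AliasSetTracker path. Exactly one of the two is ever non-null.
LICMState setUpAnalyses(bool EnableMSSALoopDependency) {
  LICMState State;
  if (EnableMSSALoopDependency)
    State.MSSA = std::make_unique<MemorySSA>();
  else
    State.CurAST = std::make_unique<AliasSetTracker>();
  return State;
}
```
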

Diff Detail

Event Timeline

This revision now requires changes to proceed. Dec 14 2017, 11:55 PM
asbirlea updated this revision to Diff 127629. Dec 19 2017, 5:12 PM
asbirlea marked 14 inline comments as done.

Address comments.

lib/Transforms/Scalar/LICM.cpp
2123

Code in question removed.

2124

Right, but code in question removed.

2134

If using Phi->operands() I need to re-add the conversion from Use to MemoryAccess hidden in the macro in getIncomingValue().
New code would be:

return llvm::all_of(Phi->operands(), [MSSA, &Processed](const Use &Arg) {
  return isVolatileLoadOrLiveOnEntry(cast_or_null<MemoryAccess>(Arg.get()),
                                     MSSA, Processed);
});

Does this look better than the existing code?

Scratch that, code in question removed.

2141

Code updated, see comment below.

2174

Well, because they should be treated as defs. The motivation for special-casing this was test/Transforms/LICM/volatile-alias.ll.
I updated the code to no longer consider volatile loads acceptable, meaning the test in question will fail. And I believe that's the right thing.

sanjoy requested changes to this revision. Dec 20 2017, 4:31 PM

Some of the comments inline may be naive since I'm not very familiar with MemorySSA.

lib/Transforms/Scalar/LICM.cpp
418

Can this be a std::unique_ptr?

435

Should this assert be checking CurAST != nullptr ^ (MSSAUpdater != nullptr && MSSA != nullptr)?

1081

Minor nit: I usually prefer leaving variables like Invalidated uninitialized; that way, if I forget to assign to it on one branch of the if, I'll get a warning and/or an MSan failure.

1088

We should probably early exit if !Op->getType()->isPointerTy() (and assert Op->getType()->isPointerTy() in pointerInvalidatedByLoopWithMSSA etc.).

1096–1097

Minor nit: even though unnecessary, I'd prefer using braces here for readability.

1281

I'm not sure that setting the defining access to nullptr here is correct -- if there is some subtle reason why it is, please add a comment.

1283

Instead of two dyn_cast_or_nulls how about:

if (NewMemAcc) {
  if (auto *MemDef = dyn_cast<MemoryDef>(NewMemAcc)) {
    ...
  } else {
    auto *MemUse = cast<MemoryUse>(NewMemAcc);
  }
}

?

1285

I'd prefer /*RenameUses=*/true.

1291

I'm not clear on how this works -- IIUC you're calling into getClobberingMemoryAccess with NewMemUse's DefiningAccess set to nullptr, but:

  • Why don't we have to do this for MemoryDefs?
  • What if MSSA->getWalker() is a DoNothingMemorySSAWalker?
  • There are places where we should be crashing, like CachingWalker::getClobberingMemoryAccess (line 2078), which calls getClobberingMemoryAccess(DefiningAccess, Q), which calls ClobberWalker::findWalker, which has if (auto *MU = dyn_cast<MemoryUse>(Start)).
1570–1571

Can this be cast_or_null?

2135

Stray comment?

2138

Minor nit: probably cleaner to use if (MSSA->isLiveOnEntryDef(Source) || !CurLoop->contains(Source->getBlock())) here.

This revision now requires changes to proceed. Dec 20 2017, 4:31 PM
asbirlea updated this revision to Diff 127969. Dec 21 2017, 4:32 PM
asbirlea marked 17 inline comments as done.

Comments.

asbirlea added inline comments.
lib/Transforms/Scalar/LICM.cpp
435

At some point I was testing with both enabled, but considering the added compile time when doing that, yes :).

1291

AFAIU:
For Defs the DefiningAccess is properly updated.
For Uses, there is a DefiningAccess set, but that one is not optimized. When adding an access like in this case, calling getClobberingMemoryAccess will optimize it.
If the walker is DoNothingMemorySSAWalker, it won't optimize?

Pulling in george.burgess.iv to clarify, it's possible I'm missing something here.

Only looked at the bit I was mentioned in.

lib/Transforms/Scalar/LICM.cpp
1291

IIUC you're calling into getClobberingMemoryAccess with NewMemUse's DefiningAccess set to nullptr

I'm assuming you meant MemUse, in which case insertUse sets up the defining access to something non-null.

It's sort of sketchy that we're trying to eagerly update the defining access using the walker, though. Is there a reason that we can't call MSSA->getWalker()->getClobberingMemoryAccess(MemUse); when we need optimized clobbers? Doing so should be incredibly cheap when the use has already been optimized, and it lets us be lazy about the work we do (if we don't need an accurate clobber for MemUse, we'll just never compute it)
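The lazy-query pattern suggested here ("ask the walker only when you need an optimized clobber; repeat queries are cheap") can be sketched with a toy caching walker. All names below are illustrative stand-ins, not MemorySSA's actual API:

```cpp
#include <cassert>
#include <unordered_map>

// Toy model: the first query per use does the (expensive) walk;
// later queries for the same use hit the cache.
struct MemoryUseStub { int DefiningAccess; };

class CachingWalkerStub {
  std::unordered_map<const MemoryUseStub *, int> Cache;
public:
  int Walks = 0; // number of real (uncached) walks, for illustration
  int getClobberingAccess(const MemoryUseStub &U) {
    auto It = Cache.find(&U);
    if (It != Cache.end())
      return It->second;            // already optimized: "incredibly cheap"
    ++Walks;                        // a real upward MemorySSA walk goes here
    int Clobber = U.DefiningAccess; // placeholder for the walk's result
    Cache.emplace(&U, Clobber);
    return Clobber;
  }
};
```

The laziness is the point: if no one ever needs an accurate clobber for a given use, the expensive walk for it simply never happens.
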

asbirlea marked an inline comment as done. Dec 22 2017, 11:35 AM
asbirlea added a subscriber: dberlin.
asbirlea added inline comments.
lib/Transforms/Scalar/LICM.cpp
1291

I'm assuming you meant MemUse, in which case insertUse sets up the defining access to something non-null.

Thanks for clarifying this.

It's sort of sketchy that we're trying to eagerly update the defining access using the walker, though. Is there a reason that we can't call MSSA->getWalker()->getClobberingMemoryAccess(MemUse); when we need optimized clobbers? Doing so should be incredibly cheap when the use has already been optimized, and it lets us be lazy about the work we do (if we don't need an accurate clobber for MemUse, we'll just never compute it)

The reasoning here is that when making code changes such as hoist/sink, I need accesses to be (reasonably) optimized. I'm not necessarily looking to get full precision here; I'd be perfectly happy with an API to optimize up to the MemorySSA optimize-uses cap.
In LICM.cpp:1652 (and in the future for promotion) I'm relying on getting the defining access, not the walker's clobbering access, allowing for some imprecision in exchange for not paying the lookup time for blocks with a large load/store count.
I had come across a case where, after a hoist or sink, this approach did not work for a very small testcase without using the walker to get the clobbering access.
So the choice I made was to use the walker ahead of time to update accesses I insert when cloning, allowing me to use getDefiningAccess afterwards.

Hopefully I clarified the intention here :). Do you think there's a better way to address this problem? I'm open to suggestions.

@dberlin

So, this use renaming is expensive, and I'd like to understand the goal (it requires walking all stores and loads in the dominator subtree rooted at your memory access). It's really meant for when you've inserted defs that may alias existing uses.
If your goal is to replace all uses of an old MemoryAccess with some new MemoryAccess, you want to use the replace API :)

The code here is creating a new memory access to handle cloning an access in all exit blocks, where that access may alias existing uses, so I think using the replace API isn't enough.
Please let me know if there is a way to handle this differently.

Gentle ping.

asbirlea updated this revision to Diff 131215. Jan 24 2018, 3:07 AM

Add a flag for the decision of using getDefining vs Walker->getClobbering, for optimizing pathological cases.

lebedev.ri added inline comments.
test/Transforms/LICM/sinking.ll
363

But there is no sink-promote.ll in this differential?
Is that in some other patch?

asbirlea updated this revision to Diff 131230. Jan 24 2018, 5:08 AM

Added sink-promote.ll. Thanks for catching this!

lib/Transforms/Scalar/LICM.cpp
1291

(To be clear: the addition of the flag resolved my MemorySSA comments. While it's sort of hacky that we need to have it in the first place, the best place to handle "this {block,function} is pathologically big; let's give up instead of taking forever to walk it, and rewalk it, and rewalk it, and ..." is in MSSA, IMO. Until we make MSSA smart enough to actually do that, though, a targeted escape hatch seems like the best way forward to me.)

asbirlea marked 6 inline comments as done. Jan 26 2018, 10:03 AM

Thanks George, marking as done. Also note that the optimization enabled by this flag remains disabled.

Other reviewers: please let me know if there are remaining issues or concerns.
Please note that the use of MemorySSA remains disabled with this patch.

sanjoy added inline comments. Feb 21 2018, 1:11 PM
lib/Transforms/Scalar/LICM.cpp
1

Stray edit?

109

Line too long?

435

Probably better to have two asserts here:

  • assert((MSSAUpdater == nullptr) == (MSSA == nullptr))
  • assert((CurAST != nullptr) ^ (MSSA != nullptr))
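The pair of asserts suggested above encodes a single invariant: the updater exists iff MemorySSA does, and exactly one of {AliasSetTracker, MemorySSA} is in use. A self-contained check of that invariant (the pointers are opaque stand-ins for CurAST, MSSA, and MSSAUpdater):

```cpp
#include <cassert>

// Returns true when the two suggested asserts would both pass.
bool invariantHolds(const void *CurAST, const void *MSSA,
                    const void *MSSAUpdater) {
  bool UpdaterMatchesMSSA = (MSSAUpdater == nullptr) == (MSSA == nullptr);
  bool ExactlyOneAnalysis = (CurAST != nullptr) != (MSSA != nullptr);
  return UpdaterMatchesMSSA && ExactlyOneAnalysis;
}
```

Splitting the condition into two asserts, as suggested, also gives a more precise failure message than a single combined expression would.
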
1289

Note: I did not revisit this bit since it looks like @george.burgess.iv settled it.

1576

I'm not convinced this is a good idea -- it looks like we're coding to the implementation of getClobberingMemoryAccess here instead of coding to its interface. But if @george.burgess.iv is fine with it, I am too.

1576

It looks like MemorySSA::CachingWalker::getClobberingMemoryAccess has an early exit if OldMemAcc was optimized. Are we okay not updating the defining access in that case?

2127

If MUD is always guaranteed to be a use then this should be cast_or_null. Otherwise please update the comment.

asbirlea updated this revision to Diff 135561. Feb 22 2018, 4:39 PM
asbirlea marked 4 inline comments as done.

Address comments.

asbirlea marked an inline comment as done. Feb 22 2018, 4:39 PM
asbirlea added inline comments.
lib/Transforms/Scalar/LICM.cpp
1576

If it's been optimized, then the defining access should have been updated, and vice versa.
@george.burgess.iv: Could you clarify this, please? I don't see resetOptimized being called on any path for the call chain starting with moveToPlace->insertDef, though it would make sense for this to happen. Am I missing something here?

lib/Transforms/Scalar/LICM.cpp
1576

I'm not convinced this is a good idea -- it looks like we're coding to the implementation of getClobberingMemoryAccess here instead of coding to its interface. But if @george.burgess.iv is fine with it, I am too.

In general, I agree, but Walkers are a sticky concept. The model we've sold people is "always call getWalker()->getClobberingMemoryAccess(MA) if you need the optimized access; the walker will handle making repeated calls fast," and we're specifically calling this on MSSA->getWalker().

I'll try to find a way to structure MSSA so that this guarantee is more obvious.

Am I missing something here?

Our theoretical isOptimized bit depends on two things: being flipped by a call to setOptimized, and either our defining access or optimized access having the same ID as it did when setOptimized was called. (...It also depends on stores not appearing out of thin air ;) )

So, when you move a Def, all Uses are implicitly invalidated by the replaceAllUsesWith (since their defining access no longer has the same ID as before), and the Def's cached clobber is invalidated when you call setDefiningAccess on it.

Similarly, when you move a Use, we just set its defining access to whatever's closest. So, its cache will be invalidated (or not, if you move it right under its old clobber. But then the optimized use is still correct).

What I'm unsure about is what happens to Defs optimized to a Def when said optimized-to Def is moved. I feel like I asked this during the original reviews for the caching/updater stuff, but I don't remember the answer. I'll do archaeology and get back to you. :)

(If the Def/Def case is uninteresting to this pass, feel free to not block this patch on my search)
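The invalidation model described above (a cached clobber is trusted only while the defining access keeps the ID it had when setOptimized ran) can be sketched as a toy struct. Field and method names are illustrative, not MemorySSA's actual members:

```cpp
#include <cassert>
#include <optional>

// Sketch: the "optimized" bit is valid only while the defining access
// still has the same ID it had when setOptimized ran. Replacing or
// moving the defining access changes that ID, so the cached clobber
// is implicitly invalidated without any explicit reset call.
struct AccessStub {
  int DefiningAccessID = 0;
  std::optional<int> OptimizedTo;
  int OptimizedAtDefID = -1;

  void setOptimized(int Clobber) {
    OptimizedTo = Clobber;
    OptimizedAtDefID = DefiningAccessID;
  }
  bool isOptimized() const {
    return OptimizedTo.has_value() && OptimizedAtDefID == DefiningAccessID;
  }
};
```
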

lib/Transforms/Scalar/LICM.cpp
1576

OK, so assuming the fixes stick, https://bugs.llvm.org/show_bug.cgi?id=36529 should fix any def/def craziness we may see here.

Moving things around above the defs should also be fine, since if you have a Def A which has an "optimized" clobber B and you add a clobber between them, that sounds like a functional change (or, if nothing can observe that clobber, a waste of CPU time ;) ). Similarly, if B is moved such that a pre-existing clobber should become A's clobber, that sounds like a functional change.

I think this resolves all the questions. Please let me know if not.

Thank you for the fix, George. Yes, I think all issues are addressed now.

asbirlea updated this revision to Diff 147428. May 17 2018, 7:00 PM

Rebase and ping.

asbirlea updated this revision to Diff 166735. Sep 24 2018, 1:08 PM

Rebase patch after recent changes.

Updates:
Changed test sink-promote due to PR38989.
Sink is incomplete with MSSA due to recent changes; added a TODO and will extend it in a separate patch.

asbirlea updated this revision to Diff 180775. Jan 8 2019, 5:29 PM

Cleaned up after a series of MemorySSA updates, and rebased with ToT changes.

sanjoy requested changes to this revision. Jan 9 2019, 1:53 PM

I didn't carefully review the MSSA-specific stuff since you're the domain expert here. I do have some general comments.

lib/Transforms/Scalar/LICM.cpp
110

Not sure if it is beneficial to split out a separate [Verbose] section -- maybe we can combine both into a large single comment?

124

Just to be clear, if this is true then opt -licm -some_pass_using_mssa is not the same as opt -licm | opt -some_pass_using_mssa? If yes, that needs to be explicitly (and loudly) mentioned.

655

Just to be clear, we don't have to tell MSSAU about these new blocks (and the edges between these new blocks)?

964

Update comment?

995

So if there is a memory phi then it has to be Acc->begin()?

Btw, it might be cleaner to express this as a loop:

int NonPhi = 0;
for (const auto &MA : *Acc) {
  if (isa<MemoryPhi>(&MA))
    continue;
  if (cast<MemoryUseOrDef>(&MA)->getMemoryInst() != I || NonPhi++ == 1)
    return false;
}
return true;
1122

Can be return isOnlyMemoryAccess(FI, CurLoop, MSSAU).

This revision now requires changes to proceed. Jan 9 2019, 1:53 PM
asbirlea updated this revision to Diff 180956. Jan 9 2019, 4:04 PM
asbirlea marked 13 inline comments as done.

Address comments.

I also cleaned up the usage of EnableLicmCap. It is only used in 2 places now, with no forced calls to getClobbering (the issue in MemorySSA requiring this is now resolved).
FWIW, I'm planning to replace this with an integer cap in a future patch (i.e. allow a limited number of calls to getClobbering, vs all or none like we have now).

lib/Transforms/Scalar/LICM.cpp
124

Hmm, let me think this out loud.

If the flag is true, there may be more opportunities for sink/hoist found at the cost of compile time, so the end result (module) may be different. But the users of the resulting MemorySSA should not be affected by the flag being true or false.

There may be more things already optimized in MemorySSA with the flag set to false.
And more things to be optimized on demand with the flag set to true.
So printing MemorySSA after LICM may show differences.
But a pass running after LICM should get the same results from MemorySSA clobbering queries.

Does this make sense?

655

MSSAU doesn't keep track of blocks with no memory accesses. This code is just duplicating the control flow from inside the loop to outside the loop; MSSAU only starts caring when instructions that touch memory are moved in there.

It also matters that the correct block predecessors are included in MemoryPhis, so when control-flow changes modify that, we need to update MSSA. The 3 lines below address this.

995

Yes, a block may have a single MemoryPhi, and it's always the first one in the list of Defs.

SG! Updated with loop.

sanjoy accepted this revision. Jan 9 2019, 4:40 PM

lgtm

This revision is now accepted and ready to land. Jan 9 2019, 4:40 PM
This revision was automatically updated to reflect the committed changes.