This is an archive of the discontinued LLVM Phabricator instance.

[LICM] Make Loop ICM profile aware
ClosedPublic

Authored by wenlei on Jul 21 2019, 10:50 AM.

Details

Reviewers

asbirlea
sanjoy
reames
nikic
hfinkel
vsk

Commits

rG7e71aa24bc07: [LICM] Make Loop ICM profile aware
rL368526: [LICM] Make Loop ICM profile aware

Summary

Hoisting or sinking an instruction out of a loop isn't always beneficial. Hoisting an instruction from a cold block inside a loop body out of the loop could hurt performance. This change makes Loop ICM profile aware: it now checks block frequency to make sure hoisting/sinking only moves an instruction to a colder block.
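
As a rough illustration of the check described in the summary, here is a minimal standalone sketch. It does not use the LLVM API: `BlockFreqMap` and `movesToColderBlock` are hypothetical names, and a plain map of block names to frequencies stands in for BlockFrequencyInfo.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for BlockFrequencyInfo: block name -> relative
// execution frequency. Not the LLVM data structure.
using BlockFreqMap = std::unordered_map<std::string, std::uint64_t>;

// Returns true when moving an instruction from SrcBlock to DstBlock places it
// in a block that executes no more often than the block it is in now.
bool movesToColderBlock(const BlockFreqMap &Freq, const std::string &SrcBlock,
                        const std::string &DstBlock) {
  return Freq.at(DstBlock) <= Freq.at(SrcBlock);
}
```

With frequencies such as preheader = 100, a hot loop block = 1000 and a cold loop block = 1, hoisting from the hot block to the preheader passes the check, while hoisting from the cold block does not.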

Test Plan:

ninja check

Diff Detail

Repository: rL LLVM

Event Timeline

wenlei created this revision.Jul 21 2019, 10:50 AM

Herald added a project: Restricted Project.Jul 21 2019, 10:50 AM

Herald added subscribers: llvm-commits, hiraditya.

Harbormaster completed remote builds in B35439: Diff 211005.Jul 21 2019, 10:51 AM

Extra note: we saw a case from one of the big services (at Facebook) that's very similar to the example in LICM/sink.ll, where there are cold paths in the loop. By checking block frequency to avoid hoisting to hotter blocks, we got good improvements (~7% for a CPU-related metric).

xbolva00 added a subscriber: xbolva00.Jul 22 2019, 9:33 AM

xbolva00 added inline comments.

llvm/lib/Transforms/Scalar/LICM.cpp
1675 ↗	(On Diff #211005)	Maybe you can return here?

Useful patch, thanks!

xbolva00 added reviewers: reames, nikic, hfinkel.Jul 22 2019, 9:35 AM

Thank you for the patch. This LGTM.

llvm/lib/Transforms/Scalar/LICM.cpp
885 ↗	(On Diff #211005)	Nit: It may be worth checking worthSinkOrHoistInst prior to some of the other conditions if it's a faster/cheaper shortcutting condition.

This revision is now accepted and ready to land.Jul 22 2019, 10:06 AM

Address review feedback.

Harbormaster completed remote builds in B35479: Diff 211152.Jul 22 2019, 11:16 AM

wenlei marked 2 inline comments as done.Jul 22 2019, 11:20 AM

wenlei added inline comments.

llvm/lib/Transforms/Scalar/LICM.cpp
885 ↗	(On Diff #211005)	good point, I moved it up over isSafeToExecuteUnconditionally.
1675 ↗	(On Diff #211005)	changed. thanks!

vsk requested changes to this revision.Jul 22 2019, 11:32 AM

vsk added subscribers: davidxl, vsk.

vsk added inline comments.

llvm/lib/Transforms/Scalar/LICM.cpp
804 ↗	(On Diff #211152)	IIRC BFI returns a 0 frequency if information about a block is missing. If information for either block is missing, shouldn't this fall back to the non-PGO behavior?

This revision now requires changes to proceed.Jul 22 2019, 11:32 AM

This was tried before. IIRC, the conclusion was to implement a loop sinking pass to undo non-profitable LICM. The loop sinking pass was in D22778. Can the loop sinking pass be enhanced to handle the case here (I have not looked in detail)?

wenlei marked an inline comment as done.Jul 22 2019, 11:51 AM

wenlei added inline comments.

llvm/lib/Transforms/Scalar/LICM.cpp
804 ↗	(On Diff #211152)	Hmm.. how can we differentiate between missing info vs. the block count is really just 0? My understanding is 1) if profile isn't available in general or for this function, BPI would use static info about the CFG to estimate weights (BranchProbabilityInfo::calculate), thus BFI automatically gets a "synthetic" static profile; 2) if information is missing because we don't have profile for that block, that's likely AutoFDO instead of Instr. PGO, and SampleProfileLoader::propagateWeights would try to infer if possible, but that's more of a profile accuracy issue.

In D65060#1596152, @davidxl wrote:

This was tried before. IIRC, the conclusion was to implement a loop sinking pass to undo non-profitable LICM. The loop sinking pass was in D22778. Can the loop sinking pass be enhanced to handle the case here (I have not looked in detail)?

Folks want to disable LICM with metadata (https://reviews.llvm.org/D64557) just to work around this LICM issue. If we know (with this patch) that we should not hoist these instructions, why hoist them and then undo it?

Thanks for explaining. I don't have any other concerns about the patch as-written, but am interested in how it fits in with LoopSink (per David's comment).

In D65060#1596152, @davidxl wrote:

This was tried before. IIRC, the conclusion was to implement a loop sinking pass to undo non-profitable LICM. The loop sinking pass was in D22778. Can the loop sinking pass be enhanced to handle the case here (I have not looked in detail)?

I just looked at D22778 briefly, but it didn't mention why we need to undo it later instead of just checking bfi during LICM - I'm curious. The loop sinking pass evidently didn't help the pathological case (a big switch inside a loop, and hundreds of switch arms got hoisted) we hit, but I didn't look into why either.

Hal probably remembered more details. IIRC, the argument was that LICM provides IR canonicalization which can simplify analysis -- there were some examples about exposed vectorization opportunities etc.

FWIW, LoopVectorize kind of expects invariant code to be hoisted out of the loop body before vectorization. We recently addressed some issues caused by stuff not being hoisted (D59995) and have some ideas on how to improve things on the LoopVectorize side a bit.

In D65060#1596194, @davidxl wrote:

Hal probably remembered more details. IIRC, the argument was that LICM provides IR canonicalization which can simplify analysis -- there were some examples about exposed vectorization opportunities etc.

I looked at the loop sink pass, and the case we ran into again. I think it'd be difficult for the loop sink pass to handle that case. There's a switch with over 200 case arms, inside a loop. Each case arm has a GEP, and the results of these GEP are merged into a phi outside of the switch, then used by a load afterwards. LICM hoisted all these 200+ GEP to the preheader. But since the use is after the switch, none of the original containing blocks (lower frequency switch arms) dominates the use block, so loop sink cannot sink these GEPs back as it can't find a set of blocks with lower frequency sum.
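
To make the shape of that case concrete, here is a scaled-down, hypothetical source-level sketch with three switch arms instead of 200+. `Record` and `sumFields` are made-up names, and whether a given compiler emits one address computation per arm for this exact code is an assumption; the point is only the structure: per-arm loop-invariant addresses, merged after the switch, then loaded.

```cpp
#include <cstddef>

// Hypothetical record type; each field address below is loop-invariant.
struct Record {
  int a;
  int b;
  int c;
};

int sumFields(const Record &R, const unsigned char *Ops, std::size_t N) {
  int Sum = 0;
  for (std::size_t I = 0; I < N; ++I) {
    const int *P; // merged after the switch (a phi in IR terms)
    switch (Ops[I]) {
    case 0:
      P = &R.a; // address computation that lives in this arm only
      break;
    case 1:
      P = &R.b;
      break;
    default:
      P = &R.c;
      break;
    }
    Sum += *P; // the load that uses the merged pointer, after the switch
  }
  return Sum;
}
```

Hoisting every arm's address computation to the preheader means all of them execute on loop entry, even though each iteration needs only one.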

In D65060#1596210, @fhahn wrote:

FWIW, LoopVectorize kind of expects invariant code to be hoisted out of the loop body before vectorization. We recently addressed some issues caused by stuff not being hoisted (D59995) and have some ideas on how to improve things on the LoopVectorize side a bit.

While it is not ideal to skip LICM as canonicalization, in practice, if loop sink can't undo some harmful hoisting, we have to do this to get around pathological cases. On the other hand, when some blocks from the loop body are much colder than the preheader, there's a good chance that we have non-trivial control flow inside the loop, so the impact on vectorization might not be too big.

We could also tune the heuristic to be conservative - only skip hoisting when preheader is much hotter (e.g. >=4x). Thoughts?

It is also better to introduce a coldness factor parameter 'F', i.e., if the source block's count is less than 1/F of the header count, suppress the hoisting.
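
The coldness factor suggested above could be sketched as follows. This is a standalone illustration, not the code from the patch: `shouldSuppressHoist` is a hypothetical name, and the comparison is written with a multiplication so that integer division does not truncate.

```cpp
#include <cassert>
#include <cstdint>

// Suppress hoisting when the source block's count is less than 1/F of the
// header count. Hypothetical helper; overflow of SrcCount * F for very large
// counts is ignored in this sketch.
bool shouldSuppressHoist(std::uint64_t SrcCount, std::uint64_t HeaderCount,
                         unsigned F) {
  assert(F > 0 && "coldness factor must be positive");
  // SrcCount < HeaderCount / F over the reals, without truncating division.
  return SrcCount * F < HeaderCount;
}
```

With F = 4 and a header count of 100, a source block with count 10 is cold enough to suppress hoisting, while counts of 25 or 50 are not.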

llvm/lib/Transforms/Scalar/LICM.cpp
804 ↗	(On Diff #211152)	Without profile data, BFI can easily give you wrong results, so you do need real profile data to guide the decision, otherwise this can not be turned on by default. To use real profile data, the right interface to use is ProfileSummaryInfo::getProfileCount -- this interface handles samplePGO and instrumentation based PGO properly.

In D65060#1599421, @davidxl wrote:

It is also better to introduce a coldness factor parameter 'F', i.e., if the source block's count is less than 1/F of the header count, suppress the hoisting.

Good point, thanks David. I'll expose a coldness factor, and make it tunable via a switch. I replied to your comments about BFI inline.

llvm/lib/Transforms/Scalar/LICM.cpp
804 ↗	(On Diff #211152)	"Without profile data, BFI can easily give you wrong results, so you do need real profile data to guide the decision, otherwise this can not be turned on by default."

I thought BFI would give us its best guess without a profile, e.g. it would leverage "__builtin_expect", recognize loops (calcLoopBranchHeuristics), etc., for a synthetic profile. That's definitely not as accurate as a real profile, but I was following the examples of other passes. The LoopSink pass, for example, uses BFI.getBlockFreq too. Did I miss anything?

"To use real profile data, the right interface to use is ProfileSummaryInfo::getProfileCount -- this interface handles samplePGO and instrumentation based PGO properly."

Sorry, I'm a bit confused. IIRC, ProfileSummaryInfo::getProfileCount is specifically for getting counts from call instructions. There we special-case sample profiles, because call instructions should have accurate weights from LBR. The main use of that function is to determine if a call site is hot or cold (there's also an assertion on that). Here we're not looking for the hotness of a call site, so I'm not sure that interface is a good fit.

davidxl added inline comments.Jul 24 2019, 11:48 AM

llvm/lib/Transforms/Scalar/LICM.cpp
804 ↗	(On Diff #211152)	LoopSink pass has check:

  // Enable LoopSink only when runtime profile is available.
  // With static profile, the sinking decision may be sub-optimal.
  if (!Preheader->getParent()->hasProfileData())
    return false;

Regarding ProfileSummaryInfo::getProfileCount -- thanks for noticing. I think it is a mistake to assert for call instruction. It should do something like:

  if (hasSampleProfile() && Inst->isCall()) {
    ...
    return ...
  }
  // use BFI and entry count based method:
  ....

but that is something to be fixed independently.
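
One way to picture the gating discussed here, as a standalone sketch rather than LLVM code: when no real profile is attached, fall back to the non-PGO behavior and only consult frequencies otherwise. `FunctionProfile` and `worthMoving` are hypothetical names; only the `hasProfileData` name mirrors the check quoted above.

```cpp
#include <cstdint>

// Hypothetical model of a function that may or may not carry real profile
// data. Not the LLVM Function class.
struct FunctionProfile {
  bool HasProfile;
  bool hasProfileData() const { return HasProfile; }
};

// Without real profile data, keep the non-PGO behavior (always allow the
// move). With profile data, allow it only when the destination is not hotter
// than the source.
bool worthMoving(const FunctionProfile &F, std::uint64_t SrcFreq,
                 std::uint64_t DstFreq) {
  if (!F.hasProfileData())
    return true;
  return DstFreq <= SrcFreq;
}
```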

add hasProfileData check, expose tuning knob per review feedback.

wenlei marked an inline comment as done.Jul 24 2019, 2:35 PM

wenlei added inline comments.

llvm/lib/Transforms/Scalar/LICM.cpp
804 ↗	(On Diff #211152)	thanks for the pointer to hasProfileData. the same check is now added.

Harbormaster completed remote builds in B35608: Diff 211601.Jul 24 2019, 2:35 PM

gentle ping.. @davidxl, let me know if the last update addressed your comments. thanks!

This version looks fine to me, but please let other reviewers weigh in as well.

In D65060#1596177, @vsk wrote:

Thanks for explaining. I don't have any other concerns about the patch as-written, but am interested in how it fits in with LoopSink (per David's comment).

@vsk, David's comments have been addressed. Do you have any other concerns with the latest update before this change can be accepted?

Apologies, I didn't mean to hold this up. I don't have any other concerns.

This revision is now accepted and ready to land.Aug 1 2019, 9:37 AM

Closed by commit rL368526: [LICM] Make Loop ICM profile aware (authored by wenlei).Aug 10 2019, 11:06 PM

This revision was automatically updated to reflect the committed changes.

wenlei mentioned this in rL368542: Fix pass dependency for LICM.Aug 11 2019, 3:53 PM

wenlei mentioned this in rGcb5a90fd314a: Fix pass dependency for LICM.

FYI: reverted in r368800.

This approach is broken for another reason, which also motivated the LoopSink approach David mentioned.

BlockFrequencyInfo isn't preserved by loop passes. This is deeply problematic. In the legacy pass manager, this almost works, but can result in wildly inconsistent behavior when half way through a loop pipeline BFI gets invalidated and/or where it can partition the loop pass pipeline in weird ways breaking intended iteration order of the loop pass pipeline. With the new pass manager, I suspect this produces non-deterministic behavior based on what order we visit things causing BFI to be present or not when we happen to reach the loop pass manager, and happening to change the behavior. Even worse, if BFI *happens* to be available at the start of the loop pass pipeline in the new pass manager, it will continue to be used even after passes restructure the basic blocks invalidating it. This can lead to arbitrarily bad behavior.

To do something like this, we really need to teach all the loop passes in the same loop pass pipeline as LICM that is set up this way to update and preserve BFI. This will fix the weird behavior w/ the legacy PM and it will allow you to add the analysis as one of the guaranteed loop analyses like MemorySSA and others.

This is exactly what we had to do for MemorySSA and we'll have to do the same thing here.

To be clear, we've already seen this cause stage2/3 differences and other problems when PGO is enabled (crashes, miscompiles).

Hope my explanation helps, but we'll need to revert this ASAP.

Thanks for the context, @chandlerc. Agreed that ideally BFI should be preserved by all loop passes. Not to defend this change, but I have a few questions just to make sure I understand the cause of miscompile/non-determinism and there's no other lurking issues around this.

In the legacy pass manager, this almost works, but can result in wildly inconsistent behavior when half way through a loop pipeline BFI gets invalidated and/or where it can partition the loop pass pipeline in weird ways breaking intended iteration order of the loop pass pipeline.

With the legacy pass manager, the partition of the loop pass pipeline manifested just like the test change in opt-O2-pipeline.ll/opt-O3-pipeline.ll. I thought it's definitely not desired, but not a correctness issue either: it shouldn't lead to miscompiles or non-determinism. The failures you observed are not from the legacy pass manager, right?

With the new pass manager, since getCachedResult is used to access BFI there, if BFI was invalidated before the loop pass manager begins, it will be null, which makes the added heuristic a no-op. Otherwise BFI will always be 'available', even after loop transformations invalidate it. The problem as I see it is that invalidation happens at the loop level for the new pass manager, but BFI is a function-level analysis result, so the invalidation wouldn't actually work. Correct me if I'm wrong, but from what I can tell this seems to be a purposeful design to 'force' preservation of higher-level analysis results during the loop pass pipeline, to avoid the partition situation of the legacy pass manager.

However what I don't understand is how this leads to non-determinism or miscompile.

I suspect this produces non-deterministic behavior based on what order we visit things causing BFI to be present or not when we happen to reach the loop pass manager, and happening to change the behavior.

IIUC, as long as the order we visit functions/loops is deterministic (I think it is), whether BFI is available when we reach the loop pass manager should also be deterministic. Or are you saying there is (benign) non-deterministic ordering today, and the fact that BFI availability happens to depend on that order made that benign internal non-determinism visible externally?

Even worse, if BFI *happens* to be available at the start of the loop pass pipeline in the new pass manager, it will continue to be used even after passes restructure the basic blocks invalidating it. This can lead to arbitrarily bad behavior.

If BFI is available at the start of the loop pass pipeline, since loop-level invalidation effectively doesn't work for function-level BFI, it's as if we mark BFI as preserved without actually updating/preserving it, which is wrong. But looking at the APIs of BFI, for blocks that still exist after transformation, we would still get a count (possibly stale), and for blocks that didn't exist before transformation, we would get 0. That's problematic in terms of count quality/accuracy, but it all seems deterministic, and not a correctness issue either.

Updating/preserving BFI seems non-trivial, as we may also need to update metadata so that our updates persist outside of the loop pass manager and the next invocation of BPI/BFI still gets the updated result. That said, it's the right thing to do, as you said. My point is that there may always be some imperfection in the update, and not updating it at all is an extreme, but it should still be a matter of accuracy. I hope the degree of accuracy isn't the cause of the issues we're seeing. It'd be great if you could share a reproducer; I'd like to take a closer look. Thanks.

Let me try to answer some of the question, and my apologies for not catching this in the initial review.

With the legacy pass manager, the partition of the loop pass pipeline manifested just like the test change in opt-O2-pipeline.ll/opt-O3-pipeline.ll. I thought it's definitely not desired, but not a correctness issue either: it shouldn't lead to miscompiles or non-determinism. The failures you observed are not from the legacy pass manager, right?

That's correct. For the legacy pass manager the consequence of introducing this dependency is splitting the loop pass pipeline. The failures observed were in the new pass manager.

With the new pass manager, since getCachedResult is used to access BFI there, if BFI was invalidated before the loop pass manager begins, it will be null, which makes the added heuristic a no-op. Otherwise BFI will always be 'available', even after loop transformations invalidate it. The problem as I see it is that invalidation happens at the loop level for the new pass manager, but BFI is a function-level analysis result, so the invalidation wouldn't actually work. Correct me if I'm wrong, but from what I can tell this seems to be a purposeful design to 'force' preservation of higher-level analysis results during the loop pass pipeline, to avoid the partition situation of the legacy pass manager.

That's correct, the pass is invalidated in the middle of the loop pass pipeline, but since invalidation only happens at the loop pipeline level, it's not actually invalidated after each loop pass. So there are loop passes that will use garbage data from the BFI.

However what I don't understand is how this leads to non-determinism or miscompile.

BFI keeps BasicBlock* pointers in its internal data. We'll have two passes, one deleting BBs (LoopSimplify or SimplifyCFG) and one creating new BBs (simple loop unswitch, I think it was). The rough idea from what we've seen is that, once invalidated, the BasicBlock pointers can be anything. They can be invalid, causing crashes, but they can also be valid pointers to BasicBlocks newly created by that second loop pass, blocks that are naturally different from the original ones BFI recorded. Hence, depending on where the new BBs are allocated, you'll get non-deterministic answers to BFI queries.
This is what made the cause of the problem very hard to pinpoint.
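
The stale-key effect can be reproduced in miniature outside LLVM. The sketch below is hypothetical throughout: `BlockPool` is a toy allocator that always reuses the most recently freed slot, so address reuse is deterministic here, and `FreqCache` stands in for an analysis keyed by block address that is never updated. A real heap gives no such reuse guarantee, which is where the run-to-run variation would come from.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

struct Block {
  int Id;
};

// Toy allocator (hypothetical): hands out slots from a fixed array and always
// reuses the most recently freed slot first.
class BlockPool {
  Block Slots[4];
  std::vector<Block *> Free;

public:
  BlockPool() {
    for (Block &S : Slots)
      Free.push_back(&S);
  }
  Block *create(int Id) {
    assert(!Free.empty() && "pool exhausted");
    Block *B = Free.back();
    Free.pop_back();
    B->Id = Id;
    return B;
  }
  void destroy(Block *B) { Free.push_back(B); }
};

// Frequency cache keyed by block address; entries are never removed when a
// block is destroyed, mimicking an analysis that is not kept up to date.
using FreqCache = std::map<const Block *, std::uint64_t>;

// Looks the pointer up as a key only; it is never dereferenced.
std::uint64_t lookup(const FreqCache &Cache, const Block *B) {
  auto It = Cache.find(B);
  return It == Cache.end() ? 0 : It->second;
}
```

If a block with a cached frequency of 1000 is destroyed and a new block is then created in the same slot, `lookup` returns 1000 for the new block even though nothing was ever computed for it. A block created in a fresh slot gets 0.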

If BFI is available at the start of the loop pass pipeline, since loop-level invalidation effectively doesn't work for function-level BFI, it's as if we mark BFI as preserved without actually updating/preserving it, which is wrong. But looking at the APIs of BFI, for blocks that still exist after transformation, we would still get a count (possibly stale), and for blocks that didn't exist before transformation, we would get 0. That's problematic in terms of count quality/accuracy, but it all seems deterministic, and not a correctness issue either.

You're right that it's as if BFI is preserved without it actually being preserved. This is a big pain point in the new pass manager, which allows such mistakes.
I'm going to try to work on a refactoring for the getCachedAnalysis API in order to be able to catch such cases. The rough idea is that this API should only be allowed to get analyses that cannot be invalidated, while the analyses that *can* be invalidated but we "promise" to preserve throughout the loop pass pipeline, should go in the AnalysisResults.

Updating/preserving BFI seems non-trivial, as we may also need to update metadata so that our updates persist outside of the loop pass manager and the next invocation of BPI/BFI still gets the updated result. That said, it's the right thing to do, as you said. My point is that there may always be some imperfection in the update, and not updating it at all is an extreme, but it should still be a matter of accuracy. I hope the degree of accuracy isn't the cause of the issues we're seeing. It'd be great if you could share a reproducer; I'd like to take a closer look. Thanks.

I think that the most basic update will be to remove all invalid data from the BFI internal data structures. Then, yes, it becomes a matter of accuracy. I'm deferring to @chandlerc on whether a reproducer can be made available, but I think the BFI cleanup update would give you the answer to whether the improvements you were seeing were genuine.

Thanks for the reply, @asbirlea. Now I see where the non-determinism comes from. I thought that the BFI query APIs all go through a translation from BasicBlock* to BlockNode, so in the query APIs we use BasicBlock* just as a lookup key without actually dereferencing it. This is what made me think removing a basic block is fine (we'll have a dead entry, but it won't be accessed). But if the dangling BasicBlock* pointer happens to point to a newly allocated BasicBlock later, the count we get for that new block can be non-deterministic. This is quite tricky. I'm still not sure how a crash can happen, as it seems we don't dereference BasicBlock* in the query APIs, though non-determinism is enough of a problem.

I think that the most basic update will be to remove all invalid data from the BFI internal data structures. Then, yes, it becomes a matter of accuracy.

Agreed. This is like actually invalidating BFI on loop or block level. Is this being worked on actively? If not, folks from our side or myself can probably do it. For the long term, I still think we need to properly persist the update to metadata though, for the accuracy of later BPI/BFI passes.

I think the BFI cleanup update would give you the answer to whether the improvements you were seeing were genuine.

The wins we got were with the legacy pass manager. (We started evaluating the new pass manager in a different context.)

asbirlea mentioned this in D67612: [UnrolledInstAnalyzer] Use MSSA to find stored values outside of loop..Sep 16 2019, 10:37 AM

modimo mentioned this in D86156: [BFI] Make BFI information available through loop passes inside LoopStandardAnalysisResults.Aug 18 2020, 10:40 AM

xbolva00 mentioned this in D84108: [SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline.Aug 27 2020, 4:23 PM

modimo mentioned this in D87551: [LICM] Make Loop ICM profile aware again.Sep 11 2020, 3:33 PM

wenlei mentioned this in rG2ea4c2c598b7: [BFI] Make BFI information available through loop passes inside….Sep 15 2020, 4:16 PM

wenlei mentioned this in rG2c391a5a14ae: [LICM] Make Loop ICM profile aware again.Sep 15 2020, 5:25 PM

modimo mentioned this in D111806: [LICM] Check the number of divergent paths from loop header to target block.Oct 21 2021, 2:23 PM

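Revision Contents

Path	Size
llvm/trunk/include/llvm/Transforms/Utils/LoopUtils.h	15 lines
llvm/trunk/lib/Transforms/Scalar/LICM.cpp	90 lines
llvm/trunk/test/Other/ (file name not preserved)	10 lines
llvm/trunk/test/Other/ (file name not preserved)	10 lines
llvm/trunk/test/Other/ (file name not preserved)	10 lines
llvm/trunk/test/Other/ (file name not preserved)	2 lines
llvm/trunk/test/Transforms/LICM/sink.ll	10 lines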
Diff 214550

llvm/trunk/include/llvm/Transforms/Utils/LoopUtils.h

[31 lines not shown]
#include "llvm/IR/ValueHandle.h"		#include "llvm/IR/ValueHandle.h"
#include "llvm/Support/Casting.h"		#include "llvm/Support/Casting.h"

namespace llvm {		namespace llvm {

class AliasSet;		class AliasSet;
class AliasSetTracker;		class AliasSetTracker;
class BasicBlock;		class BasicBlock;
		class BlockFrequencyInfo;
class DataLayout;		class DataLayout;
class Loop;		class Loop;
class LoopInfo;		class LoopInfo;
class MemoryAccess;		class MemoryAccess;
class MemorySSAUpdater;		class MemorySSAUpdater;
class OptimizationRemarkEmitter;		class OptimizationRemarkEmitter;
class PredicatedScalarEvolution;		class PredicatedScalarEvolution;
class PredIteratorCache;		class PredIteratorCache;
[61 lines not shown; context: struct SinkAndHoistLICMFlags {]
bool IsSink;		bool IsSink;
};		};

/// Walk the specified region of the CFG (defined by all blocks		/// Walk the specified region of the CFG (defined by all blocks
/// dominated by the specified block, and that are in the current loop) in		/// dominated by the specified block, and that are in the current loop) in
/// reverse depth first order w.r.t the DominatorTree. This allows us to visit		/// reverse depth first order w.r.t the DominatorTree. This allows us to visit
/// uses before definitions, allowing us to sink a loop body in one pass without		/// uses before definitions, allowing us to sink a loop body in one pass without
/// iteration. Takes DomTreeNode, AliasAnalysis, LoopInfo, DominatorTree,		/// iteration. Takes DomTreeNode, AliasAnalysis, LoopInfo, DominatorTree,
/// DataLayout, TargetLibraryInfo, Loop, AliasSet information for all		/// BlockFrequencyInfo, TargetLibraryInfo, Loop, AliasSet information for all
/// instructions of the loop and loop safety information as		/// instructions of the loop and loop safety information as
/// arguments. Diagnostics is emitted via \p ORE. It returns changed status.		/// arguments. Diagnostics is emitted via \p ORE. It returns changed status.
bool sinkRegion(DomTreeNode *, AliasAnalysis *, LoopInfo *, DominatorTree *,		bool sinkRegion(DomTreeNode *, AliasAnalysis *, LoopInfo *, DominatorTree *,
TargetLibraryInfo *, TargetTransformInfo *, Loop *,		BlockFrequencyInfo *, TargetLibraryInfo *, TargetTransformInfo *,
AliasSetTracker *, MemorySSAUpdater *, ICFLoopSafetyInfo *,		Loop *, AliasSetTracker *, MemorySSAUpdater *, ICFLoopSafetyInfo *,
SinkAndHoistLICMFlags &, OptimizationRemarkEmitter *);		SinkAndHoistLICMFlags &, OptimizationRemarkEmitter *);

/// Walk the specified region of the CFG (defined by all blocks		/// Walk the specified region of the CFG (defined by all blocks
/// dominated by the specified block, and that are in the current loop) in depth		/// dominated by the specified block, and that are in the current loop) in depth
/// first order w.r.t the DominatorTree. This allows us to visit definitions		/// first order w.r.t the DominatorTree. This allows us to visit definitions
/// before uses, allowing us to hoist a loop body in one pass without iteration.		/// before uses, allowing us to hoist a loop body in one pass without iteration.
/// Takes DomTreeNode, AliasAnalysis, LoopInfo, DominatorTree, DataLayout,		/// Takes DomTreeNode, AliasAnalysis, LoopInfo, DominatorTree, BlockFrequencyInfo,
/// TargetLibraryInfo, Loop, AliasSet information for all instructions of the		/// TargetLibraryInfo, Loop, AliasSet information for all instructions of the
/// loop and loop safety information as arguments. Diagnostics is emitted via \p		/// loop and loop safety information as arguments. Diagnostics is emitted via \p
/// ORE. It returns changed status.		/// ORE. It returns changed status.
bool hoistRegion(DomTreeNode *, AliasAnalysis *, LoopInfo *, DominatorTree *,		bool hoistRegion(DomTreeNode *, AliasAnalysis *, LoopInfo *, DominatorTree *,
TargetLibraryInfo *, Loop *, AliasSetTracker *,		BlockFrequencyInfo *, TargetLibraryInfo *, Loop *, AliasSetTracker *,
MemorySSAUpdater *, ICFLoopSafetyInfo *,		MemorySSAUpdater *, ICFLoopSafetyInfo *, SinkAndHoistLICMFlags &,
SinkAndHoistLICMFlags &, OptimizationRemarkEmitter *);		OptimizationRemarkEmitter *);

/// This function deletes dead loops. The caller of this function needs to		/// This function deletes dead loops. The caller of this function needs to
/// guarantee that the loop is infact dead.		/// guarantee that the loop is infact dead.
/// The function requires a bunch or prerequisites to be present:		/// The function requires a bunch or prerequisites to be present:
/// - The loop needs to be in LCSSA form		/// - The loop needs to be in LCSSA form
/// - The loop needs to have a Preheader		/// - The loop needs to have a Preheader
/// - A unique dedicated exit block must exist		/// - A unique dedicated exit block must exist
///		///
[217 lines not shown]

llvm/trunk/lib/Transforms/Scalar/LICM.cpp

[89 lines not shown]
static cl::opt<bool>		static cl::opt<bool>
DisablePromotion("disable-licm-promotion", cl::Hidden, cl::init(false),		DisablePromotion("disable-licm-promotion", cl::Hidden, cl::init(false),
cl::desc("Disable memory promotion in LICM pass"));		cl::desc("Disable memory promotion in LICM pass"));

static cl::opt<bool> ControlFlowHoisting(		static cl::opt<bool> ControlFlowHoisting(
"licm-control-flow-hoisting", cl::Hidden, cl::init(false),		"licm-control-flow-hoisting", cl::Hidden, cl::init(false),
cl::desc("Enable control flow (and PHI) hoisting in LICM"));		cl::desc("Enable control flow (and PHI) hoisting in LICM"));

		static cl::opt<unsigned> HoistSinkColdnessThreshold(
		"licm-coldness-threshold", cl::Hidden, cl::init(4),
		cl::desc("Relative coldness Threshold of hoisting/sinking destination "
		"block for LICM to be considered beneficial"));

static cl::opt<uint32_t> MaxNumUsesTraversed(		static cl::opt<uint32_t> MaxNumUsesTraversed(
"licm-max-num-uses-traversed", cl::Hidden, cl::init(8),		"licm-max-num-uses-traversed", cl::Hidden, cl::init(8),
cl::desc("Max num uses visited for identifying load "		cl::desc("Max num uses visited for identifying load "
"invariance in loop using invariant start (default = 8)"));		"invariance in loop using invariant start (default = 8)"));

// Default value of zero implies we use the regular alias set tracker mechanism		// Default value of zero implies we use the regular alias set tracker mechanism
// instead of the cross product using AA to identify aliasing of the memory		// instead of the cross product using AA to identify aliasing of the memory
// location we are interested in.		// location we are interested in.
[28 lines not shown]
static bool inSubLoop(BasicBlock *BB, Loop *CurLoop, LoopInfo *LI);		static bool inSubLoop(BasicBlock *BB, Loop *CurLoop, LoopInfo *LI);
static bool isNotUsedOrFreeInLoop(const Instruction &I, const Loop *CurLoop,		static bool isNotUsedOrFreeInLoop(const Instruction &I, const Loop *CurLoop,
const LoopSafetyInfo *SafetyInfo,		const LoopSafetyInfo *SafetyInfo,
TargetTransformInfo *TTI, bool &FreeInLoop);		TargetTransformInfo *TTI, bool &FreeInLoop);
static void hoist(Instruction &I, const DominatorTree *DT, const Loop *CurLoop,		static void hoist(Instruction &I, const DominatorTree *DT, const Loop *CurLoop,
BasicBlock *Dest, ICFLoopSafetyInfo *SafetyInfo,		BasicBlock *Dest, ICFLoopSafetyInfo *SafetyInfo,
MemorySSAUpdater *MSSAU, OptimizationRemarkEmitter *ORE);		MemorySSAUpdater *MSSAU, OptimizationRemarkEmitter *ORE);
static bool sink(Instruction &I, LoopInfo *LI, DominatorTree *DT,		static bool sink(Instruction &I, LoopInfo *LI, DominatorTree *DT,
const Loop *CurLoop, ICFLoopSafetyInfo *SafetyInfo,		BlockFrequencyInfo *BFI, const Loop *CurLoop,
MemorySSAUpdater *MSSAU, OptimizationRemarkEmitter *ORE);		ICFLoopSafetyInfo *SafetyInfo, MemorySSAUpdater *MSSAU,
		OptimizationRemarkEmitter *ORE);
static bool isSafeToExecuteUnconditionally(Instruction &Inst,		static bool isSafeToExecuteUnconditionally(Instruction &Inst,
const DominatorTree *DT,		const DominatorTree *DT,
const Loop *CurLoop,		const Loop *CurLoop,
const LoopSafetyInfo *SafetyInfo,		const LoopSafetyInfo *SafetyInfo,
OptimizationRemarkEmitter *ORE,		OptimizationRemarkEmitter *ORE,
const Instruction *CtxI = nullptr);		const Instruction *CtxI = nullptr);
static bool pointerInvalidatedByLoop(MemoryLocation MemLoc,		static bool pointerInvalidatedByLoop(MemoryLocation MemLoc,
AliasSetTracker *CurAST, Loop *CurLoop,		AliasSetTracker *CurAST, Loop *CurLoop,
[11 lines not shown]
static void moveInstructionBefore(Instruction &I, Instruction &Dest,		static void moveInstructionBefore(Instruction &I, Instruction &Dest,
ICFLoopSafetyInfo &SafetyInfo,		ICFLoopSafetyInfo &SafetyInfo,
MemorySSAUpdater *MSSAU);		MemorySSAUpdater *MSSAU);

namespace {		namespace {
struct LoopInvariantCodeMotion {		struct LoopInvariantCodeMotion {
using ASTrackerMapTy = DenseMap<Loop *, std::unique_ptr<AliasSetTracker>>;		using ASTrackerMapTy = DenseMap<Loop *, std::unique_ptr<AliasSetTracker>>;
bool runOnLoop(Loop *L, AliasAnalysis *AA, LoopInfo *LI, DominatorTree *DT,		bool runOnLoop(Loop *L, AliasAnalysis *AA, LoopInfo *LI, DominatorTree *DT,
TargetLibraryInfo *TLI, TargetTransformInfo *TTI,		BlockFrequencyInfo *BFI, TargetLibraryInfo *TLI,
ScalarEvolution *SE, MemorySSA *MSSA,		TargetTransformInfo *TTI, ScalarEvolution *SE, MemorySSA *MSSA,
OptimizationRemarkEmitter *ORE, bool DeleteAST);		OptimizationRemarkEmitter *ORE, bool DeleteAST);

ASTrackerMapTy &getLoopToAliasSetMap() { return LoopToAliasSetMap; }		ASTrackerMapTy &getLoopToAliasSetMap() { return LoopToAliasSetMap; }
LoopInvariantCodeMotion(unsigned LicmMssaOptCap,		LoopInvariantCodeMotion(unsigned LicmMssaOptCap,
unsigned LicmMssaNoAccForPromotionCap)		unsigned LicmMssaNoAccForPromotionCap)
: LicmMssaOptCap(LicmMssaOptCap),		: LicmMssaOptCap(LicmMssaOptCap),
LicmMssaNoAccForPromotionCap(LicmMssaNoAccForPromotionCap) {}		LicmMssaNoAccForPromotionCap(LicmMssaNoAccForPromotionCap) {}

Show All 34 Lines	bool runOnLoop(Loop *L, LPPassManager &LPM) override {
// For the old PM, we can't use OptimizationRemarkEmitter as an analysis		// For the old PM, we can't use OptimizationRemarkEmitter as an analysis
// pass. Function analyses need to be preserved across loop transformations		// pass. Function analyses need to be preserved across loop transformations
// but ORE cannot be preserved (see comment before the pass definition).		// but ORE cannot be preserved (see comment before the pass definition).
OptimizationRemarkEmitter ORE(L->getHeader()->getParent());		OptimizationRemarkEmitter ORE(L->getHeader()->getParent());
return LICM.runOnLoop(L,		return LICM.runOnLoop(L,
&getAnalysis<AAResultsWrapperPass>().getAAResults(),		&getAnalysis<AAResultsWrapperPass>().getAAResults(),
&getAnalysis<LoopInfoWrapperPass>().getLoopInfo(),		&getAnalysis<LoopInfoWrapperPass>().getLoopInfo(),
&getAnalysis<DominatorTreeWrapperPass>().getDomTree(),		&getAnalysis<DominatorTreeWrapperPass>().getDomTree(),
		&getAnalysis<BlockFrequencyInfoWrapperPass>().getBFI(),
&getAnalysis<TargetLibraryInfoWrapperPass>().getTLI(),		&getAnalysis<TargetLibraryInfoWrapperPass>().getTLI(),
&getAnalysis<TargetTransformInfoWrapperPass>().getTTI(		&getAnalysis<TargetTransformInfoWrapperPass>().getTTI(
*L->getHeader()->getParent()),		*L->getHeader()->getParent()),
SE ? &SE->getSE() : nullptr, MSSA, &ORE, false);		SE ? &SE->getSE() : nullptr, MSSA, &ORE, false);
}		}

/// This transformation requires natural loop information & requires that		/// This transformation requires natural loop information & requires that
/// loop preheaders be inserted into the CFG...		/// loop preheaders be inserted into the CFG...
///		///
void getAnalysisUsage(AnalysisUsage &AU) const override {		void getAnalysisUsage(AnalysisUsage &AU) const override {
		AU.addRequired<BlockFrequencyInfoWrapperPass>();
AU.addPreserved<DominatorTreeWrapperPass>();		AU.addPreserved<DominatorTreeWrapperPass>();
AU.addPreserved<LoopInfoWrapperPass>();		AU.addPreserved<LoopInfoWrapperPass>();
AU.addRequired<TargetLibraryInfoWrapperPass>();		AU.addRequired<TargetLibraryInfoWrapperPass>();
if (EnableMSSALoopDependency) {		if (EnableMSSALoopDependency) {
AU.addRequired<MemorySSAWrapperPass>();		AU.addRequired<MemorySSAWrapperPass>();
AU.addPreserved<MemorySSAWrapperPass>();		AU.addPreserved<MemorySSAWrapperPass>();
}		}
AU.addRequired<TargetTransformInfoWrapperPass>();		AU.addRequired<TargetTransformInfoWrapperPass>();
Show All 40 Lines	PreservedAnalyses LICMPass::run(Loop &L, LoopAnalysisManager &AM,

auto *ORE = FAM.getCachedResult<OptimizationRemarkEmitterAnalysis>(*F);		auto *ORE = FAM.getCachedResult<OptimizationRemarkEmitterAnalysis>(*F);
// FIXME: This should probably be optional rather than required.		// FIXME: This should probably be optional rather than required.
if (!ORE)		if (!ORE)
report_fatal_error("LICM: OptimizationRemarkEmitterAnalysis not "		report_fatal_error("LICM: OptimizationRemarkEmitterAnalysis not "
"cached at a higher level");		"cached at a higher level");

LoopInvariantCodeMotion LICM(LicmMssaOptCap, LicmMssaNoAccForPromotionCap);		LoopInvariantCodeMotion LICM(LicmMssaOptCap, LicmMssaNoAccForPromotionCap);
if (!LICM.runOnLoop(&L, &AR.AA, &AR.LI, &AR.DT, &AR.TLI, &AR.TTI, &AR.SE,		auto BFI = FAM.getCachedResult<BlockFrequencyAnalysis>(*F);
		if (!LICM.runOnLoop(&L, &AR.AA, &AR.LI, &AR.DT, BFI, &AR.TLI, &AR.TTI, &AR.SE,
AR.MSSA, ORE, true))		AR.MSSA, ORE, true))
return PreservedAnalyses::all();		return PreservedAnalyses::all();

auto PA = getLoopPassPreservedAnalyses();		auto PA = getLoopPassPreservedAnalyses();

PA.preserve<DominatorTreeAnalysis>();		PA.preserve<DominatorTreeAnalysis>();
PA.preserve<LoopAnalysis>();		PA.preserve<LoopAnalysis>();
if (EnableMSSALoopDependency)		if (EnableMSSALoopDependency)
Show All 21 Lines
/// Hoist expressions out of the specified loop. Note, alias info for inner		/// Hoist expressions out of the specified loop. Note, alias info for inner
/// loop is not preserved so it is not a good idea to run LICM multiple		/// loop is not preserved so it is not a good idea to run LICM multiple
/// times on one loop.		/// times on one loop.
/// We should delete AST for inner loops in the new pass manager to avoid		/// We should delete AST for inner loops in the new pass manager to avoid
/// memory leak.		/// memory leak.
///		///
bool LoopInvariantCodeMotion::runOnLoop(		bool LoopInvariantCodeMotion::runOnLoop(
Loop *L, AliasAnalysis *AA, LoopInfo *LI, DominatorTree *DT,		Loop *L, AliasAnalysis *AA, LoopInfo *LI, DominatorTree *DT,
TargetLibraryInfo *TLI, TargetTransformInfo *TTI, ScalarEvolution *SE,		BlockFrequencyInfo *BFI, TargetLibraryInfo *TLI, TargetTransformInfo *TTI,
MemorySSA *MSSA, OptimizationRemarkEmitter *ORE, bool DeleteAST) {		ScalarEvolution *SE, MemorySSA *MSSA, OptimizationRemarkEmitter *ORE,
		bool DeleteAST) {
bool Changed = false;		bool Changed = false;

assert(L->isLCSSAForm(*DT) && "Loop is not in LCSSA form.");		assert(L->isLCSSAForm(*DT) && "Loop is not in LCSSA form.");

// If this loop has metadata indicating that LICM is not to be performed then		// If this loop has metadata indicating that LICM is not to be performed then
// just exit.		// just exit.
if (hasDisableLICMTransformsHint(L)) {		if (hasDisableLICMTransformsHint(L)) {
return false;		return false;
▲ Show 20 Lines • Show All 43 Lines • ▼ Show 20 Lines	bool LoopInvariantCodeMotion::runOnLoop(
// Traverse the body of the loop in depth first order on the dominator tree so		// Traverse the body of the loop in depth first order on the dominator tree so
// that we are guaranteed to see definitions before we see uses. This allows		// that we are guaranteed to see definitions before we see uses. This allows
// us to sink instructions in one pass, without iteration. After sinking		// us to sink instructions in one pass, without iteration. After sinking
// instructions, we perform another pass to hoist them out of the loop.		// instructions, we perform another pass to hoist them out of the loop.
SinkAndHoistLICMFlags Flags = {NoOfMemAccTooLarge, LicmMssaOptCounter,		SinkAndHoistLICMFlags Flags = {NoOfMemAccTooLarge, LicmMssaOptCounter,
LicmMssaOptCap, LicmMssaNoAccForPromotionCap,		LicmMssaOptCap, LicmMssaNoAccForPromotionCap,
/*IsSink=*/true};		/*IsSink=*/true};
if (L->hasDedicatedExits())		if (L->hasDedicatedExits())
Changed |= sinkRegion(DT->getNode(L->getHeader()), AA, LI, DT, TLI, TTI, L,		Changed |= sinkRegion(DT->getNode(L->getHeader()), AA, LI, DT, BFI, TLI, TTI, L,
CurAST.get(), MSSAU.get(), &SafetyInfo, Flags, ORE);		CurAST.get(), MSSAU.get(), &SafetyInfo, Flags, ORE);
Flags.IsSink = false;		Flags.IsSink = false;
if (Preheader)		if (Preheader)
Changed |= hoistRegion(DT->getNode(L->getHeader()), AA, LI, DT, TLI, L,		Changed |= hoistRegion(DT->getNode(L->getHeader()), AA, LI, DT, BFI, TLI, L,
CurAST.get(), MSSAU.get(), &SafetyInfo, Flags, ORE);		CurAST.get(), MSSAU.get(), &SafetyInfo, Flags, ORE);

// Now that all loop invariants have been removed from the loop, promote any		// Now that all loop invariants have been removed from the loop, promote any
// memory references to scalars that we can.		// memory references to scalars that we can.
// Don't sink stores from loops without dedicated block exits. Exits		// Don't sink stores from loops without dedicated block exits. Exits
// containing indirect branches are not transformed by loop simplify,		// containing indirect branches are not transformed by loop simplify,
// make sure we catch that. An additional load may be generated in the		// make sure we catch that. An additional load may be generated in the
// preheader for SSA updater, so also avoid sinking when no preheader		// preheader for SSA updater, so also avoid sinking when no preheader
▲ Show 20 Lines • Show All 85 Lines • ▼ Show 20 Lines
}		}

/// Walk the specified region of the CFG (defined by all blocks dominated by		/// Walk the specified region of the CFG (defined by all blocks dominated by
/// the specified block, and that are in the current loop) in reverse depth		/// the specified block, and that are in the current loop) in reverse depth
/// first order w.r.t the DominatorTree. This allows us to visit uses before		/// first order w.r.t the DominatorTree. This allows us to visit uses before
/// definitions, allowing us to sink a loop body in one pass without iteration.		/// definitions, allowing us to sink a loop body in one pass without iteration.
///		///
bool llvm::sinkRegion(DomTreeNode *N, AliasAnalysis *AA, LoopInfo *LI,		bool llvm::sinkRegion(DomTreeNode *N, AliasAnalysis *AA, LoopInfo *LI,
DominatorTree *DT, TargetLibraryInfo *TLI,		DominatorTree *DT, BlockFrequencyInfo *BFI,
TargetTransformInfo *TTI, Loop *CurLoop,		TargetLibraryInfo *TLI, TargetTransformInfo *TTI,
AliasSetTracker *CurAST, MemorySSAUpdater *MSSAU,		Loop *CurLoop, AliasSetTracker *CurAST,
		MemorySSAUpdater *MSSAU,
ICFLoopSafetyInfo *SafetyInfo,		ICFLoopSafetyInfo *SafetyInfo,
SinkAndHoistLICMFlags &Flags,		SinkAndHoistLICMFlags &Flags,
OptimizationRemarkEmitter *ORE) {		OptimizationRemarkEmitter *ORE) {

// Verify inputs.		// Verify inputs.
assert(N != nullptr && AA != nullptr && LI != nullptr && DT != nullptr &&		assert(N != nullptr && AA != nullptr && LI != nullptr && DT != nullptr &&
CurLoop != nullptr && SafetyInfo != nullptr &&		CurLoop != nullptr && SafetyInfo != nullptr &&
"Unexpected input to sinkRegion.");		"Unexpected input to sinkRegion.");
Show All 32 Lines	for (BasicBlock::iterator II = BB->end(); II != BB->begin();) {
// outside of the loop. In this case, it doesn't even matter if the		// outside of the loop. In this case, it doesn't even matter if the
// operands of the instruction are loop invariant.		// operands of the instruction are loop invariant.
//		//
bool FreeInLoop = false;		bool FreeInLoop = false;
if (isNotUsedOrFreeInLoop(I, CurLoop, SafetyInfo, TTI, FreeInLoop) &&		if (isNotUsedOrFreeInLoop(I, CurLoop, SafetyInfo, TTI, FreeInLoop) &&
canSinkOrHoistInst(I, AA, DT, CurLoop, CurAST, MSSAU, true, &Flags,		canSinkOrHoistInst(I, AA, DT, CurLoop, CurAST, MSSAU, true, &Flags,
ORE) &&		ORE) &&
!I.mayHaveSideEffects()) {		!I.mayHaveSideEffects()) {
if (sink(I, LI, DT, CurLoop, SafetyInfo, MSSAU, ORE)) {		if (sink(I, LI, DT, BFI, CurLoop, SafetyInfo, MSSAU, ORE)) {
if (!FreeInLoop) {		if (!FreeInLoop) {
++II;		++II;
eraseInstruction(I, *SafetyInfo, CurAST, MSSAU);		eraseInstruction(I, *SafetyInfo, CurAST, MSSAU);
}		}
Changed = true;		Changed = true;
}		}
}		}
}		}
▲ Show 20 Lines • Show All 227 Lines • ▼ Show 20 Lines	BasicBlock getOrCreateHoistedBlock(BasicBlock BB) {

assert(CurLoop->getLoopPreheader() &&		assert(CurLoop->getLoopPreheader() &&
"Hoisting blocks should not have destroyed preheader");		"Hoisting blocks should not have destroyed preheader");
return HoistDestinationMap[BB];		return HoistDestinationMap[BB];
}		}
};		};
} // namespace		} // namespace

		// Hoisting/sinking an instruction out of a loop isn't always beneficial. It's
		// only worthwhile if the destination block is actually colder than the
		// current block.
		static bool worthSinkOrHoistInst(Instruction &I, BasicBlock *DstBlock,
		OptimizationRemarkEmitter *ORE,
		BlockFrequencyInfo *BFI) {
		// Check block frequency only when runtime profile is available,
		// to avoid pathological cases. With static profile, lean towards
		// hoisting because it helps canonicalize the loop for the vectorizer.
		if (!DstBlock->getParent()->hasProfileData())
		return true;

		if (!HoistSinkColdnessThreshold || !BFI)
		return true;

		BasicBlock *SrcBlock = I.getParent();
		if (BFI->getBlockFreq(DstBlock).getFrequency() / HoistSinkColdnessThreshold >
		BFI->getBlockFreq(SrcBlock).getFrequency()) {
		ORE->emit([&]() {
		return OptimizationRemarkMissed(DEBUG_TYPE, "SinkHoistInst", &I)
		<< "failed to sink or hoist instruction because containing block "
		"has lower frequency than destination block";
		});
		return false;
		}

		return true;
		}

/// Walk the specified region of the CFG (defined by all blocks dominated by		/// Walk the specified region of the CFG (defined by all blocks dominated by
/// the specified block, and that are in the current loop) in depth first		/// the specified block, and that are in the current loop) in depth first
/// order w.r.t the DominatorTree. This allows us to visit definitions before		/// order w.r.t the DominatorTree. This allows us to visit definitions before
/// uses, allowing us to hoist a loop body in one pass without iteration.		/// uses, allowing us to hoist a loop body in one pass without iteration.
///		///
bool llvm::hoistRegion(DomTreeNode *N, AliasAnalysis *AA, LoopInfo *LI,		bool llvm::hoistRegion(DomTreeNode *N, AliasAnalysis *AA, LoopInfo *LI,
DominatorTree *DT, TargetLibraryInfo *TLI, Loop *CurLoop,		DominatorTree *DT, BlockFrequencyInfo *BFI,
		TargetLibraryInfo *TLI, Loop *CurLoop,
AliasSetTracker *CurAST, MemorySSAUpdater *MSSAU,		AliasSetTracker *CurAST, MemorySSAUpdater *MSSAU,
ICFLoopSafetyInfo *SafetyInfo,		ICFLoopSafetyInfo *SafetyInfo,
SinkAndHoistLICMFlags &Flags,		SinkAndHoistLICMFlags &Flags,
OptimizationRemarkEmitter *ORE) {		OptimizationRemarkEmitter *ORE) {
// Verify inputs.		// Verify inputs.
assert(N != nullptr && AA != nullptr && LI != nullptr && DT != nullptr &&		assert(N != nullptr && AA != nullptr && LI != nullptr && DT != nullptr &&
CurLoop != nullptr && SafetyInfo != nullptr &&		CurLoop != nullptr && SafetyInfo != nullptr &&
"Unexpected input to hoistRegion.");		"Unexpected input to hoistRegion.");
Show All 34 Lines	for (BasicBlock::iterator II = BB->begin(), E = BB->end(); II != E;) {
if (isInstructionTriviallyDead(&I, TLI))		if (isInstructionTriviallyDead(&I, TLI))
eraseInstruction(I, *SafetyInfo, CurAST, MSSAU);		eraseInstruction(I, *SafetyInfo, CurAST, MSSAU);
Changed = true;		Changed = true;
continue;		continue;
}		}

// Try hoisting the instruction out to the preheader. We can only do		// Try hoisting the instruction out to the preheader. We can only do
// this if all of the operands of the instruction are loop invariant and		// this if all of the operands of the instruction are loop invariant and
// if it is safe to hoist the instruction.		// if it is safe to hoist the instruction. We also check block frequency
		// to make sure instruction only gets hoisted into colder blocks.
// TODO: It may be safe to hoist if we are hoisting to a conditional block		// TODO: It may be safe to hoist if we are hoisting to a conditional block
// and we have accurately duplicated the control flow from the loop header		// and we have accurately duplicated the control flow from the loop header
// to that block.		// to that block.
if (CurLoop->hasLoopInvariantOperands(&I) &&		if (CurLoop->hasLoopInvariantOperands(&I) &&
canSinkOrHoistInst(I, AA, DT, CurLoop, CurAST, MSSAU, true, &Flags,		canSinkOrHoistInst(I, AA, DT, CurLoop, CurAST, MSSAU, true, &Flags,
ORE) &&		ORE) &&
		worthSinkOrHoistInst(I, CurLoop->getLoopPreheader(), ORE, BFI) &&
isSafeToExecuteUnconditionally(		isSafeToExecuteUnconditionally(
I, DT, CurLoop, SafetyInfo, ORE,		I, DT, CurLoop, SafetyInfo, ORE,
CurLoop->getLoopPreheader()->getTerminator())) {		CurLoop->getLoopPreheader()->getTerminator())) {
hoist(I, DT, CurLoop, CFH.getOrCreateHoistedBlock(BB), SafetyInfo,		hoist(I, DT, CurLoop, CFH.getOrCreateHoistedBlock(BB), SafetyInfo,
MSSAU, ORE);		MSSAU, ORE);
HoistedInstructions.push_back(&I);		HoistedInstructions.push_back(&I);
Changed = true;		Changed = true;
continue;		continue;
▲ Show 20 Lines • Show All 684 Lines • ▼ Show 20 Lines
}		}

/// When an instruction is found to only be used outside of the loop, this		/// When an instruction is found to only be used outside of the loop, this
/// function moves it to the exit blocks and patches up SSA form as needed.		/// function moves it to the exit blocks and patches up SSA form as needed.
/// This method is guaranteed to remove the original instruction from its		/// This method is guaranteed to remove the original instruction from its
/// position, and may either delete it or move it to outside of the loop.		/// position, and may either delete it or move it to outside of the loop.
///		///
static bool sink(Instruction &I, LoopInfo *LI, DominatorTree *DT,		static bool sink(Instruction &I, LoopInfo *LI, DominatorTree *DT,
const Loop *CurLoop, ICFLoopSafetyInfo *SafetyInfo,		BlockFrequencyInfo *BFI, const Loop *CurLoop,
MemorySSAUpdater *MSSAU, OptimizationRemarkEmitter *ORE) {		ICFLoopSafetyInfo *SafetyInfo, MemorySSAUpdater *MSSAU,
		OptimizationRemarkEmitter *ORE) {
LLVM_DEBUG(dbgs() << "LICM sinking instruction: " << I << "\n");		LLVM_DEBUG(dbgs() << "LICM sinking instruction: " << I << "\n");
ORE->emit([&]() {		ORE->emit([&]() {
return OptimizationRemark(DEBUG_TYPE, "InstSunk", &I)		return OptimizationRemark(DEBUG_TYPE, "InstSunk", &I)
<< "sinking " << ore::NV("Inst", &I);		<< "sinking " << ore::NV("Inst", &I);
});		});
bool Changed = false;		bool Changed = false;
if (isa<LoadInst>(I))		if (isa<LoadInst>(I))
++NumMovedLoads;		++NumMovedLoads;
▲ Show 20 Lines • Show All 59 Lines • ▼ Show 20 Lines
#endif		#endif

// Clones of this instruction. Don't create more than one per exit block!		// Clones of this instruction. Don't create more than one per exit block!
SmallDenseMap<BasicBlock *, Instruction *, 32> SunkCopies;		SmallDenseMap<BasicBlock *, Instruction *, 32> SunkCopies;

// If this instruction is only used outside of the loop, then all users are		// If this instruction is only used outside of the loop, then all users are
// PHI nodes in exit blocks due to LCSSA form. Just RAUW them with clones of		// PHI nodes in exit blocks due to LCSSA form. Just RAUW them with clones of
// the instruction.		// the instruction.
		// First check if I is worth sinking for all uses. Sink only when it is
		// worthwhile across all uses.
SmallSetVector<User*, 8> Users(I.user_begin(), I.user_end());		SmallSetVector<User*, 8> Users(I.user_begin(), I.user_end());
		SmallVector<PHINode*, 8> ExitPNs;
for (auto *UI : Users) {		for (auto *UI : Users) {
auto *User = cast<Instruction>(UI);		auto *User = cast<Instruction>(UI);

if (CurLoop->contains(User))		if (CurLoop->contains(User))
continue;		continue;

PHINode *PN = cast<PHINode>(User);		PHINode *PN = cast<PHINode>(User);
assert(ExitBlockSet.count(PN->getParent()) &&		assert(ExitBlockSet.count(PN->getParent()) &&
"The LCSSA PHI is not in an exit block!");		"The LCSSA PHI is not in an exit block!");

		if (!worthSinkOrHoistInst(I, PN->getParent(), ORE, BFI)) {
		return Changed;
		}

		ExitPNs.push_back(PN);
		}

		for (auto *PN: ExitPNs) {
// The PHI must be trivially replaceable.		// The PHI must be trivially replaceable.
Instruction *New = sinkThroughTriviallyReplaceablePHI(		Instruction *New = sinkThroughTriviallyReplaceablePHI(
PN, &I, LI, SunkCopies, SafetyInfo, CurLoop, MSSAU);		PN, &I, LI, SunkCopies, SafetyInfo, CurLoop, MSSAU);
PN->replaceAllUsesWith(New);		PN->replaceAllUsesWith(New);
eraseInstruction(PN, SafetyInfo, nullptr, nullptr);		eraseInstruction(PN, SafetyInfo, nullptr, nullptr);
Changed = true;		Changed = true;
}		}
return Changed;		return Changed;
▲ Show 20 Lines • Show All 677 Lines • Show Last 20 Lines
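
For readers who want to experiment with the coldness check outside of LLVM, here is a minimal standalone sketch of the comparison that worthSinkOrHoistInst performs, plus the "all exit blocks must qualify" rule that the new sink() loop applies. It is not the patch's code: plain integers stand in for BlockFrequencyInfo results, the names worthMoving, worthSinkingToAll and kColdnessThreshold are hypothetical, and the threshold value 4 is an assumption chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for HoistSinkColdnessThreshold; the value 4 is an
// assumption for illustration, not taken from the patch.
constexpr std::uint64_t kColdnessThreshold = 4;

// Mirrors the shape of the check in the patch: the move is rejected only when
// dstFreq / threshold is strictly greater than srcFreq. Without profile data,
// or with a zero threshold, the move is always considered worthwhile.
bool worthMoving(std::uint64_t srcFreq, std::uint64_t dstFreq, bool hasProfile,
                 std::uint64_t threshold = kColdnessThreshold) {
  if (!hasProfile || threshold == 0)
    return true;
  return dstFreq / threshold <= srcFreq;
}

// Mirrors the two-phase structure in sink(): every exit block is checked
// first, and sinking proceeds only if none of them is rejected.
bool worthSinkingToAll(std::uint64_t srcFreq,
                       const std::vector<std::uint64_t> &exitFreqs,
                       bool hasProfile) {
  for (std::uint64_t exitFreq : exitFreqs)
    if (!worthMoving(srcFreq, exitFreq, hasProfile))
      return false;
  return true;
}
```

Under these assumptions, a source block with frequency 100 would not be moved into a destination with frequency 1000 (1000 / 4 = 250 > 100), but would be moved into one with frequency 300 (300 / 4 = 75). Checking all exits before touching any PHI avoids ending up with a partially sunk instruction.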

llvm/trunk/test/Other/opt-O2-pipeline.ll

	Show First 20 Lines • Show All 96 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: Canonicalize natural loops			; CHECK-NEXT: Canonicalize natural loops
	; CHECK-NEXT: LCSSA Verifier			; CHECK-NEXT: LCSSA Verifier
	; CHECK-NEXT: Loop-Closed SSA Form Pass			; CHECK-NEXT: Loop-Closed SSA Form Pass
	; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)			; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
	; CHECK-NEXT: Loop Pass Manager			; CHECK-NEXT: Loop Pass Manager
	; CHECK-NEXT: Rotate Loops			; CHECK-NEXT: Rotate Loops
				; CHECK-NEXT: Branch Probability Analysis
				; CHECK-NEXT: Block Frequency Analysis
				; CHECK-NEXT: Loop Pass Manager
	; CHECK-NEXT: Loop Invariant Code Motion			; CHECK-NEXT: Loop Invariant Code Motion
				; CHECK-NEXT: Loop Pass Manager
	; CHECK-NEXT: Unswitch loops			; CHECK-NEXT: Unswitch loops
	; CHECK-NEXT: Simplify the CFG			; CHECK-NEXT: Simplify the CFG
	; CHECK-NEXT: Dominator Tree Construction			; CHECK-NEXT: Dominator Tree Construction
	; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)			; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Natural Loop Information			; CHECK-NEXT: Natural Loop Information
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	Show All 35 Lines
	; CHECK-NEXT: Jump Threading			; CHECK-NEXT: Jump Threading
	; CHECK-NEXT: Value Propagation			; CHECK-NEXT: Value Propagation
	; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)			; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Phi Values Analysis			; CHECK-NEXT: Phi Values Analysis
	; CHECK-NEXT: Memory Dependence Analysis			; CHECK-NEXT: Memory Dependence Analysis
	; CHECK-NEXT: Dead Store Elimination			; CHECK-NEXT: Dead Store Elimination
	; CHECK-NEXT: Natural Loop Information			; CHECK-NEXT: Natural Loop Information
				; CHECK-NEXT: Branch Probability Analysis
				; CHECK-NEXT: Block Frequency Analysis
	; CHECK-NEXT: Canonicalize natural loops			; CHECK-NEXT: Canonicalize natural loops
	; CHECK-NEXT: LCSSA Verifier			; CHECK-NEXT: LCSSA Verifier
	; CHECK-NEXT: Loop-Closed SSA Form Pass			; CHECK-NEXT: Loop-Closed SSA Form Pass
	; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)			; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
				; CHECK-NEXT: Block Frequency Analysis
	; CHECK-NEXT: Loop Pass Manager			; CHECK-NEXT: Loop Pass Manager
	; CHECK-NEXT: Loop Invariant Code Motion			; CHECK-NEXT: Loop Invariant Code Motion
	; CHECK-NEXT: Post-Dominator Tree Construction			; CHECK-NEXT: Post-Dominator Tree Construction
	; CHECK-NEXT: Aggressive Dead Code Elimination			; CHECK-NEXT: Aggressive Dead Code Elimination
	; CHECK-NEXT: Simplify the CFG			; CHECK-NEXT: Simplify the CFG
	; CHECK-NEXT: Dominator Tree Construction			; CHECK-NEXT: Dominator Tree Construction
	; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)			; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	▲ Show 20 Lines • Show All 70 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: Loop-Closed SSA Form Pass			; CHECK-NEXT: Loop-Closed SSA Form Pass
	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
	; CHECK-NEXT: Loop Pass Manager			; CHECK-NEXT: Loop Pass Manager
	; CHECK-NEXT: Unroll loops			; CHECK-NEXT: Unroll loops
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: Optimization Remark Emitter			; CHECK-NEXT: Optimization Remark Emitter
	; CHECK-NEXT: Combine redundant instructions			; CHECK-NEXT: Combine redundant instructions
				; CHECK-NEXT: Branch Probability Analysis
				; CHECK-NEXT: Block Frequency Analysis
	; CHECK-NEXT: Canonicalize natural loops			; CHECK-NEXT: Canonicalize natural loops
	; CHECK-NEXT: LCSSA Verifier			; CHECK-NEXT: LCSSA Verifier
	; CHECK-NEXT: Loop-Closed SSA Form Pass			; CHECK-NEXT: Loop-Closed SSA Form Pass
	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
				; CHECK-NEXT: Block Frequency Analysis
	; CHECK-NEXT: Loop Pass Manager			; CHECK-NEXT: Loop Pass Manager
	; CHECK-NEXT: Loop Invariant Code Motion			; CHECK-NEXT: Loop Invariant Code Motion
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: Optimization Remark Emitter			; CHECK-NEXT: Optimization Remark Emitter
	; CHECK-NEXT: Warn about non-applied transformations			; CHECK-NEXT: Warn about non-applied transformations
	; CHECK-NEXT: Alignment from assumptions			; CHECK-NEXT: Alignment from assumptions
	; CHECK-NEXT: Strip Unused Function Prototypes			; CHECK-NEXT: Strip Unused Function Prototypes
	▲ Show 20 Lines • Show All 45 Lines • Show Last 20 Lines

llvm/trunk/test/Other/opt-O3-pipeline.ll

	Show First 20 Lines • Show All 101 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: Canonicalize natural loops			; CHECK-NEXT: Canonicalize natural loops
	; CHECK-NEXT: LCSSA Verifier			; CHECK-NEXT: LCSSA Verifier
	; CHECK-NEXT: Loop-Closed SSA Form Pass			; CHECK-NEXT: Loop-Closed SSA Form Pass
	; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)			; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
	; CHECK-NEXT: Loop Pass Manager			; CHECK-NEXT: Loop Pass Manager
	; CHECK-NEXT: Rotate Loops			; CHECK-NEXT: Rotate Loops
				; CHECK-NEXT: Branch Probability Analysis
				; CHECK-NEXT: Block Frequency Analysis
				; CHECK-NEXT: Loop Pass Manager
	; CHECK-NEXT: Loop Invariant Code Motion			; CHECK-NEXT: Loop Invariant Code Motion
				; CHECK-NEXT: Loop Pass Manager
	; CHECK-NEXT: Unswitch loops			; CHECK-NEXT: Unswitch loops
	; CHECK-NEXT: Simplify the CFG			; CHECK-NEXT: Simplify the CFG
	; CHECK-NEXT: Dominator Tree Construction			; CHECK-NEXT: Dominator Tree Construction
	; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)			; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Natural Loop Information			; CHECK-NEXT: Natural Loop Information
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	Show All 35 Lines
	; CHECK-NEXT: Jump Threading			; CHECK-NEXT: Jump Threading
	; CHECK-NEXT: Value Propagation			; CHECK-NEXT: Value Propagation
	; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)			; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Phi Values Analysis			; CHECK-NEXT: Phi Values Analysis
	; CHECK-NEXT: Memory Dependence Analysis			; CHECK-NEXT: Memory Dependence Analysis
	; CHECK-NEXT: Dead Store Elimination			; CHECK-NEXT: Dead Store Elimination
	; CHECK-NEXT: Natural Loop Information			; CHECK-NEXT: Natural Loop Information
				; CHECK-NEXT: Branch Probability Analysis
				; CHECK-NEXT: Block Frequency Analysis
	; CHECK-NEXT: Canonicalize natural loops			; CHECK-NEXT: Canonicalize natural loops
	; CHECK-NEXT: LCSSA Verifier			; CHECK-NEXT: LCSSA Verifier
	; CHECK-NEXT: Loop-Closed SSA Form Pass			; CHECK-NEXT: Loop-Closed SSA Form Pass
	; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)			; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
				; CHECK-NEXT: Block Frequency Analysis
	; CHECK-NEXT: Loop Pass Manager			; CHECK-NEXT: Loop Pass Manager
	; CHECK-NEXT: Loop Invariant Code Motion			; CHECK-NEXT: Loop Invariant Code Motion
	; CHECK-NEXT: Post-Dominator Tree Construction			; CHECK-NEXT: Post-Dominator Tree Construction
	; CHECK-NEXT: Aggressive Dead Code Elimination			; CHECK-NEXT: Aggressive Dead Code Elimination
	; CHECK-NEXT: Simplify the CFG			; CHECK-NEXT: Simplify the CFG
	; CHECK-NEXT: Dominator Tree Construction			; CHECK-NEXT: Dominator Tree Construction
	; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)			; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	▲ Show 20 Lines • Show All 70 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: Loop-Closed SSA Form Pass			; CHECK-NEXT: Loop-Closed SSA Form Pass
	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
	; CHECK-NEXT: Loop Pass Manager			; CHECK-NEXT: Loop Pass Manager
	; CHECK-NEXT: Unroll loops			; CHECK-NEXT: Unroll loops
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: Optimization Remark Emitter			; CHECK-NEXT: Optimization Remark Emitter
	; CHECK-NEXT: Combine redundant instructions			; CHECK-NEXT: Combine redundant instructions
				; CHECK-NEXT: Branch Probability Analysis
				; CHECK-NEXT: Block Frequency Analysis
	; CHECK-NEXT: Canonicalize natural loops			; CHECK-NEXT: Canonicalize natural loops
	; CHECK-NEXT: LCSSA Verifier			; CHECK-NEXT: LCSSA Verifier
	; CHECK-NEXT: Loop-Closed SSA Form Pass			; CHECK-NEXT: Loop-Closed SSA Form Pass
	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
				; CHECK-NEXT: Block Frequency Analysis
	; CHECK-NEXT: Loop Pass Manager			; CHECK-NEXT: Loop Pass Manager
	; CHECK-NEXT: Loop Invariant Code Motion			; CHECK-NEXT: Loop Invariant Code Motion
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: Optimization Remark Emitter			; CHECK-NEXT: Optimization Remark Emitter
	; CHECK-NEXT: Warn about non-applied transformations			; CHECK-NEXT: Warn about non-applied transformations
	; CHECK-NEXT: Alignment from assumptions			; CHECK-NEXT: Alignment from assumptions
	; CHECK-NEXT: Strip Unused Function Prototypes			; CHECK-NEXT: Strip Unused Function Prototypes
	▲ Show 20 Lines • Show All 45 Lines • Show Last 20 Lines

llvm/trunk/test/Other/opt-Os-pipeline.ll

	Show First 20 Lines • Show All 83 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: Canonicalize natural loops			; CHECK-NEXT: Canonicalize natural loops
	; CHECK-NEXT: LCSSA Verifier			; CHECK-NEXT: LCSSA Verifier
	; CHECK-NEXT: Loop-Closed SSA Form Pass			; CHECK-NEXT: Loop-Closed SSA Form Pass
	; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)			; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
	; CHECK-NEXT: Loop Pass Manager			; CHECK-NEXT: Loop Pass Manager
	; CHECK-NEXT: Rotate Loops			; CHECK-NEXT: Rotate Loops
				; CHECK-NEXT: Branch Probability Analysis
				; CHECK-NEXT: Block Frequency Analysis
				; CHECK-NEXT: Loop Pass Manager
	; CHECK-NEXT: Loop Invariant Code Motion			; CHECK-NEXT: Loop Invariant Code Motion
				; CHECK-NEXT: Loop Pass Manager
	; CHECK-NEXT: Unswitch loops			; CHECK-NEXT: Unswitch loops
	; CHECK-NEXT: Simplify the CFG			; CHECK-NEXT: Simplify the CFG
	; CHECK-NEXT: Dominator Tree Construction			; CHECK-NEXT: Dominator Tree Construction
	; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)			; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Natural Loop Information			; CHECK-NEXT: Natural Loop Information
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: Jump Threading			; CHECK-NEXT: Jump Threading
	; CHECK-NEXT: Value Propagation			; CHECK-NEXT: Value Propagation
	; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)			; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Phi Values Analysis			; CHECK-NEXT: Phi Values Analysis
	; CHECK-NEXT: Memory Dependence Analysis			; CHECK-NEXT: Memory Dependence Analysis
	; CHECK-NEXT: Dead Store Elimination			; CHECK-NEXT: Dead Store Elimination
	; CHECK-NEXT: Natural Loop Information			; CHECK-NEXT: Natural Loop Information
				; CHECK-NEXT: Branch Probability Analysis
				; CHECK-NEXT: Block Frequency Analysis
	; CHECK-NEXT: Canonicalize natural loops			; CHECK-NEXT: Canonicalize natural loops
	; CHECK-NEXT: LCSSA Verifier			; CHECK-NEXT: LCSSA Verifier
	; CHECK-NEXT: Loop-Closed SSA Form Pass			; CHECK-NEXT: Loop-Closed SSA Form Pass
	; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)			; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
				; CHECK-NEXT: Block Frequency Analysis
	; CHECK-NEXT: Loop Pass Manager			; CHECK-NEXT: Loop Pass Manager
	; CHECK-NEXT: Loop Invariant Code Motion			; CHECK-NEXT: Loop Invariant Code Motion
	; CHECK-NEXT: Post-Dominator Tree Construction			; CHECK-NEXT: Post-Dominator Tree Construction
	; CHECK-NEXT: Aggressive Dead Code Elimination			; CHECK-NEXT: Aggressive Dead Code Elimination
	; CHECK-NEXT: Simplify the CFG			; CHECK-NEXT: Simplify the CFG
	; CHECK-NEXT: Dominator Tree Construction			; CHECK-NEXT: Dominator Tree Construction
	; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)			; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	; CHECK-NEXT: Function Alias Analysis Results			; CHECK-NEXT: Function Alias Analysis Results
	; CHECK-NEXT: Loop-Closed SSA Form Pass			; CHECK-NEXT: Loop-Closed SSA Form Pass
	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
	; CHECK-NEXT: Loop Pass Manager			; CHECK-NEXT: Loop Pass Manager
	; CHECK-NEXT: Unroll loops			; CHECK-NEXT: Unroll loops
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: Optimization Remark Emitter			; CHECK-NEXT: Optimization Remark Emitter
	; CHECK-NEXT: Combine redundant instructions			; CHECK-NEXT: Combine redundant instructions
				; CHECK-NEXT: Branch Probability Analysis
				; CHECK-NEXT: Block Frequency Analysis
	; CHECK-NEXT: Canonicalize natural loops			; CHECK-NEXT: Canonicalize natural loops
	; CHECK-NEXT: LCSSA Verifier			; CHECK-NEXT: LCSSA Verifier
	; CHECK-NEXT: Loop-Closed SSA Form Pass			; CHECK-NEXT: Loop-Closed SSA Form Pass
	; CHECK-NEXT: Scalar Evolution Analysis			; CHECK-NEXT: Scalar Evolution Analysis
				; CHECK-NEXT: Block Frequency Analysis
	; CHECK-NEXT: Loop Pass Manager			; CHECK-NEXT: Loop Pass Manager
	; CHECK-NEXT: Loop Invariant Code Motion			; CHECK-NEXT: Loop Invariant Code Motion
	; CHECK-NEXT: Lazy Branch Probability Analysis			; CHECK-NEXT: Lazy Branch Probability Analysis
	; CHECK-NEXT: Lazy Block Frequency Analysis			; CHECK-NEXT: Lazy Block Frequency Analysis
	; CHECK-NEXT: Optimization Remark Emitter			; CHECK-NEXT: Optimization Remark Emitter
	; CHECK-NEXT: Warn about non-applied transformations			; CHECK-NEXT: Warn about non-applied transformations
	; CHECK-NEXT: Alignment from assumptions			; CHECK-NEXT: Alignment from assumptions
	; CHECK-NEXT: Strip Unused Function Prototypes			; CHECK-NEXT: Strip Unused Function Prototypes

llvm/trunk/test/Other/pass-pipelines.ll

	; CHECK-O2-NEXT: Remove unused exception handling info			; CHECK-O2-NEXT: Remove unused exception handling info
	; CHECK-O2-NEXT: Function Integration/Inlining			; CHECK-O2-NEXT: Function Integration/Inlining
	; CHECK-O2-NEXT: Deduce function attributes			; CHECK-O2-NEXT: Deduce function attributes
	; Next up is the main function pass pipeline. It shouldn't be split up and			; Next up is the main function pass pipeline. It shouldn't be split up and
	; should contain the main loop pass pipeline as well.			; should contain the main loop pass pipeline as well.
	; CHECK-O2-NEXT: FunctionPass Manager			; CHECK-O2-NEXT: FunctionPass Manager
	; CHECK-O2-NOT: Manager			; CHECK-O2-NOT: Manager
	; CHECK-O2: Loop Pass Manager			; CHECK-O2: Loop Pass Manager
	; CHECK-O2-NOT: Manager			; Requiring block frequency for LICM will place ICM and rotation under separate Loop Pass Manager
	; FIXME: We shouldn't be pulling out to simplify-cfg and instcombine and			; FIXME: We shouldn't be pulling out to simplify-cfg and instcombine and
	; causing new loop pass managers.			; causing new loop pass managers.
	; CHECK-O2: Simplify the CFG			; CHECK-O2: Simplify the CFG
	; CHECK-O2-NOT: Manager			; CHECK-O2-NOT: Manager
	; CHECK-O2: Combine redundant instructions			; CHECK-O2: Combine redundant instructions
	; CHECK-O2-NOT: Manager			; CHECK-O2-NOT: Manager
	; CHECK-O2: Loop Pass Manager			; CHECK-O2: Loop Pass Manager
	; CHECK-O2-NOT: Manager			; CHECK-O2-NOT: Manager

llvm/trunk/test/Transforms/LICM/sink.ll

	; RUN: opt -S -licm < %s | FileCheck %s --check-prefix=CHECK-LICM			; RUN: opt -S -licm -licm -licm-coldness-threshold=0 < %s | FileCheck %s --check-prefix=CHECK-LICM
				; RUN: opt -S -licm -licm < %s | FileCheck %s --check-prefix=CHECK-BFI-LICM
	; RUN: opt -S -licm < %s | opt -S -loop-sink | FileCheck %s --check-prefix=CHECK-SINK			; RUN: opt -S -licm < %s | opt -S -loop-sink | FileCheck %s --check-prefix=CHECK-SINK
	; RUN: opt -S < %s -passes='require<opt-remark-emit>,loop(licm),loop-sink' \			; RUN: opt -S < %s -passes='require<opt-remark-emit>,loop(licm),loop-sink' \
	; RUN: | FileCheck %s --check-prefix=CHECK-SINK			; RUN: | FileCheck %s --check-prefix=CHECK-SINK
	; RUN: opt -S -licm -enable-mssa-loop-dependency=true -verify-memoryssa < %s | FileCheck %s --check-prefix=CHECK-LICM			; RUN: opt -S -licm -licm-coldness-threshold=0 -enable-mssa-loop-dependency=true -verify-memoryssa < %s | FileCheck %s --check-prefix=CHECK-LICM
				; RUN: opt -S -licm -enable-mssa-loop-dependency=true -verify-memoryssa < %s | FileCheck %s --check-prefix=CHECK-BFI-LICM

	; Original source code:			; Original source code:
	; int g;			; int g;
	; int foo(int p, int x) {			; int foo(int p, int x) {
	; for (int i = 0; i != x; i++)			; for (int i = 0; i != x; i++)
	; if (__builtin_expect(i == p, 0)) {			; if (__builtin_expect(i == p, 0)) {
	; x += g; x *= g;			; x += g; x *= g;
	; }			; }

	.lr.ph.preheader:			.lr.ph.preheader:
	br label %.lr.ph			br label %.lr.ph

	; CHECK-LICM: .lr.ph.preheader:			; CHECK-LICM: .lr.ph.preheader:
	; CHECK-LICM: load i32, i32* @g			; CHECK-LICM: load i32, i32* @g
	; CHECK-LICM: br label %.lr.ph			; CHECK-LICM: br label %.lr.ph

				; CHECK-BFI-LICM: .lr.ph.preheader:
				; CHECK-BFI-LICM-NOT: load i32, i32* @g
				; CHECK-BFI-LICM: br label %.lr.ph

	.lr.ph:			.lr.ph:
	%.03 = phi i32 [ %8, %.combine ], [ 0, %.lr.ph.preheader ]			%.03 = phi i32 [ %8, %.combine ], [ 0, %.lr.ph.preheader ]
	%.012 = phi i32 [ %.1, %.combine ], [ %1, %.lr.ph.preheader ]			%.012 = phi i32 [ %.1, %.combine ], [ %1, %.lr.ph.preheader ]
	%4 = icmp eq i32 %.03, %0			%4 = icmp eq i32 %.03, %0
	br i1 %4, label %.then, label %.combine, !prof !1			br i1 %4, label %.then, label %.combine, !prof !1

	.then:			.then:
	%5 = load i32, i32* @g, align 4			%5 = load i32, i32* @g, align 4
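The two check prefixes in sink.ll describe the intended behaviour: with -licm-coldness-threshold=0 the load of @g is still hoisted into .lr.ph.preheader (CHECK-LICM), while with the default setting it stays in the cold .then block (CHECK-BFI-LICM-NOT). A minimal standalone sketch of such a decision is shown below. It does not use the LLVM API and is not the code from LICM.cpp: the function name worthMoving, the exact comparison, and the frequency values are assumptions made for illustration only.

```cpp
#include <cstdint>

// Hypothetical stand-in for a profile-aware hoist/sink check.
// srcFreq:  block frequency of the block that currently holds the instruction.
// dstFreq:  block frequency of the block the instruction would be moved to.
// coldnessThreshold: assumed meaning: 0 disables the check; otherwise the
//           move is rejected when the destination is more than
//           coldnessThreshold times hotter than the source.
constexpr bool worthMoving(std::uint64_t srcFreq, std::uint64_t dstFreq,
                           std::uint64_t coldnessThreshold) {
  return coldnessThreshold == 0 || dstFreq / coldnessThreshold <= srcFreq;
}

// Made-up frequencies in the spirit of sink.ll: the preheader runs far more
// often than the cold .then block guarded by __builtin_expect(i == p, 0).
constexpr std::uint64_t PreheaderFreq = 1000; // illustrative value
constexpr std::uint64_t ColdThenFreq = 1;     // illustrative value
constexpr std::uint64_t HotBodyFreq = 100000; // illustrative value

// Threshold 0: the check is off, so the load would be hoisted as before.
static_assert(worthMoving(ColdThenFreq, PreheaderFreq, 0), "check disabled");
// Non-zero threshold: moving from a cold block to a hotter preheader is rejected.
static_assert(!worthMoving(ColdThenFreq, PreheaderFreq, 4), "cold source stays");
// Moving from a hot loop body to a colder preheader is still allowed.
static_assert(worthMoving(HotBodyFreq, PreheaderFreq, 4), "hot source hoists");
```

Dividing the destination frequency by the threshold, rather than comparing the raw values, leaves some slack so that small differences in estimated frequency do not block a hoist. Whether the patch uses this exact form should be checked against LICM.cpp in the diff.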

Revision Contents

Diff 214550

llvm/trunk/include/llvm/Transforms/Utils/LoopUtils.h

llvm/trunk/lib/Transforms/Scalar/LICM.cpp

llvm/trunk/test/Other/opt-O2-pipeline.ll

llvm/trunk/test/Other/opt-O3-pipeline.ll

llvm/trunk/test/Other/opt-Os-pipeline.ll

llvm/trunk/test/Other/pass-pipelines.ll

llvm/trunk/test/Transforms/LICM/sink.ll
