This is an archive of the discontinued LLVM Phabricator instance.

Differential D104180

[LICM] Create LoopNest Invariant Code Motion (LNICM) pass
ClosedPublic

Authored by uint256_t on Jun 12 2021, 7:43 AM.

Details

Reviewers

Whitney
nikic
lebedev.ri

Commits

rG74f0f9a455c5: [LICM] Create LoopNest Invariant Code Motion (LNICM) pass

Summary

This patch adds a new pass called LNICM, a LoopNest version of LICM, together with a test case that shows how LNICM works.
LNICM hoists invariants only out of the whole loop nest (not out of an individual loop), which keeps or creates a perfect loop nest. This enables later optimizations that require a perfect loop nest.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

uint256_t created this revision. Jun 12 2021, 7:43 AM

Herald added subscribers: asbirlea, hiraditya, mgorny. Jun 12 2021, 7:43 AM

uint256_t requested review of this revision. Jun 12 2021, 7:43 AM

Herald added a project: Restricted Project. Jun 12 2021, 7:43 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B108971: Diff 351657. Jun 12 2021, 7:44 AM

Following patches will utilize the LoopNest structure for more efficient optimization.

Could you elaborate on what kind of improvements you plan on adding using LoopNest? I think it would also be good to share a patch that implements such an additional optimization, so there's a clear path towards concrete improvements; it would also show why using LoopNest is needed/beneficial.

As the patch is written I am concerned that this appears to mostly duplicate the existing code of LICM (which adds a maintenance burden) and just changes the type of the pass.

As the patch is written I am concerned that this appears to mostly duplicate the existing code of LICM (which adds a maintenance burden) and just changes the type of the pass.

I'll update the code to contain less duplication. I'm sorry to trouble you.

Could you elaborate on what kind of improvements you plan on adding using LoopNest?

With a LoopNest pass, it will be easy to hoist invariants in a nested loop out of the whole loop nest at once.

for (i = 0 to 10) 
   for (k = 0 to 10) 
      j = invariant * 2;

will transform to

x = invariant * 2;
for (i = 0 to 10) 
   for (k = 0 to 10) 
      j = x;
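
As a rough C++ rendering of the pseudocode above, the two shapes compute the same values and differ only in where the multiplication sits. The function names `beforeHoist` and `afterHoist` are made up for illustration and are not part of the patch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical illustration: `invariant` stands for any value that does not
// change inside either loop.
std::vector<int> beforeHoist(int invariant) {
  std::vector<int> out;
  for (int i = 0; i < 10; ++i)
    for (int k = 0; k < 10; ++k) {
      int j = invariant * 2; // recomputed on every inner iteration
      out.push_back(j);
    }
  return out;
}

std::vector<int> afterHoist(int invariant) {
  std::vector<int> out;
  const int x = invariant * 2; // computed once, outside the whole nest
  for (int i = 0; i < 10; ++i)
    for (int k = 0; k < 10; ++k)
      out.push_back(x);
  return out;
}
```

Because the hoisted value ends up in front of the outermost loop, nothing is left between the `i` and `k` loops.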

uint256_t updated this revision to Diff 351665. Jun 12 2021, 9:13 AM

Herald added a subscriber: lxfind. Jun 12 2021, 9:13 AM

Harbormaster completed remote builds in B108977: Diff 351665. Jun 12 2021, 9:13 AM

uint256_t updated this revision to Diff 351667. Jun 12 2021, 9:14 AM

uint256_t removed a subscriber: lxfind. Jun 12 2021, 9:16 AM

uint256_t edited the summary of this revision. Jun 12 2021, 9:30 AM

Harbormaster completed remote builds in B108979: Diff 351667. Jun 12 2021, 10:09 AM

I will second @fhahn's comments. Please include follow up patches with the improvements you mention, and tests that motivate those.

Actually, LNICM is to enable other optimizations in loop pipeline that require perfect loop nest by hoisting innermost invariants out of loop nest at once.
I said that LNICM can efficiently perform LICM, but after tweaking the code and running some tests, I realized that it might not get faster.
However, even if it gets no faster, LNICM is worth adding as a new pass.

In D104180#2819846, @uint256_t wrote:

Actually, LNICM is to enable other optimizations in loop pipeline that require perfect loop nest by hoisting innermost invariants out of loop nest at once.
I said that LNICM can efficiently perform LICM, but after tweaking the code and running some tests, I realized that it might not get faster.
However, even if it gets no faster, LNICM is worth adding as a new pass.

If we get the code duplication and no faster code, why is it worth adding? Could you please clarify?

LNICM itself is no faster, but we can expect other optimizations to run thanks to LNICM and may be able to get more optimized code.

Got it! Can you get some tests where some of these differences appear?
It would be a good motivator to see a concrete example; here are a couple of scenarios I can think of, but perhaps your use case is different.

- LNICM can hoist additionally where LICM doesn't. If that can happen, we should showcase it with a test.
- Another loop nest pass runs in the same LPM as LNICM; show how the optimizations would differ if LICM were run separately.

Sorry for the late reply.
I'm making small tests and some code changes to demonstrate the following:

LNICM only hoists invariants out of the outermost loop, to keep or create a perfect loop nest.
- x[k][j] below could be hoisted out of the i-loop (by LICM), but LNICM wouldn't do that because it would break the perfect loop nest.

void test(int arr[10][10][10], int x[10][10]) {
  for (int k = 0; k < 10; k++) 
    for (int j = 0; j < 10; j++) 
      for (int i = 0; i < 10; i++) 
        arr[i][k][j] += x[k][j];
}

Since LNICM keeps or creates a perfect loop nest, passes such as loop-interchange (itself a loop nest pass) can still perform their optimizations.
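
To make "perfect loop nest" concrete, here is a small sketch using a toy tree of loops. `ToyLoop`, `isPerfectNest` and `makeNest` are hypothetical names invented for this illustration. They are not LLVM's `LoopNest` API, and the check is deliberately simpler than what LLVM does.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a loop: statements directly in its body plus nested loops.
struct ToyLoop {
  std::vector<std::string> stmts;
  std::vector<ToyLoop> inner;
};

// A nest is perfect when every non-innermost loop contains exactly one loop
// and no statements of its own.
bool isPerfectNest(const ToyLoop &loop) {
  if (loop.inner.empty())
    return true; // the innermost loop may hold any statements
  return loop.stmts.empty() && loop.inner.size() == 1 &&
         isPerfectNest(loop.inner.front());
}

// Builds the k/j/i nest from the example. When hoistLoadOutOfILoop is true,
// the load of x[k][j] is placed in the j-loop body, as plain LICM would do.
ToyLoop makeNest(bool hoistLoadOutOfILoop) {
  ToyLoop iLoop;
  ToyLoop jLoop;
  if (hoistLoadOutOfILoop)
    jLoop.stmts.push_back("t = x[k][j]");
  else
    iLoop.stmts.push_back("t = x[k][j]");
  iLoop.stmts.push_back("arr[i][k][j] += t");
  jLoop.inner.push_back(iLoop);
  ToyLoop kLoop;
  kLoop.inner.push_back(jLoop);
  return kLoop;
}
```

In this model, `isPerfectNest(makeNest(false))` holds, while `isPerfectNest(makeNest(true))` does not, because the hoisted load now sits between the j-loop and the i-loop.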

Whitney added a reviewer: nikic. Jun 20 2021, 3:09 PM

Add a test case to demonstrate that LNICM hoists invariants only out of loop nest and keeps perfect loop nest.

I'm personally still having a hard time understanding the final picture (same for the loop-idiom pass patch).
Is LoopNest Invariant Code Motion Pass only about not doing the movement that breaks perfect loop nest?
What about all the code that is no longer moved?
What is the envisioned final pipeline structure?
Do we end up having to run both LNICM and LICM?

These two patches are proposing pretty significant changes, but are very light on the motivation/details.

Harbormaster completed remote builds in B110430: Diff 353675. Jun 22 2021, 10:59 AM

Is LoopNest Invariant Code Motion Pass only about not doing the movement that breaks perfect loop nest?

Yes.

What about all the code that is no longer moved?
What is the envisioned final pipeline structure?
Do we end up having to run both LNICM and LICM?

Yes, we need to run both LNICM and LICM. Sometimes we should run either one.

Update the test case to show that loop-interchange does not run after LICM but does run after LNICM.

uint256_t edited the summary of this revision. Jun 23 2021, 8:02 AM

uint256_t added a subscriber: nikic.

Harbormaster completed remote builds in B110628: Diff 353968. Jun 23 2021, 8:24 AM

I don't introduce a lot of changes for LNICM, and reuse as much code in LICM as possible.
Some transformations really benefit from perfect loop nest. One of the examples is loop-interchange as shown in a lit test.
LICM cannot maintain perfect loop nest, so we need to add LNICM.

LGTM, the change to LICM is pretty small, and there is a motivating reason. Will approve tomorrow if there are no other concerns.

I have not been convinced so far.
This patch's description, and replies to the review comments are being prescriptive,
as-if everyone should be aware of some greater good before even commenting on it.

I am sorry to make you have any unintentional negative feelings. English is not my first language.
I appreciate all review comments I have gotten. I will try to have the motivation more up front next time.

I am working on a GSoC project in which we try to make use of the LoopNest pass. When we investigated what would benefit from LoopNest, we found that LICM could be a good candidate.
Unimodular transformations can be hugely simplified by operating only on perfect loop nests.
Loop interchange is one of the unimodular transformations implemented in LLVM, and its implementation allows no intervening code between the loops of a loop nest.
I have included a lit test to illustrate that the current LICM can prevent loops from being interchanged, whereas the proposed LNICM avoids this problem.
Hoisting can still benefit loop interchange when a loop nest is imperfect to start with but becomes perfect once the intervening code is hoisted out of the nest. I have just modified the lit test to show this.
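
The lit test's function can be written out by hand in the shapes discussed here. This is only a source-level sketch: the function names and the `runWith` driver are invented for illustration, the real passes work on LLVM IR, and the code assumes that `x`, `y` and `z` do not alias.

```cpp
#include <array>
#include <cassert>

using Row = std::array<int, 10>;
using Matrix = std::array<Row, 10>;
using NestFn = void (*)(Matrix &, const Row &, const int *);

// Original shape: the load of *z sits between the loops (imperfect nest).
void original(Matrix &x, const Row &y, const int *z) {
  for (int k = 0; k < 10; ++k) {
    int tmp = *z;
    for (int i = 0; i < 10; ++i)
      x[i][k] += y[k] + tmp;
  }
}

// LICM-like shape: *z leaves the nest, but y[k] is also hoisted out of the
// i-loop, so code remains between the two loops.
void afterLicm(Matrix &x, const Row &y, const int *z) {
  int tmp = *z;
  for (int k = 0; k < 10; ++k) {
    int yk = y[k];
    for (int i = 0; i < 10; ++i)
      x[i][k] += yk + tmp;
  }
}

// LNICM-like shape: only *z is hoisted, which makes the nest perfect.
void afterLnicm(Matrix &x, const Row &y, const int *z) {
  int tmp = *z;
  for (int k = 0; k < 10; ++k)
    for (int i = 0; i < 10; ++i)
      x[i][k] += y[k] + tmp;
}

// With a perfect nest the two loops can be swapped.
void afterInterchange(Matrix &x, const Row &y, const int *z) {
  int tmp = *z;
  for (int i = 0; i < 10; ++i)
    for (int k = 0; k < 10; ++k)
      x[i][k] += y[k] + tmp;
}

// Hypothetical driver: runs one variant on fixed input and returns x.
Matrix runWith(NestFn f) {
  Matrix x{};
  Row y{};
  for (int r = 0; r < 10; ++r) {
    y[r] = 3 * r;
    for (int c = 0; c < 10; ++c)
      x[r][c] = 10 * r + c;
  }
  int z = 7;
  f(x, y, &z);
  return x;
}
```

All four variants produce the same matrix. The difference that matters for loop-interchange is structural: only the LNICM-like shape has nothing between the `k` and `i` loops.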

We considered not adding a new pass and instead adding a new parameter to the constructor of the existing LICM, so that it operates like LNICM when the given argument is true.
However, we thought that adding a new pass that hoists invariants at the scope of a loop nest is clearer.
What do you think, @lebedev.ri?

Whitney added a reviewer: lebedev.ri. Jun 26 2021, 9:52 AM

Harbormaster completed remote builds in B111143: Diff 354693. Jun 26 2021, 10:24 AM

I think the main thing that I don't get here is how this fits into our optimization pipeline, and whether there is some viable path to default enablement here. In abstract, I can see that the LNICM + LoopInterchange combination makes sense. But the way our pipeline is currently structured, LICM and LoopInterchange are run as part of two separate LPMs. This means that by the time LoopInterchange runs, LICM will have been run on all loops already, and hoisted operations as far as possible, including out of loop nests.

This would only make sense to me if we wanted to add a LNICM run into LPM2, but why would we want to do that? Also, LoopInterchange doesn't preserve MSSA, so it can't be part of the same LPM as LICM/LNICM anyway. Or technically it can, because we still support AST-based LICM, but that's only because we haven't gotten around to removing it yet. It's not an option that exists longer term.

eopXD added a subscriber: eopXD. Jun 28 2021, 8:53 PM

We're thinking of replacing LICM in LPM1 with LNICM and adding LPM3 containing LICM after LPM2.
In long term, LPM3 should be removed though.

In D104180#2846602, @uint256_t wrote:

We're thinking of replacing LICM in LPM1 with LNICM and adding LPM3 containing LICM after LPM2.
In long term, LPM3 should be removed though.

When you say "In long term, LPM3 should be removed though.", do you mean that once LoopInterchange can preserve MSSA, it could be moved to LPM1, so that LICM can be moved to LPM2?
I think LICM is always needed in the long term, correct?

Yes, you're right. My explanation was confusing.

uint256_t added a comment. Jun 30 2021, 8:51 AM

This comment was removed by uint256_t.

Whitney retitled this revision from [NFC] [LICM] Create LoopNest Invariant Code Motion (LNICM) pass to [LICM] Create LoopNest Invariant Code Motion (LNICM) pass. Jul 6 2021, 11:54 AM

What do you think about the changes, @asbirlea @fhahn @lebedev.ri @nikic?

This change LGTM.

- The changes to the existing LICM pass are minimal.
- There is a motivating example.

More work needs to be done before modifying the pipeline.

@lebedev.ri, @nikic, has @uint256_t successfully clarified your concerns?
I will attempt to approve it in two days if there are no further comments.

Whitney accepted this revision. Jul 18 2021, 10:03 AM

This revision is now accepted and ready to land. Jul 18 2021, 10:03 AM

I'm okay with landing this, as the change to LICM is trivial and won't be an undue maintenance burden.

I remain somewhat skeptical about the proposed pipeline changes though. I believe it is pretty important that LICM runs early in the loop pipeline, e.g. in D99249 we've seen that we sometimes need it already before LoopRotate (but also after LoopRotate). Replacing early LICM passes with LNICM means we don't perform (early) LICM anymore in some cases, which will presumably affect other passes in the pipeline adversely. At the same time, your proposal to add an additional LICM pass after LPM2 will have a significant negative compile-time impact because it requires recomputing MemorySSA. Though it's worth mentioning that the simplification pipeline already contains another late LICM pass, so possibly adding another one wouldn't actually be necessary.

But anyway, we can land this and then evaluate any pipeline changes separately.

Closed by commit rG74f0f9a455c5: [LICM] Create LoopNest Invariant Code Motion (LNICM) pass (authored by maekawatoshiki <konndennsa@gmail.com>). Jul 19 2021, 8:32 AM

This revision was automatically updated to reflect the committed changes.

maekawatoshiki <konndennsa@gmail.com> added a commit: rG74f0f9a455c5: [LICM] Create LoopNest Invariant Code Motion (LNICM) pass.

uint256_t mentioned this in D108087: [SimpleLoopUnswitch] Create SimpleLoopNestUnswitch pass. Aug 20 2021, 4:05 AM

Revision Contents

Path                                            Size
llvm/include/llvm/Transforms/Scalar/LICM.h      16 lines
llvm/include/llvm/Transforms/Utils/LoopUtils.h  2 lines
llvm/lib/Passes/PassRegistry.def                1 line
llvm/lib/Transforms/Scalar/LICM.cpp             38 lines
llvm/test/Transforms/LICM/lnicm.ll              103 lines

Diff 359807

llvm/include/llvm/Transforms/Scalar/LICM.h

[51 unchanged lines hidden; context: LICMPass()]
       : LicmMssaOptCap(SetLicmMssaOptCap),
         LicmMssaNoAccForPromotionCap(SetLicmMssaNoAccForPromotionCap) {}
   LICMPass(unsigned LicmMssaOptCap, unsigned LicmMssaNoAccForPromotionCap)
       : LicmMssaOptCap(LicmMssaOptCap),
         LicmMssaNoAccForPromotionCap(LicmMssaNoAccForPromotionCap) {}
   PreservedAnalyses run(Loop &L, LoopAnalysisManager &AM,
                         LoopStandardAnalysisResults &AR, LPMUpdater &U);
 };
+
+/// Performs LoopNest Invariant Code Motion Pass.
+class LNICMPass : public PassInfoMixin<LNICMPass> {
+  unsigned LicmMssaOptCap;
+  unsigned LicmMssaNoAccForPromotionCap;
+
+public:
+  LNICMPass()
+      : LicmMssaOptCap(SetLicmMssaOptCap),
+        LicmMssaNoAccForPromotionCap(SetLicmMssaNoAccForPromotionCap) {}
+  LNICMPass(unsigned LicmMssaOptCap, unsigned LicmMssaNoAccForPromotionCap)
+      : LicmMssaOptCap(LicmMssaOptCap),
+        LicmMssaNoAccForPromotionCap(LicmMssaNoAccForPromotionCap) {}
+  PreservedAnalyses run(LoopNest &L, LoopAnalysisManager &AM,
+                        LoopStandardAnalysisResults &AR, LPMUpdater &U);
+};
 } // end namespace llvm

 #endif // LLVM_TRANSFORMS_SCALAR_LICM_H

llvm/include/llvm/Transforms/Utils/LoopUtils.h

[159 unchanged lines hidden]
 /// Takes DomTreeNode, AAResults, LoopInfo, DominatorTree,
 /// BlockFrequencyInfo, TargetLibraryInfo, Loop, AliasSet information for all
 /// instructions of the loop and loop safety information as arguments.
 /// Diagnostics is emitted via \p ORE. It returns changed status.
 bool hoistRegion(DomTreeNode *, AAResults *, LoopInfo *, DominatorTree *,
                  BlockFrequencyInfo *, TargetLibraryInfo *, Loop *,
                  AliasSetTracker *, MemorySSAUpdater *, ScalarEvolution *,
                  ICFLoopSafetyInfo *, SinkAndHoistLICMFlags &,
-                 OptimizationRemarkEmitter *);
+                 OptimizationRemarkEmitter *, bool);

 /// This function deletes dead loops. The caller of this function needs to
 /// guarantee that the loop is infact dead.
 /// The function requires a bunch or prerequisites to be present:
 ///   - The loop needs to be in LCSSA form
 ///   - The loop needs to have a Preheader
 ///   - A unique dedicated exit block must exist
 ///
[334 unchanged lines hidden]

llvm/lib/Passes/PassRegistry.def

[409 unchanged lines hidden]

 #ifndef LOOP_PASS
 #define LOOP_PASS(NAME, CREATE_PASS)
 #endif
 LOOP_PASS("canon-freeze", CanonicalizeFreezeInLoopsPass())
 LOOP_PASS("dot-ddg", DDGDotPrinterPass())
 LOOP_PASS("invalidate<all>", InvalidateAllAnalysesPass())
 LOOP_PASS("licm", LICMPass())
+LOOP_PASS("lnicm", LNICMPass())
 LOOP_PASS("loop-flatten", LoopFlattenPass())
 LOOP_PASS("loop-idiom", LoopIdiomRecognizePass())
 LOOP_PASS("loop-instsimplify", LoopInstSimplifyPass())
 LOOP_PASS("loop-interchange", LoopInterchangePass())
 LOOP_PASS("loop-rotate", LoopRotatePass())
 LOOP_PASS("no-op-loop", NoOpLoopPass())
 LOOP_PASS("print", PrintLoopPass(dbgs()))
 LOOP_PASS("loop-deletion", LoopDeletionPass())
[28 unchanged lines hidden]

llvm/lib/Transforms/Scalar/LICM.cpp

[190 unchanged lines hidden]
 static SmallVector<SmallSetVector<Value *, 8>, 0>
 collectPromotionCandidates(MemorySSA *MSSA, AliasAnalysis *AA, Loop *L);

 namespace {
 struct LoopInvariantCodeMotion {
   bool runOnLoop(Loop *L, AAResults *AA, LoopInfo *LI, DominatorTree *DT,
                  BlockFrequencyInfo *BFI, TargetLibraryInfo *TLI,
                  TargetTransformInfo *TTI, ScalarEvolution *SE, MemorySSA *MSSA,
-                 OptimizationRemarkEmitter *ORE);
+                 OptimizationRemarkEmitter *ORE, bool LoopNestMode = false);

   LoopInvariantCodeMotion(unsigned LicmMssaOptCap,
                           unsigned LicmMssaNoAccForPromotionCap)
       : LicmMssaOptCap(LicmMssaOptCap),
         LicmMssaNoAccForPromotionCap(LicmMssaNoAccForPromotionCap) {}

 private:
   unsigned LicmMssaOptCap;
[82 unchanged lines hidden; context: PreservedAnalyses LICMPass::run(Loop &L, LoopAnalysisManager &AM,]
   PA.preserve<DominatorTreeAnalysis>();
   PA.preserve<LoopAnalysis>();
   if (AR.MSSA)
     PA.preserve<MemorySSAAnalysis>();

   return PA;
 }

+PreservedAnalyses LNICMPass::run(LoopNest &LN, LoopAnalysisManager &AM,
+                                 LoopStandardAnalysisResults &AR,
+                                 LPMUpdater &) {
+  // For the new PM, we also can't use OptimizationRemarkEmitter as an analysis
+  // pass. Function analyses need to be preserved across loop transformations
+  // but ORE cannot be preserved (see comment before the pass definition).
+  OptimizationRemarkEmitter ORE(LN.getParent());
+
+  LoopInvariantCodeMotion LICM(LicmMssaOptCap, LicmMssaNoAccForPromotionCap);
+
+  Loop &OutermostLoop = LN.getOutermostLoop();
+  bool Changed = LICM.runOnLoop(&OutermostLoop, &AR.AA, &AR.LI, &AR.DT, AR.BFI,
+                                &AR.TLI, &AR.TTI, &AR.SE, AR.MSSA, &ORE, true);
+
+  if (!Changed)
+    return PreservedAnalyses::all();
+
+  auto PA = getLoopPassPreservedAnalyses();
+
+  PA.preserve<DominatorTreeAnalysis>();
+  PA.preserve<LoopAnalysis>();
+  if (AR.MSSA)
+    PA.preserve<MemorySSAAnalysis>();
+
+  return PA;
+}
+
 char LegacyLICMPass::ID = 0;
 INITIALIZE_PASS_BEGIN(LegacyLICMPass, "licm", "Loop Invariant Code Motion",
                       false, false)
 INITIALIZE_PASS_DEPENDENCY(LoopPass)
 INITIALIZE_PASS_DEPENDENCY(TargetLibraryInfoWrapperPass)
 INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)
 INITIALIZE_PASS_DEPENDENCY(MemorySSAWrapperPass)
 INITIALIZE_PASS_DEPENDENCY(LazyBFIPass)
[36 unchanged lines hidden]
 }

 /// Hoist expressions out of the specified loop. Note, alias info for inner
 /// loop is not preserved so it is not a good idea to run LICM multiple
 /// times on one loop.
 bool LoopInvariantCodeMotion::runOnLoop(
     Loop *L, AAResults *AA, LoopInfo *LI, DominatorTree *DT,
     BlockFrequencyInfo *BFI, TargetLibraryInfo *TLI, TargetTransformInfo *TTI,
-    ScalarEvolution *SE, MemorySSA *MSSA, OptimizationRemarkEmitter *ORE) {
+    ScalarEvolution *SE, MemorySSA *MSSA, OptimizationRemarkEmitter *ORE,
+    bool LoopNestMode) {
   bool Changed = false;

   assert(L->isLCSSAForm(*DT) && "Loop is not in LCSSA form.");

   // If this loop has metadata indicating that LICM is not to be performed then
   // just exit.
   if (hasDisableLICMTransformsHint(L)) {
     return false;
[50 unchanged lines hidden; context: bool LoopInvariantCodeMotion::runOnLoop(]
   if (L->hasDedicatedExits())
     Changed |=
         sinkRegion(DT->getNode(L->getHeader()), AA, LI, DT, BFI, TLI, TTI, L,
                    CurAST.get(), MSSAU.get(), &SafetyInfo, *Flags.get(), ORE);
   Flags->setIsSink(false);
   if (Preheader)
     Changed |= hoistRegion(DT->getNode(L->getHeader()), AA, LI, DT, BFI, TLI, L,
                            CurAST.get(), MSSAU.get(), SE, &SafetyInfo,
-                           *Flags.get(), ORE);
+                           *Flags.get(), ORE, LoopNestMode);

   // Now that all loop invariants have been removed from the loop, promote any
   // memory references to scalars that we can.
   // Don't sink stores from loops without dedicated block exits. Exits
   // containing indirect branches are not transformed by loop simplify,
   // make sure we catch that. An additional load may be generated in the
   // preheader for SSA updater, so also avoid sinking when no preheader
   // is available.
[428 unchanged lines hidden]
 /// uses, allowing us to hoist a loop body in one pass without iteration.
 ///
 bool llvm::hoistRegion(DomTreeNode *N, AAResults *AA, LoopInfo *LI,
                        DominatorTree *DT, BlockFrequencyInfo *BFI,
                        TargetLibraryInfo *TLI, Loop *CurLoop,
                        AliasSetTracker *CurAST, MemorySSAUpdater *MSSAU,
                        ScalarEvolution *SE, ICFLoopSafetyInfo *SafetyInfo,
                        SinkAndHoistLICMFlags &Flags,
-                       OptimizationRemarkEmitter *ORE) {
+                       OptimizationRemarkEmitter *ORE, bool LoopNestMode) {
   // Verify inputs.
   assert(N != nullptr && AA != nullptr && LI != nullptr && DT != nullptr &&
          CurLoop != nullptr && SafetyInfo != nullptr &&
          "Unexpected input to hoistRegion.");
   assert(((CurAST != nullptr) ^ (MSSAU != nullptr)) &&
          "Either AliasSetTracker or MemorySSA should be initialized.");

   ControlFlowHoister CFH(LI, DT, CurLoop, MSSAU);

   // Keep track of instructions that have been hoisted, as they may need to be
   // re-hoisted if they end up not dominating all of their uses.
   SmallVector<Instruction *, 16> HoistedInstructions;

   // For PHI hoisting to work we need to hoist blocks before their successors.
   // We can do this by iterating through the blocks in the loop in reverse
   // post-order.
   LoopBlocksRPO Worklist(CurLoop);
   Worklist.perform(LI);
   bool Changed = false;
   for (BasicBlock *BB : Worklist) {
     // Only need to process the contents of this block if it is not part of a
     // subloop (which would already have been processed).
-    if (inSubLoop(BB, CurLoop, LI))
+    if (!LoopNestMode && inSubLoop(BB, CurLoop, LI))
       continue;

     for (BasicBlock::iterator II = BB->begin(), E = BB->end(); II != E;) {
       Instruction &I = *II++;
       // Try constant folding this instruction. If all the operands are
       // constants, it is technically hoistable, but it would be better to
       // just fold it.
       if (Constant *C = ConstantFoldInstruction(
[1,563 unchanged lines hidden]
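
The effect of the one-line change to the sub-loop check can be sketched with a toy model in plain C++. Everything here (`ToyInst`, `ToyBlock`, `toyHoistRegion`, `makeToyNest`) is hypothetical and only mimics the control flow of the changed condition. The real `hoistRegion` also performs safety, dominance and memory checks that are ignored in this sketch.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy instruction: defines one value from some operands. mayHoist is false
// for things like induction-variable phis that must stay in the loop.
struct ToyInst {
  std::string def;
  std::vector<std::string> uses;
  bool mayHoist;
};

struct ToyBlock {
  std::vector<ToyInst> insts;
  bool inSubLoop; // block belongs to a loop nested inside the outermost loop
};

// Returns the names hoisted in front of the outermost loop, in visit order.
std::vector<std::string> toyHoistRegion(std::vector<ToyBlock> &blocks,
                                        bool loopNestMode) {
  // Every value defined inside the outermost loop starts out as variant.
  std::set<std::string> definedInLoop;
  for (const ToyBlock &bb : blocks)
    for (const ToyInst &inst : bb.insts)
      definedInLoop.insert(inst.def);

  std::vector<std::string> hoisted;
  for (ToyBlock &bb : blocks) {
    // Mirrors the changed condition: plain LICM skips sub-loop blocks,
    // loop-nest mode visits them as well.
    if (!loopNestMode && bb.inSubLoop)
      continue;
    for (auto it = bb.insts.begin(); it != bb.insts.end();) {
      bool invariant = it->mayHoist;
      for (const std::string &use : it->uses)
        if (definedInLoop.count(use))
          invariant = false;
      if (invariant) {
        definedInLoop.erase(it->def);
        hoisted.push_back(it->def);
        it = bb.insts.erase(it);
      } else {
        ++it;
      }
    }
  }
  return hoisted;
}

// Two-deep nest: j depends only on a value from outside the nest, while yk
// depends on the outer induction variable k.
std::vector<ToyBlock> makeToyNest() {
  ToyBlock outer;
  outer.inSubLoop = false;
  outer.insts = {ToyInst{"k", {}, false}};
  ToyBlock inner;
  inner.inSubLoop = true;
  inner.insts = {ToyInst{"i", {}, false},
                 ToyInst{"j", {"invariant"}, true},
                 ToyInst{"yk", {"y", "k"}, true}};
  return {outer, inner};
}
```

With `loopNestMode` set to false, the inner block is skipped and nothing is hoisted from it. With it set to true, `j` is hoisted out of the whole nest, while `yk` stays in the inner loop because it is not invariant with respect to the outermost loop.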

llvm/test/Transforms/LICM/lnicm.ll

This file was added.

; RUN: opt -aa-pipeline=basic-aa -passes='loop(loop-interchange)' -S %s | FileCheck %s --check-prefixes INTC
; RUN: opt -aa-pipeline=basic-aa -passes='loop(lnicm,loop-interchange)' -S %s | FileCheck %s --check-prefixes LNICM,CHECK
; RUN: opt -aa-pipeline=basic-aa -passes='loop(licm,loop-interchange)' -S %s | FileCheck %s --check-prefixes LICM,CHECK

; This test represents the following function:
; void test(int x[10][10], int y[10], int *z) {
;   for (int k = 0; k < 10; k++) {
;     int tmp = *z;
;     for (int i = 0; i < 10; i++)
;       x[i][k] += y[k] + tmp;
;   }
; }
; We only want to hoist the load of z out of the loop nest.
; LICM hoists the load of y[k] out of the i-loop, but LNICM doesn't do so
; to keep perfect loop nest. This enables optimizations that require
; perfect loop nest (e.g. loop-interchange) to perform.


define dso_local void @test([10 x i32]* noalias %x, i32* noalias readonly %y, i32* readonly %z) {
; CHECK-LABEL: @test(
; CHECK-NEXT: entry:
; CHECK-NEXT: [[Z:%.*]] = load i32, i32* %z, align 4
; CHECK-NEXT: br label [[FOR_BODY3_PREHEADER:%.*]]
; LNICM: for.body.preheader:
; LICM-NOT: for.body.preheader:
; INTC-NOT: for.body.preheader:
; LNICM-NEXT: br label [[FOR_BODY:%.*]]
; CHECK: for.body:
; LNICM-NEXT: [[K:%.*]] = phi i32 [ [[INC10:%.*]], [[FOR_END:%.*]] ], [ 0, [[FOR_BODY_PREHEADER:%.*]] ]
; LNICM-NEXT: br label [[FOR_BODY3_SPLIT1:%.*]]
; LICM: [[TMP:%.*]] = load i32, i32* [[ARRAYIDX:%.*]], align 4
; LNICM: for.body3.preheader:
; LICM-NOT: for.body3.preheader:
; INTC-NOT: for.body3.preheader:
; LNICM-NEXT: br label [[FOR_BODY3:%.*]]
; CHECK: for.body3:
; LNICM-NEXT: [[I:%.*]] = phi i32 [ [[TMP3:%.*]], [[FOR_BODY3_SPLIT:%.*]] ], [ 0, [[FOR_BODY3_PREHEADER:%.*]] ]
; LNICM-NEXT: br label [[FOR_BODY_PREHEADER:%.*]]
; LNICM: for.body3.split1:
; LNICM-NEXT: [[IDXPROM:%.*]] = sext i32 [[K:%.*]] to i64
; LNICM-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* %y, i64 [[IDXPROM:%.*]]
; LNICM-NEXT: [[TMP:%.*]] = load i32, i32* [[ARRAYIDX:%.*]], align 4
; LNICM-NEXT: [[ADD:%.*]] = add nsw i32 [[TMP:%.*]], [[Z:%.*]]
; LNICM-NEXT: [[IDXPROM4:%.*]] = sext i32 [[I:%.*]] to i64
; LNICM-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* %x, i64 [[IDXPROM4:%.*]]
; LNICM-NEXT: [[IDXPROM6:%.*]] = sext i32 [[K:%.*]] to i64
; LNICM-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds [10 x i32], [10 x i32]* [[ARRAYIDX5:%.*]], i64 0, i64 [[IDXPROM6:%.*]]
; LNICM-NEXT: [[TMP2:%.*]] = load i32, i32* [[ARRAYIDX7:%.*]], align 4
; LNICM-NEXT: [[ADD8:%.*]] = add nsw i32 [[TMP2:%.*]], [[ADD:%.*]]
; LNICM-NEXT: store i32 [[ADD8:%.*]], i32* [[ARRAYIDX7:%.*]], align 4
; LNICM-NEXT: [[INC:%.*]] = add nsw i32 [[I:%.*]], 1
; LNICM-NEXT: [[CMP2:%.*]] = icmp slt i32 [[INC:%.*]], 10
; LNICM-NEXT: br label [[FOR_END:%.*]]
; LNICM: for.body3.split:
; LICM-NOT: for.body3.split:
; INTC-NOT: for.body3.split:
; LNICM-NEXT: [[TMP3:%.*]] = add nsw i32 [[I:%.*]], 1
; LNICM-NEXT: [[TMP4:%.*]] = icmp slt i32 [[TMP3:%.*]], 10
; LNICM-NEXT: br i1 [[TMP4:%.*]], label [[FOR_BODY3:%.*]], label [[FOR_END11:%.*]], !llvm.loop !0
; LNICM: for.end:
; LNICM-NEXT: [[INC10:%.*]] = add nsw i32 [[K:%.*]], 1
; LNICM-NEXT: [[CMP:%.*]] = icmp slt i32 [[INC10:%.*]], 10
; LNICM-NEXT: br i1 [[CMP:%.*]], label [[FOR_BODY:%.*]], label [[FOR_BODY3_SPLIT:%.*]], !llvm.loop !2
; LNICM: for.end11:
; LNICM-NEXT: ret void

entry:
  br label %for.body

for.body:
  %k.02 = phi i32 [ 0, %entry ], [ %inc10, %for.end ]
  %0 = load i32, i32* %z, align 4
  br label %for.body3

for.body3:
  %i.01 = phi i32 [ 0, %for.body ], [ %inc, %for.body3 ]
  %idxprom = sext i32 %k.02 to i64
  %arrayidx = getelementptr inbounds i32, i32* %y, i64 %idxprom
  %1 = load i32, i32* %arrayidx, align 4
  %add = add nsw i32 %1, %0
  %idxprom4 = sext i32 %i.01 to i64
  %arrayidx5 = getelementptr inbounds [10 x i32], [10 x i32]* %x, i64 %idxprom4
  %idxprom6 = sext i32 %k.02 to i64
  %arrayidx7 = getelementptr inbounds [10 x i32], [10 x i32]* %arrayidx5, i64 0, i64 %idxprom6
  %2 = load i32, i32* %arrayidx7, align 4
  %add8 = add nsw i32 %2, %add
  store i32 %add8, i32* %arrayidx7, align 4
  %inc = add nsw i32 %i.01, 1
  %cmp2 = icmp slt i32 %inc, 10
  br i1 %cmp2, label %for.body3, label %for.end, !llvm.loop !0

for.end:
  %inc10 = add nsw i32 %k.02, 1
  %cmp = icmp slt i32 %inc10, 10
  br i1 %cmp, label %for.body, label %for.end11, !llvm.loop !2

for.end11:
  ret void
}

!0 = distinct !{!0, !1}
!1 = !{!"llvm.loop.mustprogress"}
!2 = distinct !{!2, !1}
