This is an archive of the discontinued LLVM Phabricator instance.

[IndVarSimplify] Add loop-flattening
Abandoned · Public

Authored by SjoerdMeijer on Oct 6 2020, 2:23 AM.

Details

Reviewers

ostannard
dmgreen
sanwou01
samtebbs
fhahn
reames
lebedev.ri

Summary

This pass was used for a 32-bit target; now we want to use it on 64-bit targets too. It wasn't triggering on 64-bit targets because the input IR to the pass looks slightly different from what it was expecting. For example, earlier transformations can perform rewrites using the widest available integer type, and address calculation uses wider 64-bit types. Thus, this change:

Moves the LoopFlatten pass to just before LoopIndvarSimplify in the optimisation pipeline. These passes were already running shortly after each other, but LoopIndvarSimplify can perform rewrites using wider types, which makes life more difficult for LoopFlatten. This simple reordering avoids these complications, at no disadvantage for the 32-bit target. For our motivating case on this target I've measured an irrelevant -0.0048% regression as a result of this pass reordering.
Overflow checks are performed on the GEP instructions. This change looks through a ZExt instruction only if it is used to index a GEP and if there are no other uses that could change the value. I think this is the least intrusive change compared to the alternatives, for example promoting the loop control instructions to the widest type in use when different types are used.
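
For readers unfamiliar with the transformation, here is a minimal source-level sketch of what flattening a loop nest means. It is an illustration only: the pass itself works on LLVM IR, and the names `sumNested` and `sumFlattened` are hypothetical, not taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Nested form: the element index is recomputed as i * cols + j.
uint64_t sumNested(const std::vector<uint32_t> &a, uint32_t rows,
                   uint32_t cols) {
  uint64_t sum = 0;
  for (uint32_t i = 0; i < rows; ++i)
    for (uint32_t j = 0; j < cols; ++j)
      sum += a[static_cast<std::size_t>(i) * cols + j];
  return sum;
}

// Flattened form: a single loop over rows * cols iterations. The two forms
// are only equivalent if rows * cols does not overflow the type of the
// induction variable, hence the overflow checks discussed in this review.
// Here the product is computed in 64 bits to sidestep the question.
uint64_t sumFlattened(const std::vector<uint32_t> &a, uint32_t rows,
                      uint32_t cols) {
  const uint64_t n = static_cast<uint64_t>(rows) * cols;
  assert(a.size() >= n && "array must hold rows * cols elements");
  uint64_t sum = 0;
  for (uint64_t k = 0; k < n; ++k)
    sum += a[static_cast<std::size_t>(k)];
  return sum;
}
```

Both functions visit the same elements in the same order; the flattened one simply drops the inner loop and the multiply.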

Diff Detail

Event Timeline

SjoerdMeijer created this revision. Oct 6 2020, 2:23 AM

Herald added a project: Restricted Project. Oct 6 2020, 2:23 AM

Herald added a subscriber: hiraditya.

SjoerdMeijer requested review of this revision. Oct 6 2020, 2:23 AM

SjoerdMeijer added inline comments. Oct 6 2020, 2:26 AM

llvm/test/Transforms/LoopFlatten/zext-i64.ll
83	(On Diff #296389)	This file shows the changes relative to a local initial commit of this new file, to better highlight the modifications made by this change. Here's the negative test, a case that should not be transformed.

ostannard added inline comments. Oct 8 2020, 6:41 AM

llvm/lib/Transforms/Scalar/LoopFlatten.cpp
386	I don't think this logic is correct. The check below is proving that the add and mul cannot overflow because the GEP would overflow before they do, and that would be UB. However, in the example in this comment, the add or mul could overflow, but the GEP would be fine, because it never sees an offset greater than 2^32-1. Since there's no UB, we would need to preserve the wrapping behaviour, which we can't do with a single, simple loop.
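
The point about the zero-extended offset can be checked with plain integer arithmetic. The sketch below assumes 32-bit induction variables and a 64-bit index; `index32`, `gepOffset64` and `exactIndex` are hypothetical helper names used only for this illustration.

```cpp
#include <cstdint>

// 32-bit index computation as it appears in the loop body. Unsigned
// arithmetic wraps modulo 2^32, which is well-defined.
constexpr uint32_t index32(uint32_t i, uint32_t size, uint32_t j) {
  return i * size + j;
}

// Models a GEP indexed by zext(i * size + j): the extension happens after
// the 32-bit mul/add, so the offset can never exceed 2^32 - 1.
constexpr uint64_t gepOffset64(uint32_t i, uint32_t size, uint32_t j) {
  return static_cast<uint64_t>(index32(i, size, j));
}

// The index that a flattened loop would need if nothing wrapped.
constexpr uint64_t exactIndex(uint32_t i, uint32_t size, uint32_t j) {
  return static_cast<uint64_t>(i) * size + j;
}

// 42950 * 100000 = 4295000000, which is 32704 more than 2^32.
static_assert(exactIndex(42950, 100000, 0) == 4295000000ULL, "exact value");
static_assert(index32(42950, 100000, 0) == 32704u, "32-bit value wraps");
static_assert(gepOffset64(42950, 100000, 0) <= 0xFFFFFFFFULL,
              "zext'd offset stays within 32 bits");
```

The mul wraps, yet the offset the GEP sees stays small, so a GEP overflow check alone proves nothing about the narrow arithmetic.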

I think it might be really good if it would be possible to not implement all this overflow detection from scratch.
Is there nothing in SCEV already that does this?

In D88880#2319318, @lebedev.ri wrote:

I think it might be really good if it would be possible to not implement all this overflow detection from scratch.
Is there nothing in SCEV already that does this?

Agreed. This needs a bit of a rethink, best done with the help of existing infrastructure. I am going to have a look.

Thanks for looking at this!

ostannard added inline comments. Oct 8 2020, 7:58 AM

llvm/lib/Transforms/Scalar/LoopFlatten.cpp

386

Actually, I think the original version of this is incorrect too:

extern char a[];

#define SIZE 100000

int first = 1;

void foo(unsigned lim) {
  for (unsigned i = 0; i < lim; ++i) {
    for (unsigned j = 0; j < SIZE; ++j) {
      // This might overflow if lim is large, but that is well-defined.
      unsigned x = i * SIZE + j;
      if (first) {
        // Access memory using the computed index. Only do this the first time,
        // so the address calculation won't overflow.
        asm volatile("" : : "r" (a[x]));
        first = 0;
      }
      // Use the computed index for something which isn't UB.
      asm volatile("" : : "r" (x));
    }
  }
}

This gets flattened because the GEP check below passes, but the GEP is only actually executed once, and doesn't overflow the address space. I think everything in the source is well-defined, but flattening it changes the number of times the second asm statement is executed.
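
To make the change in iteration count concrete, here is a scaled-down model that actually runs both loop shapes. It is a sketch under stated assumptions: 8-bit counters stand in for the 32-bit `unsigned` of the example so the loops finish quickly, and `countNested` / `countFlattened` are hypothetical names.

```cpp
#include <cstdint>

// Inner trip count, standing in for SIZE in the example above.
constexpr uint8_t kSize = 100;

// Original nest: counts how often the inner body runs.
uint32_t countNested(uint8_t lim) {
  uint32_t executions = 0;
  for (uint8_t i = 0; i < lim; ++i)
    for (uint8_t j = 0; j < kSize; ++j)
      ++executions;
  return executions;
}

// Naively flattened loop: the trip count lim * kSize is truncated to the
// narrow type, so it wraps modulo 256 once the product exceeds 255.
uint32_t countFlattened(uint8_t lim) {
  const uint8_t trip = static_cast<uint8_t>(lim * kSize);
  uint32_t executions = 0;
  for (uint8_t k = 0; k < trip; ++k)
    ++executions;
  return executions;
}
```

With `lim = 2` both versions run the body 200 times. With `lim = 3` the nest runs it 300 times, while the flattened loop runs it only 44 times (300 mod 256).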

We might be able to salvage this by adding a check that the GEP dominates the loop latch?

We might be able to salvage this by adding a check that the GEP dominates the loop latch?

Thanks, I am picking this up again, and am going to progress this in 2 ways. I will address the existing bug separately.

Here, in this change that addresses different types, I will investigate widening to a wider type so that the transformation is safe, which avoids adding more layers of overflow checks.

In D88880#2329493, @SjoerdMeijer wrote:

We might be able to salvage this by adding a check that the GEP dominates the loop latch?

Thanks, I am picking this up again, and am going to progress this in 2 ways. I will address the existing bug separately.

Here, in this change that addresses different types, I will investigate widening to a wider type so that the transformation is safe, which avoids adding more layers of overflow checks.

I guess this is similar to the widening that IndVarSimplify does? Can we just re-use the stuff from there or have IndVarSimplify just do it for us?

I guess this is similar to the widening that IndVarSimplify does? Can we just re-use the stuff from there or have IndVarSimplify just do it for us?

Yep, exactly. That's exactly what I want to look at.

I've got a slightly different proposal. This moves loop flattening into IndVarSimplify for several reasons:

loop-flatten is best run just before IndVarSimplify because IndVarSimplify can promote induction variables. For the overflow analysis that decides whether loop flattening is legal, it's best if induction variables haven't been promoted yet.
When the induction variables of a loop nest don't use the maximum legal integer type, we promote them to the widest type so that we know loop flattening is safe, thus avoiding overflow analysis. Promoting induction variables is what IndVarSimplify was already doing, so this reuses that.
Last but not least, for the loops that we support with loop-flattening, induction variable simplification is exactly the point of this transform, so this looks like a good home for it. This also avoids quite some churn: no modifications to LoopUtils, where refactored/shared code could live, and none in both of the passes.
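
The reason widening removes the need for an overflow check is simple arithmetic, sketched below. This assumes 32-bit loop limits and that 64 bits is a legal integer type on the target; `widenedTripCount` and `kMax32` are hypothetical names, and this is not the IndVarSimplify widening code.

```cpp
#include <cstdint>
#include <limits>

// Trip count of the flattened loop when both 32-bit limits are widened to
// 64 bits before multiplying.
constexpr uint64_t widenedTripCount(uint32_t outer, uint32_t inner) {
  return static_cast<uint64_t>(outer) * static_cast<uint64_t>(inner);
}

constexpr uint32_t kMax32 = std::numeric_limits<uint32_t>::max();

// Even the largest 32-bit limits give a product that fits in 64 bits:
// (2^32 - 1)^2 = 2^64 - 2^33 + 1.
static_assert(widenedTripCount(kMax32, kMax32) == 0xFFFFFFFE00000001ULL,
              "product of two 32-bit values fits in 64 bits");
```

Because the product of two 32-bit values always fits in 64 bits, the flattened trip count cannot wrap once the induction variables have been promoted.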

This is still work-in-progress, but wanted to share the idea already. This needs porting of the existing loopflatten tests, and I need to fix a bug/integration issue, but in terms of (re)structuring this should be pretty much it.

Herald added a subscriber: mgorny. Oct 16 2020, 6:22 AM

SjoerdMeijer mentioned this in D89378: [LoopFlatten] Loop limit invariant checks. Oct 20 2020, 7:37 AM

A little ping to see if we are okay with the direction of doing the flattening in IndVarSimplify, as opposed to having a separate pass; see my previous message for the motivation.

I think I am abandoning this, as I am guessing it will be easier to get buy-in if I keep things separate. This means that:

I will separate out the widening of the induction variable in IndVarSimplify and put it as a utility in LoopUtils,
and modify IndVarSimplify to use that,
then make the changes to LoopFlatten.

SjoerdMeijer mentioned this in D90402: [LoopFlatten] Run it earlier, just before IndVarSimplify. Oct 29 2020, 8:40 AM

SjoerdMeijer mentioned this in D90408: [LoopFlatten] FlattenInfo bookkeeping. NFC. Oct 29 2020, 9:51 AM

Revision Contents

Path                                                          Size
llvm/include/llvm/InitializePasses.h                          1 line
llvm/include/llvm/LinkAllPasses.h                             1 line
llvm/include/llvm/Transforms/Scalar.h                         6 lines
llvm/include/llvm/Transforms/Scalar/LoopFlatten.h             (no size listed)
llvm/lib/Passes/PassBuilder.cpp                               7 lines
llvm/lib/Passes/PassRegistry.def                              1 line
llvm/lib/Transforms/IPO/PassManagerBuilder.cpp                13 lines
llvm/lib/Transforms/Scalar/ (three files, names not listed)   1 line, 582 lines, 5 lines
llvm/utils/gn/secondary/llvm/lib/Transforms/Scalar/BUILD.gn   1 line

Diff 298615

llvm/include/llvm/InitializePasses.h

[... 236 lines not shown ...]
 void initializeLoopDistributeLegacyPass(PassRegistry&);
 void initializeLoopExtractorLegacyPassPass(PassRegistry &);
 void initializeLoopGuardWideningLegacyPassPass(PassRegistry&);
 void initializeLoopFuseLegacyPass(PassRegistry&);
 void initializeLoopIdiomRecognizeLegacyPassPass(PassRegistry&);
 void initializeLoopInfoWrapperPassPass(PassRegistry&);
 void initializeLoopInstSimplifyLegacyPassPass(PassRegistry&);
 void initializeLoopInterchangeLegacyPassPass(PassRegistry &);
-void initializeLoopFlattenLegacyPassPass(PassRegistry&);
 void initializeLoopLoadEliminationPass(PassRegistry&);
 void initializeLoopPassPass(PassRegistry&);
 void initializeLoopPredicationLegacyPassPass(PassRegistry&);
 void initializeLoopRerollLegacyPassPass(PassRegistry &);
 void initializeLoopRotateLegacyPassPass(PassRegistry&);
 void initializeLoopSimplifyCFGLegacyPassPass(PassRegistry&);
 void initializeLoopSimplifyPass(PassRegistry&);
 void initializeLoopStrengthReducePass(PassRegistry&);
[... 197 lines not shown ...]

llvm/include/llvm/LinkAllPasses.h

[... 121 lines not shown ...] ForcePassLinking() {
       (void) llvm::createInternalizePass();
       (void) llvm::createLCSSAPass();
       (void) llvm::createLegacyDivergenceAnalysisPass();
       (void) llvm::createLICMPass();
       (void) llvm::createLoopSinkPass();
       (void) llvm::createLazyValueInfoPass();
       (void) llvm::createLoopExtractorPass();
       (void) llvm::createLoopInterchangePass();
-      (void) llvm::createLoopFlattenPass();
       (void) llvm::createLoopPredicationPass();
       (void) llvm::createLoopSimplifyPass();
       (void) llvm::createLoopSimplifyCFGPass();
       (void) llvm::createLoopStrengthReducePass();
       (void) llvm::createLoopRerollPass();
       (void) llvm::createLoopUnrollPass();
       (void) llvm::createLoopUnrollAndJamPass();
       (void) llvm::createLoopUnswitchPass();
[... 112 lines not shown ...]

llvm/include/llvm/Transforms/Scalar.h

[... 145 lines not shown ...]
 //
 // LoopInterchange - This pass interchanges loops to provide a more
 // cache-friendly memory access patterns.
 //
 Pass *createLoopInterchangePass();

 //===----------------------------------------------------------------------===//
 //
-// LoopFlatten - This pass flattens nested loops into a single loop.
-//
-Pass *createLoopFlattenPass();
-
-//===----------------------------------------------------------------------===//
-//
 // LoopStrengthReduce - This pass is strength reduces GEP instructions that use
 // a loop's canonical induction variable as one of their indices.
 //
 Pass *createLoopStrengthReducePass();

 //===----------------------------------------------------------------------===//
 //
 // LoopUnswitch - This pass is a simple loop unswitching pass.
[... 377 lines not shown ...]

llvm/include/llvm/Transforms/Scalar/LoopFlatten.h

This file was deleted.

	//===- LoopFlatten.h - Loop Flatten ---------------- -----------*- C++ -*-===//
	//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//
	//===----------------------------------------------------------------------===//
	//
	// This file provides the interface for the Loop Flatten Pass.
	//
	//===----------------------------------------------------------------------===//

	#ifndef LLVM_TRANSFORMS_SCALAR_LOOPFLATTEN_H
	#define LLVM_TRANSFORMS_SCALAR_LOOPFLATTEN_H

	#include "llvm/Analysis/LoopAnalysisManager.h"
	#include "llvm/Analysis/LoopInfo.h"
	#include "llvm/IR/PassManager.h"
	#include "llvm/Transforms/Scalar/LoopPassManager.h"

	namespace llvm {

	class LoopFlattenPass : public PassInfoMixin<LoopFlattenPass> {
	public:
	  LoopFlattenPass() = default;

	  PreservedAnalyses run(Loop &L, LoopAnalysisManager &AM,
	                        LoopStandardAnalysisResults &AR, LPMUpdater &U);
	};

	} // end namespace llvm

	#endif // LLVM_TRANSFORMS_SCALAR_LOOPFLATTEN_H

llvm/lib/Passes/PassBuilder.cpp

[... 145 lines not shown ...]
 #include "llvm/Transforms/Scalar/InductiveRangeCheckElimination.h"
 #include "llvm/Transforms/Scalar/InstSimplifyPass.h"
 #include "llvm/Transforms/Scalar/JumpThreading.h"
 #include "llvm/Transforms/Scalar/LICM.h"
 #include "llvm/Transforms/Scalar/LoopAccessAnalysisPrinter.h"
 #include "llvm/Transforms/Scalar/LoopDataPrefetch.h"
 #include "llvm/Transforms/Scalar/LoopDeletion.h"
 #include "llvm/Transforms/Scalar/LoopDistribute.h"
-#include "llvm/Transforms/Scalar/LoopFlatten.h"
 #include "llvm/Transforms/Scalar/LoopFuse.h"
 #include "llvm/Transforms/Scalar/LoopIdiomRecognize.h"
 #include "llvm/Transforms/Scalar/LoopInstSimplify.h"
 #include "llvm/Transforms/Scalar/LoopInterchange.h"
 #include "llvm/Transforms/Scalar/LoopLoadElimination.h"
 #include "llvm/Transforms/Scalar/LoopPassManager.h"
 #include "llvm/Transforms/Scalar/LoopPredication.h"
 #include "llvm/Transforms/Scalar/LoopReroll.h"
[... 89 lines not shown ...]
 static cl::opt<bool> EnableGVNSink(
     "enable-npm-gvn-sink", cl::init(false), cl::Hidden,
     cl::desc("Enable the GVN hoisting pass for the new PM (default = off)"));

 static cl::opt<bool> EnableUnrollAndJam(
     "enable-npm-unroll-and-jam", cl::init(false), cl::Hidden,
     cl::desc("Enable the Unroll and Jam pass for the new PM (default = off)"));

-static cl::opt<bool> EnableLoopFlatten(
-    "enable-npm-loop-flatten", cl::init(false), cl::Hidden,
-    cl::desc("Enable the Loop flattening pass for the new PM (default = off)"));
-
 static cl::opt<bool> EnableSyntheticCounts(
     "enable-npm-synthetic-counts", cl::init(false), cl::Hidden, cl::ZeroOrMore,
     cl::desc("Run synthetic function entry count generation "
              "pass"));

 static const Regex DefaultAliasRegex(
     "^(default|thinlto-pre-link|thinlto|lto-pre-link|lto)<(O[0123sz])>$");

[... 244 lines not shown ...] FunctionPassManager PassBuilder::buildO1FunctionSimplificationPipeline(
   LPM1.addPass(SimpleLoopUnswitchPass());
   LPM2.addPass(IndVarSimplifyPass());
   LPM2.addPass(LoopIdiomRecognizePass());

   for (auto &C : LateLoopOptimizationsEPCallbacks)
     C(LPM2, Level);

   LPM2.addPass(LoopDeletionPass());
-  if (EnableLoopFlatten)
-    LPM2.addPass(LoopFlattenPass());
   // Do not enable unrolling in PreLinkThinLTO phase during sample PGO
   // because it changes IR to makes profile annotation in back compile
   // inaccurate. The normal unroller doesn't pay attention to forced full unroll
   // attributes so we need to make sure and allow the full unroll pass to pay
   // attention to it.
   if (Phase != ThinLTOPhase::PreLink || !PGOOpt ||
       PGOOpt->Action != PGOOptions::SampleUse)
     LPM2.addPass(LoopFullUnrollPass(Level.getSpeedupLevel(),
[... 2,312 lines not shown ...]

llvm/lib/Passes/PassRegistry.def

[... 360 lines not shown ...]
 LOOP_PASS("licm", LICMPass())
 LOOP_PASS("loop-idiom", LoopIdiomRecognizePass())
 LOOP_PASS("loop-instsimplify", LoopInstSimplifyPass())
 LOOP_PASS("loop-interchange", LoopInterchangePass())
 LOOP_PASS("loop-rotate", LoopRotatePass())
 LOOP_PASS("no-op-loop", NoOpLoopPass())
 LOOP_PASS("print", PrintLoopPass(dbgs()))
 LOOP_PASS("loop-deletion", LoopDeletionPass())
-LOOP_PASS("loop-flatten", LoopFlattenPass())
 LOOP_PASS("loop-simplifycfg", LoopSimplifyCFGPass())
 LOOP_PASS("loop-reduce", LoopStrengthReducePass())
 LOOP_PASS("indvars", IndVarSimplifyPass())
 LOOP_PASS("loop-unroll-full", LoopFullUnrollPass())
 LOOP_PASS("print-access-info", LoopAccessInfoPrinterPass(dbgs()))
 LOOP_PASS("print<ddg>", DDGAnalysisPrinterPass(dbgs()))
 LOOP_PASS("print<iv-users>", IVUsersPrinterPass(dbgs()))
 LOOP_PASS("print<loopnest>", LoopNestPrinterPass(dbgs()))
[... 16 lines not shown ...]

llvm/lib/Transforms/IPO/PassManagerBuilder.cpp

[... 86 lines not shown ...]
 static cl::opt<bool> EnableLoopInterchange(
     "enable-loopinterchange", cl::init(false), cl::Hidden,
     cl::desc("Enable the new, experimental LoopInterchange Pass"));

 static cl::opt<bool> EnableUnrollAndJam("enable-unroll-and-jam",
                                         cl::init(false), cl::Hidden,
                                         cl::desc("Enable Unroll And Jam Pass"));

-static cl::opt<bool> EnableLoopFlatten("enable-loop-flatten", cl::init(false),
-                                       cl::Hidden,
-                                       cl::desc("Enable the LoopFlatten Pass"));
+extern cl::opt<bool> EnableLoopFlatten;

 static cl::opt<bool>
 EnablePrepareForThinLTO("prepare-for-thinlto", cl::init(false), cl::Hidden,
                         cl::desc("Enable preparation for ThinLTO."));

 static cl::opt<bool>
 EnablePerformThinLTO("perform-thinlto", cl::init(false), cl::Hidden,
                      cl::desc("Enable performing ThinLTO."));
[... 331 lines not shown ...] else
     MPM.add(createLoopUnswitchPass(SizeLevel || OptLevel < 3, DivergentTarget));
   // FIXME: We break the loop pass pipeline here in order to do full
   // simplify-cfg. Eventually loop-simplifycfg should be enhanced to replace the
   // need for this.
   MPM.add(createCFGSimplificationPass());
   MPM.add(createInstructionCombiningPass());
   // We resume loop passes creating a second loop pipeline here.
   MPM.add(createIndVarSimplifyPass()); // Canonicalize indvars
+  if (EnableLoopFlatten)
+    MPM.add(createLoopSimplifyCFGPass());
+
   MPM.add(createLoopIdiomPass()); // Recognize idioms like memset.
   addExtensionsToPM(EP_LateLoopOptimizations, MPM);
   MPM.add(createLoopDeletionPass()); // Delete dead loops

   if (EnableLoopInterchange)
     MPM.add(createLoopInterchangePass()); // Interchange loops
-  if (EnableLoopFlatten) {
-    MPM.add(createLoopFlattenPass()); // Flatten loops
-    MPM.add(createLoopSimplifyCFGPass());
-  }

   // Unroll small loops
   MPM.add(createSimpleLoopUnrollPass(OptLevel, DisableUnrollLoops,
                                      ForgetAllSCEVInLoopUnroll));
   addExtensionsToPM(EP_LoopOptimizerEnd, MPM);
   // This ends the loop pass pipelines.

   // Break up allocas that may now be splittable after loop unrolling.
[... 578 lines not shown ...] void PassManagerBuilder::addLTOOptimizationPasses(legacy::PassManagerBase &PM) {
   PM.add(createDeadStoreEliminationPass());
   PM.add(createMergedLoadStoreMotionPass()); // Merge ld/st in diamonds.

   // More loops are countable; try to optimize them.
   PM.add(createIndVarSimplifyPass());
   PM.add(createLoopDeletionPass());
   if (EnableLoopInterchange)
     PM.add(createLoopInterchangePass());
-  if (EnableLoopFlatten)
-    PM.add(createLoopFlattenPass());

   // Unroll small loops
   PM.add(createSimpleLoopUnrollPass(OptLevel, DisableUnrollLoops,
                                     ForgetAllSCEVInLoopUnroll));
   PM.add(createLoopVectorizePass(true, !LoopVectorize));
   // The vectorizer may have significantly shortened a loop body; unroll again.
   PM.add(createLoopUnrollPass(OptLevel, DisableUnrollLoops,
                               ForgetAllSCEVInLoopUnroll));
[... 203 lines not shown ...]

llvm/lib/Transforms/Scalar/CMakeLists.txt

[... 26 lines not shown ...] add_llvm_component_library(LLVMScalarOpts
   LoopSink.cpp
   LoopDeletion.cpp
   LoopDataPrefetch.cpp
   LoopDistribute.cpp
   LoopFuse.cpp
   LoopIdiomRecognize.cpp
   LoopInstSimplify.cpp
   LoopInterchange.cpp
-  LoopFlatten.cpp
   LoopLoadElimination.cpp
   LoopPassManager.cpp
   LoopPredication.cpp
   LoopRerollPass.cpp
   LoopRotation.cpp
   LoopSimplifyCFG.cpp
   LoopStrengthReduce.cpp
   LoopUnrollPass.cpp
[... 42 lines not shown ...]

llvm/lib/Transforms/Scalar/IndVarSimplify.cpp

Show All 30 Lines
#include "llvm/ADT/None.h"		#include "llvm/ADT/None.h"
#include "llvm/ADT/Optional.h"		#include "llvm/ADT/Optional.h"
#include "llvm/ADT/STLExtras.h"		#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallPtrSet.h"		#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallSet.h"		#include "llvm/ADT/SmallSet.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/ADT/iterator_range.h"		#include "llvm/ADT/iterator_range.h"
		#include "llvm/Analysis/AssumptionCache.h"
#include "llvm/Analysis/LoopInfo.h"		#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/LoopPass.h"		#include "llvm/Analysis/LoopPass.h"
#include "llvm/Analysis/MemorySSA.h"		#include "llvm/Analysis/MemorySSA.h"
#include "llvm/Analysis/MemorySSAUpdater.h"		#include "llvm/Analysis/MemorySSAUpdater.h"
		#include "llvm/Analysis/OptimizationRemarkEmitter.h"
#include "llvm/Analysis/ScalarEvolution.h"		#include "llvm/Analysis/ScalarEvolution.h"
#include "llvm/Analysis/ScalarEvolutionExpressions.h"		#include "llvm/Analysis/ScalarEvolutionExpressions.h"
#include "llvm/Analysis/TargetLibraryInfo.h"		#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/Analysis/TargetTransformInfo.h"		#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/Analysis/ValueTracking.h"		#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/BasicBlock.h"		#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Constant.h"		#include "llvm/IR/Constant.h"
#include "llvm/IR/ConstantRange.h"		#include "llvm/IR/ConstantRange.h"
Show All 33 Lines
#include "llvm/Transforms/Utils/LoopUtils.h"		#include "llvm/Transforms/Utils/LoopUtils.h"
#include "llvm/Transforms/Utils/ScalarEvolutionExpander.h"		#include "llvm/Transforms/Utils/ScalarEvolutionExpander.h"
#include "llvm/Transforms/Utils/SimplifyIndVar.h"		#include "llvm/Transforms/Utils/SimplifyIndVar.h"
#include <cassert>		#include <cassert>
#include <cstdint>		#include <cstdint>
#include <utility>		#include <utility>

using namespace llvm;		using namespace llvm;
		using namespace llvm::PatternMatch;

#define DEBUG_TYPE "indvars"		#define DEBUG_TYPE "indvars"

STATISTIC(NumWidened , "Number of indvars widened");		STATISTIC(NumWidened , "Number of indvars widened");
STATISTIC(NumReplaced , "Number of exit values replaced");		STATISTIC(NumReplaced , "Number of exit values replaced");
STATISTIC(NumLFTR , "Number of loop exit tests replaced");		STATISTIC(NumLFTR , "Number of loop exit tests replaced");
STATISTIC(NumElimExt , "Number of IV sign/zero extends eliminated");		STATISTIC(NumElimExt , "Number of IV sign/zero extends eliminated");
STATISTIC(NumElimIV , "Number of congruent IVs eliminated");		STATISTIC(NumElimIV , "Number of congruent IVs eliminated");
Show All 30 Lines
static cl::opt<bool>		static cl::opt<bool>
LoopPredication("indvars-predicate-loops", cl::Hidden, cl::init(true),		LoopPredication("indvars-predicate-loops", cl::Hidden, cl::init(true),
cl::desc("Predicate conditions in read only loops"));		cl::desc("Predicate conditions in read only loops"));

static cl::opt<bool>		static cl::opt<bool>
AllowIVWidening("indvars-widen-indvars", cl::Hidden, cl::init(true),		AllowIVWidening("indvars-widen-indvars", cl::Hidden, cl::init(true),
cl::desc("Allow widening of indvars to eliminate s/zext"));		cl::desc("Allow widening of indvars to eliminate s/zext"));

		static cl::opt<unsigned> RepeatedInstructionThreshold(
		"loop-flatten-cost-threshold", cl::Hidden, cl::init(2),
		cl::desc("Limit on the cost of instructions that can be repeated due to "
		"loop flattening"));

		static cl::opt<bool>
		AssumeNoOverflow("loop-flatten-assume-no-overflow", cl::Hidden,
		cl::init(false),
		cl::desc("Assume that the product of the two iteration "
		"limits will never overflow"));

		cl::opt<bool> EnableLoopFlatten("enable-loop-flatten", cl::init(false),
		cl::Hidden,
		cl::desc("Enable loop flattening in "
		"IndVarSimplify"));

namespace {		namespace {

struct RewritePhi;		struct RewritePhi;

class IndVarSimplify {		class IndVarSimplify {
LoopInfo *LI;		LoopInfo *LI;
ScalarEvolution *SE;		ScalarEvolution *SE;
DominatorTree *DT;		DominatorTree *DT;
const DataLayout &DL;		const DataLayout &DL;
TargetLibraryInfo *TLI;		TargetLibraryInfo *TLI;
const TargetTransformInfo *TTI;		const TargetTransformInfo *TTI;
std::unique_ptr<MemorySSAUpdater> MSSAU;		std::unique_ptr<MemorySSAUpdater> MSSAU;
		AssumptionCache *AC;
		std::function<void(Loop *)> markLoopAsDeleted;

SmallVector<WeakTrackingVH, 16> DeadInsts;		SmallVector<WeakTrackingVH, 16> DeadInsts;

bool handleFloatingPointIV(Loop L, PHINode PH);		bool handleFloatingPointIV(Loop L, PHINode PH);
bool rewriteNonIntegerIVs(Loop *L);		bool rewriteNonIntegerIVs(Loop *L);

bool simplifyAndExtend(Loop L, SCEVExpander &Rewriter, LoopInfo LI);		bool simplifyAndExtend(Loop L, SCEVExpander &Rewriter, LoopInfo LI);

		bool tryFlattenLoopPair(Loop *L,
		SmallVectorImpl<WeakTrackingVH> &DeadInsts,
		SCEVExpander &Rewriter);

/// Try to eliminate loop exits based on analyzeable exit counts		/// Try to eliminate loop exits based on analyzeable exit counts
bool optimizeLoopExits(Loop *L, SCEVExpander &Rewriter);		bool optimizeLoopExits(Loop *L, SCEVExpander &Rewriter);
/// Try to form loop invariant tests for loop exits by changing how many		/// Try to form loop invariant tests for loop exits by changing how many
/// iterations of the loop run when that is unobservable.		/// iterations of the loop run when that is unobservable.
bool predicateLoopExits(Loop *L, SCEVExpander &Rewriter);		bool predicateLoopExits(Loop *L, SCEVExpander &Rewriter);

bool rewriteFirstIterationLoopExitValues(Loop *L);		bool rewriteFirstIterationLoopExitValues(Loop *L);

bool linearFunctionTestReplace(Loop *L, BasicBlock *ExitingBB,		bool linearFunctionTestReplace(Loop *L, BasicBlock *ExitingBB,
const SCEV *ExitCount,		const SCEV *ExitCount,
PHINode *IndVar, SCEVExpander &Rewriter);		PHINode *IndVar, SCEVExpander &Rewriter);

bool sinkUnusedInvariants(Loop *L);		bool sinkUnusedInvariants(Loop *L);

public:		public:
IndVarSimplify(LoopInfo *LI, ScalarEvolution *SE, DominatorTree *DT,		IndVarSimplify(LoopInfo *LI, ScalarEvolution *SE, DominatorTree *DT,
const DataLayout &DL, TargetLibraryInfo *TLI,		const DataLayout &DL, TargetLibraryInfo *TLI,
TargetTransformInfo *TTI, MemorySSA *MSSA)		TargetTransformInfo *TTI, MemorySSA *MSSA, AssumptionCache *AC,
: LI(LI), SE(SE), DT(DT), DL(DL), TLI(TLI), TTI(TTI) {		std::function<void(Loop *)> markLoopAsDeleted)
		: LI(LI), SE(SE), DT(DT), DL(DL), TLI(TLI), TTI(TTI), AC(AC),
		markLoopAsDeleted(markLoopAsDeleted) {
if (MSSA)		if (MSSA)
MSSAU = std::make_unique<MemorySSAUpdater>(MSSA);		MSSAU = std::make_unique<MemorySSAUpdater>(MSSA);
}		}

bool run(Loop *L);		bool run(Loop *L);
};		};

} // end anonymous namespace		} // end anonymous namespace
[2,494 lines not shown]
		if (OldCond->use_empty())
DeadInsts.emplace_back(OldCond);		DeadInsts.emplace_back(OldCond);
Changed = true;		Changed = true;
}		}

return Changed;		return Changed;
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
		// Flatten nested loops. Remove outer-loop induction variables.
		//===----------------------------------------------------------------------===//
		//
		// The intention is to optimise loop nests like this, which together access an
		// array linearly:
		// for (int i = 0; i < N; ++i)
		// for (int j = 0; j < M; ++j)
		// f(A[i*M+j]);
		// into one loop:
		// for (int i = 0; i < (N*M); ++i)
		// f(A[i]);
		//
		// It can also flatten loops where the induction variables are not used in the
		// loop. This is only worth doing if the induction variables are only used in an
		// expression like i*M+j. If they had any other uses, we would have to insert a
		// div/mod to reconstruct the original values, so this wouldn't be profitable.
		//
		// We also need to prove that N*M will not overflow.
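
Note on the comment above: the following is a minimal standalone sketch of the equivalence the pass relies on, not code from this patch. The helper names `sumNested` and `sumFlattened` are hypothetical, and the sketch assumes N*M fits in `std::size_t`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Original form: a two-level nest whose only use of i and j is i*M + j.
long sumNested(const std::vector<int> &A, std::size_t N, std::size_t M) {
  long Sum = 0;
  for (std::size_t i = 0; i < N; ++i)
    for (std::size_t j = 0; j < M; ++j)
      Sum += A[i * M + j];
  return Sum;
}

// Flattened form: one loop over N*M that visits the same elements in the
// same order. This is only valid when N*M does not overflow.
long sumFlattened(const std::vector<int> &A, std::size_t N, std::size_t M) {
  long Sum = 0;
  for (std::size_t k = 0; k < N * M; ++k)
    Sum += A[k];
  return Sum;
}
```

Both functions read the array elements in the same order, which is why the linear expression can be replaced by the single new induction variable.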

		struct FlattenInfo {
		Loop *OuterLoop;
		Loop *InnerLoop;
		PHINode *InnerInductionPHI;
		PHINode *OuterInductionPHI;
		Value *InnerLimit;
		Value *OuterLimit;
		BinaryOperator *InnerIncrement;
		BinaryOperator *OuterIncrement;
		BranchInst *InnerBranch;
		BranchInst *OuterBranch;
		SmallPtrSet<Value *, 4> LinearIVUses;
		SmallPtrSet<PHINode *, 4> InnerPHIsToTransform;

		FlattenInfo(Loop *OL, Loop *IL) : OuterLoop(OL), InnerLoop(IL) {};
		};

		// Finds the induction variable, increment and limit for a simple loop that we
		// can flatten.
		static bool findLoopComponents(
		Loop *L, SmallPtrSetImpl<Instruction *> &IterationInstructions,
		PHINode *&InductionPHI, Value *&Limit, BinaryOperator *&Increment,
		BranchInst *&BackBranch, ScalarEvolution *SE) {
		LLVM_DEBUG(dbgs() << "Finding components of loop: " << L->getName() << "\n");

		if (!L->isLoopSimplifyForm()) {
		LLVM_DEBUG(dbgs() << "Loop is not in normal form\n");
		return false;
		}

		// There must be exactly one exiting block, and it must be the same as the
		// latch.
		BasicBlock *Latch = L->getLoopLatch();
		if (L->getExitingBlock() != Latch) {
		LLVM_DEBUG(dbgs() << "Exiting and latch block are different\n");
		return false;
		}
		// Latch block must end in a conditional branch.
		BackBranch = dyn_cast<BranchInst>(Latch->getTerminator());
		if (!BackBranch || !BackBranch->isConditional()) {
		LLVM_DEBUG(dbgs() << "Could not find back-branch\n");
		return false;
		}
		IterationInstructions.insert(BackBranch);
		LLVM_DEBUG(dbgs() << "Found back branch: "; BackBranch->dump());
		bool ContinueOnTrue = L->contains(BackBranch->getSuccessor(0));

		// Find the induction PHI. If there is no induction PHI, we can't do the
		// transformation. TODO: could other variables trigger this? Do we have to
		// search for the best one?
		InductionPHI = nullptr;
		for (PHINode &PHI : L->getHeader()->phis()) {
		InductionDescriptor ID;
		if (InductionDescriptor::isInductionPHI(&PHI, L, SE, ID)) {
		InductionPHI = &PHI;
		LLVM_DEBUG(dbgs() << "Found induction PHI: "; InductionPHI->dump());
		break;
		}
		}
		if (!InductionPHI) {
		LLVM_DEBUG(dbgs() << "Could not find induction PHI\n");
		return false;
		}

		auto IsValidPredicate = [&](ICmpInst::Predicate Pred) {
		if (ContinueOnTrue)
		return Pred == CmpInst::ICMP_NE || Pred == CmpInst::ICMP_ULT;
		else
		return Pred == CmpInst::ICMP_EQ;
		};

		// Find Compare and make sure it is valid
		ICmpInst *Compare = dyn_cast<ICmpInst>(BackBranch->getCondition());
		if (!Compare || !IsValidPredicate(Compare->getUnsignedPredicate()) ||
		Compare->hasNUsesOrMore(2)) {
		LLVM_DEBUG(dbgs() << "Could not find valid comparison\n");
		return false;
		}
		IterationInstructions.insert(Compare);
		LLVM_DEBUG(dbgs() << "Found comparison: "; Compare->dump());

		// Find increment and limit from the compare
		Increment = nullptr;
		if (match(Compare->getOperand(0),
		m_c_Add(m_Specific(InductionPHI), m_ConstantInt<1>()))) {
		Increment = dyn_cast<BinaryOperator>(Compare->getOperand(0));
		Limit = Compare->getOperand(1);
		} else if (Compare->getUnsignedPredicate() == CmpInst::ICMP_NE &&
		match(Compare->getOperand(1),
		m_c_Add(m_Specific(InductionPHI), m_ConstantInt<1>()))) {
		Increment = dyn_cast<BinaryOperator>(Compare->getOperand(1));
		Limit = Compare->getOperand(0);
		}
		if (!Increment || Increment->hasNUsesOrMore(3)) {
		LLVM_DEBUG(dbgs() << "Could not find valid increment\n");
		return false;
		}
		IterationInstructions.insert(Increment);
		LLVM_DEBUG(dbgs() << "Found increment: "; Increment->dump());
		LLVM_DEBUG(dbgs() << "Found limit: "; Limit->dump());

		assert(InductionPHI->getNumIncomingValues() == 2);
		assert(InductionPHI->getIncomingValueForBlock(Latch) == Increment &&
		"PHI value is not increment inst");

		auto *CI = dyn_cast<ConstantInt>(
		InductionPHI->getIncomingValueForBlock(L->getLoopPreheader()));
		if (!CI || !CI->isZero()) {
		LLVM_DEBUG(dbgs() << "PHI value is not zero: "; InductionPHI->dump());
		return false;
		}

		LLVM_DEBUG(dbgs() << "Successfully found all loop components\n");
		return true;
		}

		static bool checkPHIs(struct FlattenInfo &FI,
		const TargetTransformInfo *TTI) {
		// All PHIs in the inner and outer headers must either be:
		// - The induction PHI, which we are going to rewrite as one induction in
		// the new loop. This is already checked by findLoopComponents.
		// - An outer header PHI with all incoming values from outside the loop.
		// LoopSimplify guarantees we have a pre-header, so we don't need to
		// worry about that here.
		// - Pairs of PHIs in the inner and outer headers, which implement a
		// loop-carried dependency that will still be valid in the new loop. To
		// be valid, this variable must be modified only in the inner loop.

		// The set of PHI nodes in the outer loop header that we know will still be
		// valid after the transformation. These will not need to be modified (with
		// the exception of the induction variable), but we do need to check that
		// there are no unsafe PHI nodes.
		SmallPtrSet<PHINode *, 4> SafeOuterPHIs;
		SafeOuterPHIs.insert(FI.OuterInductionPHI);

		// Check that all PHI nodes in the inner loop header match one of the valid
		// patterns.
		for (PHINode &InnerPHI : FI.InnerLoop->getHeader()->phis()) {
		// The induction PHIs break these rules, and that's OK because we treat
		// them specially when doing the transformation.
		if (&InnerPHI == FI.InnerInductionPHI)
		continue;

		// Each inner loop PHI node must have two incoming values/blocks - one
		// from the pre-header, and one from the latch.
		assert(InnerPHI.getNumIncomingValues() == 2);
		Value *PreHeaderValue =
		InnerPHI.getIncomingValueForBlock(FI.InnerLoop->getLoopPreheader());
		Value *LatchValue =
		InnerPHI.getIncomingValueForBlock(FI.InnerLoop->getLoopLatch());

		// The incoming value from the outer loop must be the PHI node in the
		// outer loop header, with no modifications made in the top of the outer
		// loop.
		PHINode *OuterPHI = dyn_cast<PHINode>(PreHeaderValue);
		if (!OuterPHI || OuterPHI->getParent() != FI.OuterLoop->getHeader()) {
		LLVM_DEBUG(dbgs() << "value modified in top of outer loop\n");
		return false;
		}

		// The other incoming value must come from the inner loop, without any
		// modifications in the tail end of the outer loop. We are in LCSSA form,
		// so this will actually be a PHI in the inner loop's exit block, which
		// only uses values from inside the inner loop.
		PHINode *LCSSAPHI = dyn_cast<PHINode>(
		OuterPHI->getIncomingValueForBlock(FI.OuterLoop->getLoopLatch()));
		if (!LCSSAPHI) {
		LLVM_DEBUG(dbgs() << "could not find LCSSA PHI\n");
		return false;
		}

		// The value used by the LCSSA PHI must be the same one that the inner
		// loop's PHI uses.
		if (LCSSAPHI->hasConstantValue() != LatchValue) {
		LLVM_DEBUG(
		dbgs() << "LCSSA PHI incoming value does not match latch value\n");
		return false;
		}

		LLVM_DEBUG(dbgs() << "PHI pair is safe:\n");
		LLVM_DEBUG(dbgs() << " Inner: "; InnerPHI.dump());
		LLVM_DEBUG(dbgs() << " Outer: "; OuterPHI->dump());
		SafeOuterPHIs.insert(OuterPHI);
		FI.InnerPHIsToTransform.insert(&InnerPHI);
		}

		for (PHINode &OuterPHI : FI.OuterLoop->getHeader()->phis()) {
		if (!SafeOuterPHIs.count(&OuterPHI)) {
		LLVM_DEBUG(dbgs() << "found unsafe PHI in outer loop: "; OuterPHI.dump());
		return false;
		}
		}

		return true;
		}

		static bool
		checkOuterLoopInsts(struct FlattenInfo &FI,
		SmallPtrSetImpl<Instruction *> &IterationInstructions,
		const TargetTransformInfo *TTI) {
		// Check for instructions in the outer but not inner loop. If any of these
		// have side-effects then this transformation is not legal, and if there is
		// a significant amount of code here which can't be optimised out then it's
		// not profitable (as these instructions would get executed for each
		// iteration of the inner loop).
		unsigned RepeatedInstrCost = 0;
		for (auto *B : FI.OuterLoop->getBlocks()) {
		if (FI.InnerLoop->contains(B))
		continue;

		for (auto &I : *B) {
		if (!isa<PHINode>(&I) && !I.isTerminator() &&
		!isSafeToSpeculativelyExecute(&I)) {
		LLVM_DEBUG(dbgs() << "Cannot flatten because instruction may have "
		"side effects: ";
		I.dump());
		return false;
		}
		// The execution count of the outer loop's iteration instructions
		// (increment, compare and branch) will be increased, but the
		// equivalent instructions will be removed from the inner loop, so
		// they make a net difference of zero.
		if (IterationInstructions.count(&I))
		continue;
		// The unconditional branch to the inner loop's header will turn into
		// a fall-through, so adds no cost.
		BranchInst *Br = dyn_cast<BranchInst>(&I);
		if (Br && Br->isUnconditional() &&
		Br->getSuccessor(0) == FI.InnerLoop->getHeader())
		continue;
		// Multiplies of the outer iteration variable and inner iteration
		// count will be optimised out.
		if (match(&I, m_c_Mul(m_Specific(FI.OuterInductionPHI),
		m_Specific(FI.InnerLimit))))
		continue;
		int Cost = TTI->getUserCost(&I, TargetTransformInfo::TCK_SizeAndLatency);
		LLVM_DEBUG(dbgs() << "Cost " << Cost << ": "; I.dump());
		RepeatedInstrCost += Cost;
		}
		}

		LLVM_DEBUG(dbgs() << "Cost of instructions that will be repeated: "
		<< RepeatedInstrCost << "\n");
		// Bail out if flattening the loops would cause instructions in the outer
		// loop but not in the inner loop to be executed extra times.
		if (RepeatedInstrCost > RepeatedInstructionThreshold)
		return false;

		return true;
		}

		static bool checkIVUsers(struct FlattenInfo &FI) {
		//SmallPtrSetImpl<Value *> &LinearIVUses) {
		// We require all uses of both induction variables to match this pattern:
		//
		// (OuterPHI * InnerLimit) + InnerPHI
		//
		// Any uses of the induction variables not matching that pattern would
		// require a div/mod to reconstruct in the flattened loop, so the
		// transformation wouldn't be profitable.

		// Check that all uses of the inner loop's induction variable match the
		// expected pattern, recording the uses of the outer IV.
		SmallPtrSet<Value *, 4> ValidOuterPHIUses;
		for (User *U : FI.InnerInductionPHI->users()) {
		if (U == FI.InnerIncrement)
		continue;

		LLVM_DEBUG(dbgs() << "Found use of inner induction variable: "; U->dump());

		Value *MatchedMul, *MatchedItCount;
		if (match(U, m_c_Add(m_Specific(FI.InnerInductionPHI), m_Value(MatchedMul))) &&
		match(MatchedMul,
		m_c_Mul(m_Specific(FI.OuterInductionPHI), m_Value(MatchedItCount))) &&
		MatchedItCount == FI.InnerLimit) {
		LLVM_DEBUG(dbgs() << "Use is optimisable\n");
		ValidOuterPHIUses.insert(MatchedMul);
		FI.LinearIVUses.insert(U);
		} else {
		LLVM_DEBUG(dbgs() << "Did not match expected pattern, bailing\n");
		return false;
		}
		}

		// Check that there are no uses of the outer IV other than the ones found
		// as part of the pattern above.
		for (User *U : FI.OuterInductionPHI->users()) {
		if (U == FI.OuterIncrement)
		continue;

		LLVM_DEBUG(dbgs() << "Found use of outer induction variable: "; U->dump());

		if (!ValidOuterPHIUses.count(U)) {
		LLVM_DEBUG(dbgs() << "Did not match expected pattern, bailing\n");
		return false;
		} else {
		LLVM_DEBUG(dbgs() << "Use is optimisable\n");
		}
		}

		LLVM_DEBUG(dbgs() << "Found " << FI.LinearIVUses.size()
		<< " value(s) that can be replaced:\n";
		for (Value *V : FI.LinearIVUses) {
		dbgs() << " ";
		V->dump();
		});

		return true;
		}
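
Note on `checkIVUsers`: a small sketch of why any other use of the original induction variables makes the transformation unprofitable. After flattening only the linear index is available, so recovering `i` and `j` costs a division and a remainder per iteration. The helpers `lineariseIVs` and `recoverIVs` are hypothetical names for illustration and are not part of the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// The one pattern the pass accepts: (OuterIV * InnerLimit) + InnerIV.
uint32_t lineariseIVs(uint32_t OuterIV, uint32_t InnerIV, uint32_t InnerLimit) {
  return OuterIV * InnerLimit + InnerIV;
}

// What a flattened loop would need for any other use of the IVs: a div and
// a mod on the flattened index. InnerLimit must be non-zero.
std::pair<uint32_t, uint32_t> recoverIVs(uint32_t FlatIV, uint32_t InnerLimit) {
  return {FlatIV / InnerLimit, FlatIV % InnerLimit};
}
```

For example, with an inner limit of 3, the flattened index 7 corresponds to outer index 2 and inner index 1.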

		// Return an OverflowResult dependent on whether overflow of the multiplication
		// of InnerLimit and OuterLimit can be assumed not to happen.
		static OverflowResult checkOverflow(struct FlattenInfo &FI,
		DominatorTree *DT, AssumptionCache *AC) {
		Function *F = FI.OuterLoop->getHeader()->getParent();
		const DataLayout &DL = F->getParent()->getDataLayout();

		// For debugging/testing.
		if (AssumeNoOverflow)
		return OverflowResult::NeverOverflows;

		// Check if the multiply could not overflow due to known ranges of the
		// input values.
		OverflowResult OR = computeOverflowForUnsignedMul(
		FI.InnerLimit, FI.OuterLimit, DL, AC,
		FI.OuterLoop->getLoopPreheader()->getTerminator(), DT);
		if (OR != OverflowResult::MayOverflow)
		return OR;

		for (Value *V : FI.LinearIVUses) {
		for (Value *U : V->users()) {
		if (auto *GEP = dyn_cast<GetElementPtrInst>(U)) {
		// The IV is used as the operand of a GEP, and the IV is at least as
		// wide as the address space of the GEP. In this case, the GEP would
		// wrap around the address space before the IV increment wraps, which
		// would be UB.
		if (GEP->isInBounds() &&
		V->getType()->getIntegerBitWidth() >=
		DL.getPointerTypeSizeInBits(GEP->getType())) {
		LLVM_DEBUG(
		dbgs() << "use of linear IV would be UB if overflow occurred: ";
		GEP->dump());
		return OverflowResult::NeverOverflows;
		}
		}
		}
		}

		return OverflowResult::MayOverflow;
		}
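
Note on `checkOverflow`: the function above reasons statically about known value ranges via `computeOverflowForUnsignedMul`. The sketch below is only a runtime analogue on concrete 32-bit values, meant to illustrate the property being proved. `tripCountMayWrap` is a hypothetical helper, not an LLVM API.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Returns true if InnerLimit * OuterLimit would wrap in unsigned 32-bit
// arithmetic. Uses a division so the check itself cannot overflow.
bool tripCountMayWrap(uint32_t InnerLimit, uint32_t OuterLimit) {
  if (InnerLimit == 0)
    return false; // A zero factor always gives a zero product.
  return OuterLimit > std::numeric_limits<uint32_t>::max() / InnerLimit;
}
```

For instance, limits of 65535 and 65537 multiply to exactly 2^32 - 1 and are accepted, whereas 65536 and 65536 multiply to 2^32 and are rejected.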

		static bool CanFlattenLoopPair(struct FlattenInfo &FI, DominatorTree *DT,
		LoopInfo *LI, ScalarEvolution *SE,
		AssumptionCache *AC, const TargetTransformInfo *TTI,
		std::function<void(Loop *)> markLoopAsDeleted) {
		Function *F = FI.OuterLoop->getHeader()->getParent();

		LLVM_DEBUG(dbgs() << "Loop flattening running on outer loop "
		<< FI.OuterLoop->getHeader()->getName() << " and inner loop "
		<< FI.InnerLoop->getHeader()->getName() << " in "
		<< F->getName() << "\n");

		SmallPtrSet<Instruction *, 8> IterationInstructions;

		if (!findLoopComponents(FI.InnerLoop, IterationInstructions, FI.InnerInductionPHI,
		FI.InnerLimit, FI.InnerIncrement, FI.InnerBranch, SE))
		return false;
		if (!findLoopComponents(FI.OuterLoop, IterationInstructions, FI.OuterInductionPHI,
		FI.OuterLimit, FI.OuterIncrement, FI.OuterBranch, SE))
		return false;

		// Both of the loop limit values must be invariant in the outer loop
		// (non-instructions are all inherently invariant).
		bool Changed;
		if (!FI.OuterLoop->makeLoopInvariant(FI.InnerLimit, Changed)) {
		LLVM_DEBUG(dbgs() << "inner loop limit not invariant\n");
		return false;
		}
		if (!FI.OuterLoop->makeLoopInvariant(FI.OuterLimit, Changed)) {
		LLVM_DEBUG(dbgs() << "outer loop limit not invariant\n");
		return false;
		}

		if (!checkPHIs(FI, TTI))
		return false;

		if (!checkOuterLoopInsts(FI, IterationInstructions, TTI))
		return false;

		// Find the values in the loop that can be replaced with the linearized
		// induction variable, and check that there are no other uses of the inner
		// or outer induction variable. If there were, we could still do this
		// transformation, but we'd have to insert a div/mod to calculate the
		// original IVs, so it wouldn't be profitable.
		if (!checkIVUsers(FI))
		return false;

		return true;
		}

		static void FlattenLoopPair(struct FlattenInfo &FI, DominatorTree *DT,
		LoopInfo *LI, ScalarEvolution *SE,
		AssumptionCache *AC, const TargetTransformInfo *TTI,
		std::function<void(Loop *)> markLoopAsDeleted) {
		LLVM_DEBUG(dbgs() << "Checks all passed, doing the transformation\n");
		Function *F = FI.OuterLoop->getHeader()->getParent();

		{
		using namespace ore;
		OptimizationRemark Remark(DEBUG_TYPE, "Flattened", FI.InnerLoop->getStartLoc(),
		FI.InnerLoop->getHeader());
		OptimizationRemarkEmitter ORE(F);
		Remark << "Flattened into outer loop";
		ORE.emit(Remark);
		}

		Value *NewTripCount =
		BinaryOperator::CreateMul(FI.InnerLimit, FI.OuterLimit, "flatten.tripcount",
		FI.OuterLoop->getLoopPreheader()->getTerminator());
		LLVM_DEBUG(dbgs() << "Created new trip count in preheader: ";
		NewTripCount->dump());

		// Fix up PHI nodes that take values from the inner loop back-edge, which
		// we are about to remove.
		FI.InnerInductionPHI->removeIncomingValue(FI.InnerLoop->getLoopLatch());
		for (PHINode *PHI : FI.InnerPHIsToTransform)
		PHI->removeIncomingValue(FI.InnerLoop->getLoopLatch());

		// Modify the trip count of the outer loop to be the product of the two
		// trip counts.
		cast<User>(FI.OuterBranch->getCondition())->setOperand(1, NewTripCount);

		// Replace the inner loop backedge with an unconditional branch to the exit.
		BasicBlock *InnerExitBlock = FI.InnerLoop->getExitBlock();
		BasicBlock *InnerExitingBlock = FI.InnerLoop->getExitingBlock();
		InnerExitingBlock->getTerminator()->eraseFromParent();
		BranchInst::Create(InnerExitBlock, InnerExitingBlock);
		DT->deleteEdge(InnerExitingBlock, FI.InnerLoop->getHeader());

		// Replace all uses of the polynomial calculated from the two induction
		// variables with the one new one.
		for (Value *V : FI.LinearIVUses)
		V->replaceAllUsesWith(FI.OuterInductionPHI);

		// Tell LoopInfo, SCEV and the pass manager that the inner loop has been
		// deleted, and any information they have about the outer loop is invalidated.
		markLoopAsDeleted(FI.InnerLoop);
		SE->forgetLoop(FI.OuterLoop);
		SE->forgetLoop(FI.InnerLoop);
		LI->erase(FI.InnerLoop);
		}

		bool IndVarSimplify::tryFlattenLoopPair(Loop *L,
		SmallVectorImpl<WeakTrackingVH> &DeadInsts, SCEVExpander &Rewriter) {
		if (!EnableLoopFlatten)
		return false;
		if (!L->getParentLoop()) {
		return false;
		}

		struct FlattenInfo FI(L->getParentLoop(), L);

		if (!CanFlattenLoopPair(FI, DT, LI, SE, AC, TTI, markLoopAsDeleted))
		return false;

		LLVM_DEBUG(dbgs() << "INDVARS: flattening loop nest!\n");

		Module *M = L->getHeader()->getParent()->getParent();
		auto &DL = M->getDataLayout();
		auto *InnerType = FI.InnerInductionPHI->getType();
		auto *OuterType = FI.OuterInductionPHI->getType();
		unsigned MaxLegalSize = DL.getLargestLegalIntTypeSizeInBits();
		auto *MaxLegalType = DL.getLargestLegalIntType(M->getContext());

		// If both induction types are less than maximum integer width, promote
		// both to the widest type available so we know calculating Limit * Limit
		// as the new trip count is safe.
		if (InnerType == OuterType &&
		InnerType->getScalarSizeInBits() < MaxLegalSize) {
		LLVM_DEBUG(dbgs() << "Promote induction phis to " << MaxLegalSize << "\n");

		SmallVector<WideIVInfo, 8> WideIVs;
		auto AddCandidatePhi = [&] (BasicBlock::iterator I) {
		for ( ; isa<PHINode>(I); ++I) {
		LLVM_DEBUG(dbgs() << "INDVARS: widen phi: "; cast<PHINode>(I)->dump());
		WideIVs.push_back( {cast<PHINode>(I), MaxLegalType, false });
		}
		};

		AddCandidatePhi(L->getHeader()->begin());
		AddCandidatePhi(L->getParentLoop()->getHeader()->begin());

		for (; !WideIVs.empty(); WideIVs.pop_back()) {
		WidenIV Widener(WideIVs.back(), LI, SE, DT, DeadInsts, true);
		if (PHINode *WidePhi = Widener.createWideIV(Rewriter)) {
		LLVM_DEBUG(dbgs() << "INDVARS: created wide phi: "; WidePhi->dump());
		} else {
		return false;
		}
		}
		} else if (checkOverflow(FI, DT, AC) == OverflowResult::MayOverflow) {
		LLVM_DEBUG(dbgs() << "INDVARS: overflow possible, bailing...\n");
		return false;
		} else {
		// TODO: support different sized phis.
		return false;
		}

		FlattenLoopPair(FI, DT, LI, SE, AC, TTI, markLoopAsDeleted);
		return true;
		}
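
Note on the promotion path in `tryFlattenLoopPair`: a sketch of why widening both induction variables makes the new trip count safe. It assumes, for illustration only, 32-bit induction variables and a 64-bit largest legal integer type. The patch itself queries `DL.getLargestLegalIntType`, so the real widths are target-dependent. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Narrow product: computed in 32 bits, so it wraps modulo 2^32.
uint32_t narrowTripCount(uint32_t InnerLimit, uint32_t OuterLimit) {
  return InnerLimit * OuterLimit;
}

// Wide product: both operands are extended first. Two values below 2^32
// have a product below 2^64, so this multiplication cannot wrap.
uint64_t wideTripCount(uint32_t InnerLimit, uint32_t OuterLimit) {
  return static_cast<uint64_t>(InnerLimit) * static_cast<uint64_t>(OuterLimit);
}
```

With both limits equal to 65536 the narrow product wraps to 0, while the wide product is the expected 2^32.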

		//===----------------------------------------------------------------------===//
// IndVarSimplify driver. Manage several subpasses of IV simplification.		// IndVarSimplify driver. Manage several subpasses of IV simplification.
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

bool IndVarSimplify::run(Loop *L) {		bool IndVarSimplify::run(Loop *L) {
// We need (and expect!) the incoming loop to be in LCSSA.		// We need (and expect!) the incoming loop to be in LCSSA.
assert(L->isRecursivelyLCSSAForm(DT, LI) &&		assert(L->isRecursivelyLCSSAForm(DT, LI) &&
"LCSSA required to run indvars!");		"LCSSA required to run indvars!");

[32 lines not shown]
		#endif

// Eliminate redundant IV users.		// Eliminate redundant IV users.
//		//
// Simplification works best when run before other consumers of SCEV. We		// Simplification works best when run before other consumers of SCEV. We
// attempt to avoid evaluating SCEVs for sign/zero extend operations until		// attempt to avoid evaluating SCEVs for sign/zero extend operations until
// other expressions involving loop IVs have been evaluated. This helps SCEV		// other expressions involving loop IVs have been evaluated. This helps SCEV
// set no-wrap flags before normalizing sign/zero extension.		// set no-wrap flags before normalizing sign/zero extension.
Rewriter.disableCanonicalMode();		Rewriter.disableCanonicalMode();

		Changed |= tryFlattenLoopPair(L, DeadInsts, Rewriter);
Changed |= simplifyAndExtend(L, Rewriter, LI);		Changed |= simplifyAndExtend(L, Rewriter, LI);

// Check to see if we can compute the final value of any expressions		// Check to see if we can compute the final value of any expressions
// that are recurrent in the loop, and substitute the exit values from the		// that are recurrent in the loop, and substitute the exit values from the
// loop into any instructions outside of the loop that use the final values		// loop into any instructions outside of the loop that use the final values
// of the current expressions.		// of the current expressions.
if (ReplaceExitValue != NeverRepl) {		if (ReplaceExitValue != NeverRepl) {
if (int Rewrites = rewriteLoopExitValues(L, LI, TLI, SE, TTI, Rewriter, DT,		if (int Rewrites = rewriteLoopExitValues(L, LI, TLI, SE, TTI, Rewriter, DT,
[132 lines not shown]
		if (VerifyMemorySSA && MSSAU)
MSSAU->getMemorySSA()->verifyMemorySSA();		MSSAU->getMemorySSA()->verifyMemorySSA();
#endif		#endif

return Changed;		return Changed;
}		}

PreservedAnalyses IndVarSimplifyPass::run(Loop &L, LoopAnalysisManager &AM,		PreservedAnalyses IndVarSimplifyPass::run(Loop &L, LoopAnalysisManager &AM,
LoopStandardAnalysisResults &AR,		LoopStandardAnalysisResults &AR,
LPMUpdater &) {		LPMUpdater &Updater) {
Function *F = L.getHeader()->getParent();		Function *F = L.getHeader()->getParent();
const DataLayout &DL = F->getParent()->getDataLayout();		const DataLayout &DL = F->getParent()->getDataLayout();

IndVarSimplify IVS(&AR.LI, &AR.SE, &AR.DT, DL, &AR.TLI, &AR.TTI, AR.MSSA);		std::string LoopName(L.getName());
		IndVarSimplify IVS(&AR.LI, &AR.SE, &AR.DT, DL, &AR.TLI, &AR.TTI, AR.MSSA, &AR.AC,
		[&](Loop *L) { Updater.markLoopAsDeleted(*L, LoopName); });

if (!IVS.run(&L))		if (!IVS.run(&L))
return PreservedAnalyses::all();		return PreservedAnalyses::all();

auto PA = getLoopPassPreservedAnalyses();		auto PA = getLoopPassPreservedAnalyses();
PA.preserveSet<CFGAnalyses>();		PA.preserveSet<CFGAnalyses>();
if (AR.MSSA)		if (AR.MSSA)
PA.preserve<MemorySSAAnalysis>();		PA.preserve<MemorySSAAnalysis>();
return PA;		return PA;
[16 lines not shown]
		bool runOnLoop(Loop *L, LPPassManager &LPM) override {
auto *SE = &getAnalysis<ScalarEvolutionWrapperPass>().getSE();		auto *SE = &getAnalysis<ScalarEvolutionWrapperPass>().getSE();
auto *DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();		auto *DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();
auto *TLIP = getAnalysisIfAvailable<TargetLibraryInfoWrapperPass>();		auto *TLIP = getAnalysisIfAvailable<TargetLibraryInfoWrapperPass>();
auto *TLI = TLIP ? &TLIP->getTLI(*L->getHeader()->getParent()) : nullptr;		auto *TLI = TLIP ? &TLIP->getTLI(*L->getHeader()->getParent()) : nullptr;
auto *TTIP = getAnalysisIfAvailable<TargetTransformInfoWrapperPass>();		auto *TTIP = getAnalysisIfAvailable<TargetTransformInfoWrapperPass>();
auto *TTI = TTIP ? &TTIP->getTTI(*L->getHeader()->getParent()) : nullptr;		auto *TTI = TTIP ? &TTIP->getTTI(*L->getHeader()->getParent()) : nullptr;
const DataLayout &DL = L->getHeader()->getModule()->getDataLayout();		const DataLayout &DL = L->getHeader()->getModule()->getDataLayout();
auto *MSSAAnalysis = getAnalysisIfAvailable<MemorySSAWrapperPass>();		auto *MSSAAnalysis = getAnalysisIfAvailable<MemorySSAWrapperPass>();
		auto *AC = &getAnalysis<AssumptionCacheTracker>().getAssumptionCache(
		*L->getHeader()->getParent());

MemorySSA *MSSA = nullptr;		MemorySSA *MSSA = nullptr;
if (MSSAAnalysis)		if (MSSAAnalysis)
MSSA = &MSSAAnalysis->getMSSA();		MSSA = &MSSAAnalysis->getMSSA();

IndVarSimplify IVS(LI, SE, DT, DL, TLI, TTI, MSSA);		IndVarSimplify IVS(LI, SE, DT, DL, TLI, TTI, MSSA, AC,
		[&](Loop *L) { LPM.markLoopAsDeleted(*L); });
return IVS.run(L);		return IVS.run(L);
}		}

void getAnalysisUsage(AnalysisUsage &AU) const override {		void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.setPreservesCFG();		AU.setPreservesCFG();
AU.addPreserved<MemorySSAWrapperPass>();		AU.addPreserved<MemorySSAWrapperPass>();
getLoopAnalysisUsage(AU);		getLoopAnalysisUsage(AU);
		AU.addRequired<AssumptionCacheTracker>();
		AU.addPreserved<AssumptionCacheTracker>();
}		}
};		};

} // end anonymous namespace		} // end anonymous namespace

char IndVarSimplifyLegacyPass::ID = 0;		char IndVarSimplifyLegacyPass::ID = 0;

INITIALIZE_PASS_BEGIN(IndVarSimplifyLegacyPass, "indvars",		INITIALIZE_PASS_BEGIN(IndVarSimplifyLegacyPass, "indvars",
"Induction Variable Simplification", false, false)		"Induction Variable Simplification", false, false)
INITIALIZE_PASS_DEPENDENCY(LoopPass)		INITIALIZE_PASS_DEPENDENCY(LoopPass)
		INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)
INITIALIZE_PASS_END(IndVarSimplifyLegacyPass, "indvars",		INITIALIZE_PASS_END(IndVarSimplifyLegacyPass, "indvars",
"Induction Variable Simplification", false, false)		"Induction Variable Simplification", false, false)

Pass *llvm::createIndVarSimplifyPass() {		Pass *llvm::createIndVarSimplifyPass() {
return new IndVarSimplifyLegacyPass();		return new IndVarSimplifyLegacyPass();
}		}

llvm/lib/Transforms/Scalar/LoopFlatten.cpp

This file was deleted.

	//===- LoopFlatten.cpp - Loop flattening pass------------------------------===//
	//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//
	//===----------------------------------------------------------------------===//
	//
	// This pass flattens pairs of nested loops into a single loop.
	//
	// The intention is to optimise loop nests like this, which together access an
	// array linearly:
	// for (int i = 0; i < N; ++i)
	// for (int j = 0; j < M; ++j)
	// f(A[i*M+j]);
	// into one loop:
	// for (int i = 0; i < (N*M); ++i)
	// f(A[i]);
	//
	// It can also flatten loops where the induction variables are not used in the
	// loop. This is only worth doing if the induction variables are only used in an
	// expression like i*M+j. If they had any other uses, we would have to insert a
	// div/mod to reconstruct the original values, so this wouldn't be profitable.
	//
	// We also need to prove that N*M will not overflow.
	//
	//===----------------------------------------------------------------------===//

	#include "llvm/Transforms/Scalar/LoopFlatten.h"
	#include "llvm/Analysis/AssumptionCache.h"
	#include "llvm/Analysis/LoopInfo.h"
	#include "llvm/Analysis/LoopPass.h"
	#include "llvm/Analysis/OptimizationRemarkEmitter.h"
	#include "llvm/Analysis/ScalarEvolution.h"
	#include "llvm/Analysis/TargetTransformInfo.h"
	#include "llvm/Analysis/ValueTracking.h"
	#include "llvm/IR/Dominators.h"
	#include "llvm/IR/Function.h"
	#include "llvm/IR/Module.h"
	#include "llvm/IR/PatternMatch.h"
	#include "llvm/IR/Verifier.h"
	#include "llvm/InitializePasses.h"
	#include "llvm/Pass.h"
	#include "llvm/Support/Debug.h"
	#include "llvm/Support/raw_ostream.h"
	#include "llvm/Transforms/Scalar.h"
	#include "llvm/Transforms/Utils/LoopUtils.h"

	#define DEBUG_TYPE "loop-flatten"

	using namespace llvm;
	using namespace llvm::PatternMatch;

	static cl::opt<unsigned> RepeatedInstructionThreshold(
	"loop-flatten-cost-threshold", cl::Hidden, cl::init(2),
	cl::desc("Limit on the cost of instructions that can be repeated due to "
	"loop flattening"));

	static cl::opt<bool>
	AssumeNoOverflow("loop-flatten-assume-no-overflow", cl::Hidden,
	cl::init(false),
	cl::desc("Assume that the product of the two iteration "
	"limits will never overflow"));

	// Finds the induction variable, increment and limit for a simple loop that we
	// can flatten.
	static bool findLoopComponents(
	Loop *L, SmallPtrSetImpl<Instruction *> &IterationInstructions,
	PHINode *&InductionPHI, Value *&Limit, BinaryOperator *&Increment,
	BranchInst *&BackBranch, ScalarEvolution *SE) {
	LLVM_DEBUG(dbgs() << "Finding components of loop: " << L->getName() << "\n");

	if (!L->isLoopSimplifyForm()) {
	LLVM_DEBUG(dbgs() << "Loop is not in normal form\n");
	return false;
	}

	// There must be exactly one exiting block, and it must be the same as the
	// latch.
	BasicBlock *Latch = L->getLoopLatch();
	if (L->getExitingBlock() != Latch) {
	LLVM_DEBUG(dbgs() << "Exiting and latch block are different\n");
	return false;
	}
	// Latch block must end in a conditional branch.
	BackBranch = dyn_cast<BranchInst>(Latch->getTerminator());
	if (!BackBranch || !BackBranch->isConditional()) {
	LLVM_DEBUG(dbgs() << "Could not find back-branch\n");
	return false;
	}
	IterationInstructions.insert(BackBranch);
	LLVM_DEBUG(dbgs() << "Found back branch: "; BackBranch->dump());
	bool ContinueOnTrue = L->contains(BackBranch->getSuccessor(0));

	// Find the induction PHI. If there is no induction PHI, we can't do the
	// transformation. TODO: could other variables trigger this? Do we have to
	// search for the best one?
	InductionPHI = nullptr;
	for (PHINode &PHI : L->getHeader()->phis()) {
	InductionDescriptor ID;
	if (InductionDescriptor::isInductionPHI(&PHI, L, SE, ID)) {
	InductionPHI = &PHI;
	LLVM_DEBUG(dbgs() << "Found induction PHI: "; InductionPHI->dump());
	break;
	}
	}
	if (!InductionPHI) {
	LLVM_DEBUG(dbgs() << "Could not find induction PHI\n");
	return false;
	}

	auto IsValidPredicate = [&](ICmpInst::Predicate Pred) {
	if (ContinueOnTrue)
	return Pred == CmpInst::ICMP_NE || Pred == CmpInst::ICMP_ULT;
	else
	return Pred == CmpInst::ICMP_EQ;
	};

	// Find Compare and make sure it is valid
	ICmpInst *Compare = dyn_cast<ICmpInst>(BackBranch->getCondition());
	if (!Compare || !IsValidPredicate(Compare->getUnsignedPredicate()) ||
	Compare->hasNUsesOrMore(2)) {
	LLVM_DEBUG(dbgs() << "Could not find valid comparison\n");
	return false;
	}
	IterationInstructions.insert(Compare);
	LLVM_DEBUG(dbgs() << "Found comparison: "; Compare->dump());

	// Find increment and limit from the compare
	Increment = nullptr;
	if (match(Compare->getOperand(0),
	m_c_Add(m_Specific(InductionPHI), m_ConstantInt<1>()))) {
	Increment = dyn_cast<BinaryOperator>(Compare->getOperand(0));
	Limit = Compare->getOperand(1);
	} else if (Compare->getUnsignedPredicate() == CmpInst::ICMP_NE &&
	match(Compare->getOperand(1),
	m_c_Add(m_Specific(InductionPHI), m_ConstantInt<1>()))) {
	Increment = dyn_cast<BinaryOperator>(Compare->getOperand(1));
	Limit = Compare->getOperand(0);
	}
	if (!Increment || Increment->hasNUsesOrMore(3)) {
	LLVM_DEBUG(dbgs() << "Could not find valid increment\n");
	return false;
	}
	IterationInstructions.insert(Increment);
	LLVM_DEBUG(dbgs() << "Found increment: "; Increment->dump());
	LLVM_DEBUG(dbgs() << "Found limit: "; Limit->dump());

	assert(InductionPHI->getNumIncomingValues() == 2);
	assert(InductionPHI->getIncomingValueForBlock(Latch) == Increment &&
	"PHI value is not increment inst");

	auto *CI = dyn_cast<ConstantInt>(
	InductionPHI->getIncomingValueForBlock(L->getLoopPreheader()));
	if (!CI || !CI->isZero()) {
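	// CI is null when the incoming value is not a ConstantInt, so guard the dump.
	LLVM_DEBUG(dbgs() << "PHI value is not zero: ";
	if (CI) CI->dump(); else dbgs() << "<not a constant>\n");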
	return false;
	}

	LLVM_DEBUG(dbgs() << "Successfully found all loop components\n");
	return true;
	}

	static bool checkPHIs(Loop *OuterLoop, Loop *InnerLoop,
	SmallPtrSetImpl<PHINode *> &InnerPHIsToTransform,
	PHINode *InnerInductionPHI, PHINode *OuterInductionPHI,
	TargetTransformInfo *TTI) {
	// All PHIs in the inner and outer headers must either be:
	// - The induction PHI, which we are going to rewrite as one induction in
	// the new loop. This is already checked by findLoopComponents.
	// - An outer header PHI with all incoming values from outside the loop.
	// LoopSimplify guarantees we have a pre-header, so we don't need to
	// worry about that here.
	// - Pairs of PHIs in the inner and outer headers, which implement a
	// loop-carried dependency that will still be valid in the new loop. To
	// be valid, this variable must be modified only in the inner loop.

	// The set of PHI nodes in the outer loop header that we know will still be
	// valid after the transformation. These will not need to be modified (with
	// the exception of the induction variable), but we do need to check that
	// there are no unsafe PHI nodes.
	SmallPtrSet<PHINode *, 4> SafeOuterPHIs;
	SafeOuterPHIs.insert(OuterInductionPHI);

	// Check that all PHI nodes in the inner loop header match one of the valid
	// patterns.
	for (PHINode &InnerPHI : InnerLoop->getHeader()->phis()) {
	// The induction PHIs break these rules, and that's OK because we treat
	// them specially when doing the transformation.
	if (&InnerPHI == InnerInductionPHI)
	continue;

	// Each inner loop PHI node must have two incoming values/blocks - one
	// from the pre-header, and one from the latch.
	assert(InnerPHI.getNumIncomingValues() == 2);
	Value *PreHeaderValue =
	InnerPHI.getIncomingValueForBlock(InnerLoop->getLoopPreheader());
	Value *LatchValue =
	InnerPHI.getIncomingValueForBlock(InnerLoop->getLoopLatch());

	// The incoming value from the outer loop must be the PHI node in the
	// outer loop header, with no modifications made in the top of the outer
	// loop.
	PHINode *OuterPHI = dyn_cast<PHINode>(PreHeaderValue);
	if (!OuterPHI || OuterPHI->getParent() != OuterLoop->getHeader()) {
	LLVM_DEBUG(dbgs() << "value modified in top of outer loop\n");
	return false;
	}

	// The other incoming value must come from the inner loop, without any
	// modifications in the tail end of the outer loop. We are in LCSSA form,
	// so this will actually be a PHI in the inner loop's exit block, which
	// only uses values from inside the inner loop.
	PHINode *LCSSAPHI = dyn_cast<PHINode>(
	OuterPHI->getIncomingValueForBlock(OuterLoop->getLoopLatch()));
	if (!LCSSAPHI) {
	LLVM_DEBUG(dbgs() << "could not find LCSSA PHI\n");
	return false;
	}

	// The value used by the LCSSA PHI must be the same one that the inner
	// loop's PHI uses.
	if (LCSSAPHI->hasConstantValue() != LatchValue) {
	LLVM_DEBUG(
	dbgs() << "LCSSA PHI incoming value does not match latch value\n");
	return false;
	}

	LLVM_DEBUG(dbgs() << "PHI pair is safe:\n");
	LLVM_DEBUG(dbgs() << " Inner: "; InnerPHI.dump());
	LLVM_DEBUG(dbgs() << " Outer: "; OuterPHI->dump());
	SafeOuterPHIs.insert(OuterPHI);
	InnerPHIsToTransform.insert(&InnerPHI);
	}

	for (PHINode &OuterPHI : OuterLoop->getHeader()->phis()) {
	if (!SafeOuterPHIs.count(&OuterPHI)) {
	LLVM_DEBUG(dbgs() << "found unsafe PHI in outer loop: "; OuterPHI.dump());
	return false;
	}
	}

	return true;
	}

	static bool
	checkOuterLoopInsts(Loop *OuterLoop, Loop *InnerLoop,
	SmallPtrSetImpl<Instruction *> &IterationInstructions,
	Value *InnerLimit, PHINode *OuterPHI,
	TargetTransformInfo *TTI) {
	// Check for instructions in the outer but not inner loop. If any of these
	// have side-effects then this transformation is not legal, and if there is
	// a significant amount of code here which can't be optimised out then it's
	// not profitable (as these instructions would get executed for each
	// iteration of the inner loop).
	unsigned RepeatedInstrCost = 0;
	for (auto *B : OuterLoop->getBlocks()) {
	if (InnerLoop->contains(B))
	continue;

	for (auto &I : *B) {
	if (!isa<PHINode>(&I) && !I.isTerminator() &&
	!isSafeToSpeculativelyExecute(&I)) {
	LLVM_DEBUG(dbgs() << "Cannot flatten because instruction may have "
	"side effects: ";
	I.dump());
	return false;
	}
	// The execution count of the outer loop's iteration instructions
	// (increment, compare and branch) will be increased, but the
	// equivalent instructions will be removed from the inner loop, so
	// they make a net difference of zero.
	if (IterationInstructions.count(&I))
	continue;
	// The unconditional branch to the inner loop's header will turn into
	// a fall-through, so adds no cost.
	BranchInst *Br = dyn_cast<BranchInst>(&I);
	if (Br && Br->isUnconditional() &&
	Br->getSuccessor(0) == InnerLoop->getHeader())
	continue;
	// Multiplies of the outer iteration variable and inner iteration
	// count will be optimised out.
	if (match(&I, m_c_Mul(m_Specific(OuterPHI), m_Specific(InnerLimit))))
	continue;
	int Cost = TTI->getUserCost(&I, TargetTransformInfo::TCK_SizeAndLatency);
	LLVM_DEBUG(dbgs() << "Cost " << Cost << ": "; I.dump());
	RepeatedInstrCost += Cost;
	}
	}

	LLVM_DEBUG(dbgs() << "Cost of instructions that will be repeated: "
	<< RepeatedInstrCost << "\n");
	// Bail out if flattening the loops would cause instructions in the outer
	// loop but not in the inner loop to be executed extra times.
	if (RepeatedInstrCost > RepeatedInstructionThreshold)
	return false;

	return true;
	}

	static bool checkIVUsers(PHINode *InnerPHI, PHINode *OuterPHI,
	BinaryOperator *InnerIncrement,
	BinaryOperator *OuterIncrement, Value *InnerLimit,
	SmallPtrSetImpl<Value *> &LinearIVUses) {
	// We require all uses of both induction variables to match this pattern:
	//
	// (OuterPHI * InnerLimit) + InnerPHI
	//
	// Any uses of the induction variables not matching that pattern would
	// require a div/mod to reconstruct in the flattened loop, so the
	// transformation wouldn't be profitable.

	// Check that all uses of the inner loop's induction variable match the
	// expected pattern, recording the uses of the outer IV.
	SmallPtrSet<Value *, 4> ValidOuterPHIUses;
	for (User *U : InnerPHI->users()) {
	if (U == InnerIncrement)
	continue;

	LLVM_DEBUG(dbgs() << "Found use of inner induction variable: "; U->dump());

	Value *MatchedMul, *MatchedItCount;
	if (match(U, m_c_Add(m_Specific(InnerPHI), m_Value(MatchedMul))) &&
	match(MatchedMul,
	m_c_Mul(m_Specific(OuterPHI), m_Value(MatchedItCount))) &&
	MatchedItCount == InnerLimit) {
	LLVM_DEBUG(dbgs() << "Use is optimisable\n");
	ValidOuterPHIUses.insert(MatchedMul);
	LinearIVUses.insert(U);
	} else {
	LLVM_DEBUG(dbgs() << "Did not match expected pattern, bailing\n");
	return false;
	}
	}

	// Check that there are no uses of the outer IV other than the ones found
	// as part of the pattern above.
	for (User *U : OuterPHI->users()) {
	if (U == OuterIncrement)
	continue;

	LLVM_DEBUG(dbgs() << "Found use of outer induction variable: "; U->dump());

	if (!ValidOuterPHIUses.count(U)) {
	LLVM_DEBUG(dbgs() << "Did not match expected pattern, bailing\n");
	return false;
	} else {
	LLVM_DEBUG(dbgs() << "Use is optimisable\n");
	}
	}

	LLVM_DEBUG(dbgs() << "Found " << LinearIVUses.size()
	<< " value(s) that can be replaced:\n";
	for (Value *V : LinearIVUses) {
	dbgs() << " ";
	V->dump();
	});

	return true;
	}

	// Return an OverflowResult indicating whether the multiplication of
	// InnerLimit and OuterLimit can be assumed not to overflow.
	static OverflowResult checkOverflow(Loop *OuterLoop, Value *InnerLimit,
	Value *OuterLimit,
	SmallPtrSetImpl<Value *> &LinearIVUses,
	DominatorTree *DT, AssumptionCache *AC) {
	Function *F = OuterLoop->getHeader()->getParent();
	const DataLayout &DL = F->getParent()->getDataLayout();

	// For debugging/testing.
	if (AssumeNoOverflow)
	return OverflowResult::NeverOverflows;

	// Check if the multiply could not overflow due to known ranges of the
	// input values.
	OverflowResult OR = computeOverflowForUnsignedMul(
	InnerLimit, OuterLimit, DL, AC,
	OuterLoop->getLoopPreheader()->getTerminator(), DT);
	if (OR != OverflowResult::MayOverflow)
	return OR;

	for (Value *V : LinearIVUses) {
	for (Value *U : V->users()) {
	if (auto *GEP = dyn_cast<GetElementPtrInst>(U)) {
	// The IV is used as the operand of a GEP, and the IV is at least as
	// wide as the address space of the GEP. In this case, the GEP would
	// wrap around the address space before the IV increment wraps, which
	// would be UB.
	if (GEP->isInBounds() &&
	V->getType()->getIntegerBitWidth() >=
	DL.getPointerTypeSizeInBits(GEP->getType())) {
	LLVM_DEBUG(
	dbgs() << "use of linear IV would be UB if overflow occurred: ";
	GEP->dump());
	return OverflowResult::NeverOverflows;
	}
	}
	}
	}

	return OverflowResult::MayOverflow;
	}

	static bool FlattenLoopPair(Loop *OuterLoop, Loop *InnerLoop, DominatorTree *DT,
	LoopInfo *LI, ScalarEvolution *SE,
	AssumptionCache *AC, TargetTransformInfo *TTI,
	std::function<void(Loop *)> markLoopAsDeleted) {
	Function *F = OuterLoop->getHeader()->getParent();

	LLVM_DEBUG(dbgs() << "Loop flattening running on outer loop "
	<< OuterLoop->getHeader()->getName() << " and inner loop "
	<< InnerLoop->getHeader()->getName() << " in "
	<< F->getName() << "\n");

	SmallPtrSet<Instruction *, 8> IterationInstructions;

	PHINode *InnerInductionPHI, *OuterInductionPHI;
	Value *InnerLimit, *OuterLimit;
	BinaryOperator *InnerIncrement, *OuterIncrement;
	BranchInst *InnerBranch, *OuterBranch;

	if (!findLoopComponents(InnerLoop, IterationInstructions, InnerInductionPHI,
	InnerLimit, InnerIncrement, InnerBranch, SE))
	return false;
	if (!findLoopComponents(OuterLoop, IterationInstructions, OuterInductionPHI,
	OuterLimit, OuterIncrement, OuterBranch, SE))
	return false;

	// Both of the loop limit values must be invariant in the outer loop
	// (non-instructions are all inherently invariant).
	bool Changed;
	if (!OuterLoop->makeLoopInvariant(InnerLimit, Changed)) {
	LLVM_DEBUG(dbgs() << "inner loop limit not invariant\n");
	return false;
	}
	if (!OuterLoop->makeLoopInvariant(OuterLimit, Changed)) {
	LLVM_DEBUG(dbgs() << "outer loop limit not invariant\n");
	return false;
	}

	SmallPtrSet<PHINode *, 4> InnerPHIsToTransform;
	if (!checkPHIs(OuterLoop, InnerLoop, InnerPHIsToTransform, InnerInductionPHI,
	OuterInductionPHI, TTI))
	return false;

	// FIXME: it should be possible to handle different types correctly.
	if (InnerInductionPHI->getType() != OuterInductionPHI->getType())
	return false;

	if (!checkOuterLoopInsts(OuterLoop, InnerLoop, IterationInstructions,
	InnerLimit, OuterInductionPHI, TTI))
	return false;

	// Find the values in the loop that can be replaced with the linearized
	// induction variable, and check that there are no other uses of the inner
	// or outer induction variable. If there were, we could still do this
	// transformation, but we'd have to insert a div/mod to calculate the
	// original IVs, so it wouldn't be profitable.
	SmallPtrSet<Value *, 4> LinearIVUses;
	if (!checkIVUsers(InnerInductionPHI, OuterInductionPHI, InnerIncrement,
	OuterIncrement, InnerLimit, LinearIVUses))
	return false;

	// Check if the new iteration variable might overflow. In this case, we
	// need to version the loop, and select the original version at runtime if
	// the iteration space is too large.
	// TODO: We currently don't version the loop.
	// TODO: it might be worth using a wider iteration variable rather than
	// versioning the loop, if a wide enough type is legal.
	bool MustVersionLoop = true;
	OverflowResult OR =
	checkOverflow(OuterLoop, InnerLimit, OuterLimit, LinearIVUses, DT, AC);
	if (OR == OverflowResult::AlwaysOverflowsHigh ||
	OR == OverflowResult::AlwaysOverflowsLow) {
	LLVM_DEBUG(dbgs() << "Multiply would always overflow, so not profitable\n");
	return false;
	} else if (OR == OverflowResult::MayOverflow) {
	LLVM_DEBUG(dbgs() << "Multiply might overflow, not flattening\n");
	} else {
	LLVM_DEBUG(dbgs() << "Multiply cannot overflow, modifying loop in-place\n");
	MustVersionLoop = false;
	}

	// We cannot safely flatten the loop. Exit now.
	if (MustVersionLoop)
	return false;

	// Do the actual transformation.
	LLVM_DEBUG(dbgs() << "Checks all passed, doing the transformation\n");

	{
	using namespace ore;
	OptimizationRemark Remark(DEBUG_TYPE, "Flattened", InnerLoop->getStartLoc(),
	InnerLoop->getHeader());
	OptimizationRemarkEmitter ORE(F);
	Remark << "Flattened into outer loop";
	ORE.emit(Remark);
	}

	Value *NewTripCount =
	BinaryOperator::CreateMul(InnerLimit, OuterLimit, "flatten.tripcount",
	OuterLoop->getLoopPreheader()->getTerminator());
	LLVM_DEBUG(dbgs() << "Created new trip count in preheader: ";
	NewTripCount->dump());

	// Fix up PHI nodes that take values from the inner loop back-edge, which
	// we are about to remove.
	InnerInductionPHI->removeIncomingValue(InnerLoop->getLoopLatch());
	for (PHINode *PHI : InnerPHIsToTransform)
	PHI->removeIncomingValue(InnerLoop->getLoopLatch());

	// Modify the trip count of the outer loop to be the product of the two
	// trip counts.
	cast<User>(OuterBranch->getCondition())->setOperand(1, NewTripCount);

	// Replace the inner loop backedge with an unconditional branch to the exit.
	BasicBlock *InnerExitBlock = InnerLoop->getExitBlock();
	BasicBlock *InnerExitingBlock = InnerLoop->getExitingBlock();
	InnerExitingBlock->getTerminator()->eraseFromParent();
	BranchInst::Create(InnerExitBlock, InnerExitingBlock);
	DT->deleteEdge(InnerExitingBlock, InnerLoop->getHeader());

	// Replace all uses of the polynomial calculated from the two induction
	// variables with the one new one.
	for (Value *V : LinearIVUses)
	V->replaceAllUsesWith(OuterInductionPHI);

	// Tell LoopInfo, SCEV and the pass manager that the inner loop has been
	// deleted, and that any information they have about the outer loop is
	// invalidated.
	markLoopAsDeleted(InnerLoop);
	SE->forgetLoop(OuterLoop);
	SE->forgetLoop(InnerLoop);
	LI->erase(InnerLoop);

	return true;
	}

	PreservedAnalyses LoopFlattenPass::run(Loop &L, LoopAnalysisManager &AM,
	LoopStandardAnalysisResults &AR,
	LPMUpdater &Updater) {
	if (L.getSubLoops().size() != 1)
	return PreservedAnalyses::all();

	Loop *InnerLoop = *L.begin();
	std::string LoopName(InnerLoop->getName());
	if (!FlattenLoopPair(
	&L, InnerLoop, &AR.DT, &AR.LI, &AR.SE, &AR.AC, &AR.TTI,
	[&](Loop *L) { Updater.markLoopAsDeleted(*L, LoopName); }))
	return PreservedAnalyses::all();
	return getLoopPassPreservedAnalyses();
	}

	namespace {
	class LoopFlattenLegacyPass : public LoopPass {
	public:
	static char ID; // Pass ID, replacement for typeid
	LoopFlattenLegacyPass() : LoopPass(ID) {
	initializeLoopFlattenLegacyPassPass(*PassRegistry::getPassRegistry());
	}

	// Possibly flatten loop L into its child.
	bool runOnLoop(Loop *L, LPPassManager &) override;

	void getAnalysisUsage(AnalysisUsage &AU) const override {
	getLoopAnalysisUsage(AU);
	AU.addRequired<TargetTransformInfoWrapperPass>();
	AU.addPreserved<TargetTransformInfoWrapperPass>();
	AU.addRequired<AssumptionCacheTracker>();
	AU.addPreserved<AssumptionCacheTracker>();
	}
	};
	} // namespace

	char LoopFlattenLegacyPass::ID = 0;
	INITIALIZE_PASS_BEGIN(LoopFlattenLegacyPass, "loop-flatten", "Flattens loops",
	false, false)
	INITIALIZE_PASS_DEPENDENCY(LoopPass)
	INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)
	INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)
	INITIALIZE_PASS_END(LoopFlattenLegacyPass, "loop-flatten", "Flattens loops",
	false, false)

	Pass *llvm::createLoopFlattenPass() { return new LoopFlattenLegacyPass(); }

	bool LoopFlattenLegacyPass::runOnLoop(Loop *L, LPPassManager &LPM) {
	if (skipLoop(L))
	return false;

	if (L->getSubLoops().size() != 1)
	return false;

	ScalarEvolution *SE = &getAnalysis<ScalarEvolutionWrapperPass>().getSE();
	LoopInfo *LI = &getAnalysis<LoopInfoWrapperPass>().getLoopInfo();
	auto *DTWP = getAnalysisIfAvailable<DominatorTreeWrapperPass>();
	DominatorTree *DT = DTWP ? &DTWP->getDomTree() : nullptr;
	auto &TTIP = getAnalysis<TargetTransformInfoWrapperPass>();
	TargetTransformInfo *TTI = &TTIP.getTTI(*L->getHeader()->getParent());
	AssumptionCache *AC =
	&getAnalysis<AssumptionCacheTracker>().getAssumptionCache(
	*L->getHeader()->getParent());

	Loop *InnerLoop = *L->begin();
	return FlattenLoopPair(L, InnerLoop, DT, LI, SE, AC, TTI,
	[&](Loop *L) { LPM.markLoopAsDeleted(*L); });
	}
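
As a source-level illustration of the transformation implemented in LoopFlatten.cpp above (this is not part of the patch; the names `sumNested`, `sumFlattened` and `tripCountFits` are hypothetical), the sketch below compares a nested loop whose only induction-variable use is `i * M + j` with its flattened form. The runtime guard stands in, roughly, for what `checkOverflow` tries to establish statically: that the product of the two limits cannot wrap.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Nested form: the only use of i and j is the linear expression i * M + j,
// which is the shape of use that checkIVUsers accepts.
static uint64_t sumNested(const std::vector<uint32_t> &A, uint32_t N,
                          uint32_t M) {
  uint64_t Sum = 0;
  for (uint32_t i = 0; i < N; ++i)
    for (uint32_t j = 0; j < M; ++j)
      Sum += A[i * M + j];
  return Sum;
}

// Returns true if N * M is representable in 32 bits, i.e. the flattened
// trip count cannot wrap around.
static bool tripCountFits(uint32_t N, uint32_t M) {
  return M == 0 || N <= std::numeric_limits<uint32_t>::max() / M;
}

// Flattened form: a single induction variable k replaces i * M + j, and the
// trip count becomes the product of the two original limits.
static uint64_t sumFlattened(const std::vector<uint32_t> &A, uint32_t N,
                             uint32_t M) {
  assert(tripCountFits(N, M) && "flattened trip count would overflow");
  uint64_t Sum = 0;
  const uint32_t TripCount = N * M;
  for (uint32_t k = 0; k < TripCount; ++k)
    Sum += A[k];
  return Sum;
}
```

Both functions should return the same value whenever `tripCountFits(N, M)` holds. When it does not, the flattened loop would iterate a wrapped trip count, which is why the pass declines to flatten if the multiply may overflow.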

llvm/lib/Transforms/Scalar/Scalar.cpp

...
void llvm::initializeScalarOpts(PassRegistry &Registry) {
initializeLegacyLICMPassPass(Registry);
initializeLegacyLoopSinkPassPass(Registry);
initializeLoopFuseLegacyPass(Registry);
initializeLoopDataPrefetchLegacyPassPass(Registry);
initializeLoopDeletionLegacyPassPass(Registry);
initializeLoopAccessLegacyAnalysisPass(Registry);
initializeLoopInstSimplifyLegacyPassPass(Registry);
initializeLoopInterchangeLegacyPassPass(Registry);
initializeLoopFlattenLegacyPassPass(Registry);
initializeLoopPredicationLegacyPassPass(Registry);
initializeLoopRotateLegacyPassPass(Registry);
initializeLoopStrengthReducePass(Registry);
initializeLoopRerollLegacyPassPass(Registry);
initializeLoopUnrollPass(Registry);
initializeLoopUnrollAndJamPass(Registry);
initializeLoopUnswitchPass(Registry);
initializeWarnMissedTransformationsLegacyPass(Registry);
...
void LLVMAddLICMPass(LLVMPassManagerRef PM) {
unwrap(PM)->add(createLICMPass());
}

void LLVMAddLoopDeletionPass(LLVMPassManagerRef PM) {
unwrap(PM)->add(createLoopDeletionPass());
}

void LLVMAddLoopFlattenPass(LLVMPassManagerRef PM) {
unwrap(PM)->add(createLoopFlattenPass());
}

void LLVMAddLoopIdiomPass(LLVMPassManagerRef PM) {
unwrap(PM)->add(createLoopIdiomPass());
}

void LLVMAddLoopRotatePass(LLVMPassManagerRef PM) {
unwrap(PM)->add(createLoopRotatePass());
}

...

llvm/utils/gn/secondary/llvm/lib/Transforms/Scalar/BUILD.gn

...
sources = [
"InferAddressSpaces.cpp",
"InstSimplifyPass.cpp",
"JumpThreading.cpp",
"LICM.cpp",
"LoopAccessAnalysisPrinter.cpp",
"LoopDataPrefetch.cpp",
"LoopDeletion.cpp",
"LoopDistribute.cpp",
"LoopFlatten.cpp",
"LoopFuse.cpp",
"LoopIdiomRecognize.cpp",
"LoopInstSimplify.cpp",
"LoopInterchange.cpp",
"LoopLoadElimination.cpp",
"LoopPassManager.cpp",
"LoopPredication.cpp",
"LoopRerollPass.cpp",
...

Revision Contents

Diff 298615

llvm/include/llvm/InitializePasses.h

llvm/include/llvm/LinkAllPasses.h

llvm/include/llvm/Transforms/Scalar.h

llvm/include/llvm/Transforms/Scalar/LoopFlatten.h

llvm/lib/Passes/PassBuilder.cpp

llvm/lib/Passes/PassRegistry.def

llvm/lib/Transforms/IPO/PassManagerBuilder.cpp

llvm/lib/Transforms/Scalar/CMakeLists.txt

llvm/lib/Transforms/Scalar/IndVarSimplify.cpp

llvm/lib/Transforms/Scalar/LoopFlatten.cpp

llvm/lib/Transforms/Scalar/Scalar.cpp

llvm/utils/gn/secondary/llvm/lib/Transforms/Scalar/BUILD.gn
