This is an archive of the discontinued LLVM Phabricator instance.

[RFC] Loop Interchange Pass
AbandonedPublic

Authored by karthikthecool on Feb 5 2015, 5:25 AM.

Details

Reviewers: None

Summary

Hi All,
I have been working on a loop interchange pass for LLVM. The motivation is to improve the cache hit rate and thereby performance.
I would like to get your inputs on it.

Currently this pass only handles loops of depth 2; other loops are ignored. Going forward we would like to fix this.
This opt is disabled by default.

LoopInterchange Pass is divided into 3 parts-

LoopInterchangeLegality
LoopInterchangeProfitability
LoopInterchangeTransform

LoopInterchangeLegality:
This class checks all the memory instructions in the loop and uses DependenceAnalysis to conclude if we can interchange the loop or not.

LoopInterchangeLegality Functions:

canInterchangeLoops - Checks if the loops can be interchanged.
checkDependence - Called by canInterchangeLoops. Does the actual DependenceAnalysis to conclude if we can interchange the loops
currentLimitations - This function marks loops as illegal due to current limitations in the way the transform is written. I intend to fix these issues.

LoopInterchangeProfitability:
This class checks if it is profitable to interchange the loops. Currently I use only one heuristic: the order in which the array elements are accessed (i.e. row major/column major). We count the accesses in good order and in bad order, and if there are more bad-order accesses than good-order ones we interchange. We can improve the heuristics here later on.

LoopInterchangeProfitability Functions:

isProfitable - Concludes if it is profitable to interchange.
getInstrOrderCost - Calculates the array access order heuristics.
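
To make the good/bad order counting concrete, here is a minimal, self-contained sketch. It is not the code from the patch: the `Access` struct and the `orderCost`/`looksProfitable` helpers are hypothetical names, and each array access is reduced to the position of the subscript that uses the inner loop's induction variable. It assumes a row-major language such as C.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of one array access inside a two-deep loop nest.
// SubscriptOfInnerIV is the position (0 = leftmost) of the subscript that
// uses the inner loop's induction variable.
struct Access {
  std::size_t SubscriptOfInnerIV;
  std::size_t NumSubscripts;
};

// Returns (bad - good). With row-major arrays an access is in "good" order
// when the inner induction variable indexes the last subscript.
int orderCost(const std::vector<Access> &Accesses) {
  int Good = 0, Bad = 0;
  for (const Access &A : Accesses) {
    assert(A.NumSubscripts > 0 && A.SubscriptOfInnerIV < A.NumSubscripts);
    if (A.SubscriptOfInnerIV + 1 == A.NumSubscripts)
      ++Good;
    else
      ++Bad;
  }
  return Bad - Good;
}

// Interchange is treated as profitable when bad-order accesses outnumber
// good-order ones, mirroring the heuristic described above.
bool looksProfitable(const std::vector<Access> &Accesses) {
  return orderCost(Accesses) > 0;
}
```

For a nest with `i` as the inner loop, an access `A[i][j]` would be modelled as `Access{0, 2}` (bad order) and `A[j][i]` as `Access{1, 2}` (good order).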

LoopInterchangeTransform:
This transforms the loop and interchanges the inner loop with the outer loop. I'm writing a loop optimization for the first time and have a few doubts here.
The way we have interchanged the loop is -

1. Split the inner loop header and move all phi nodes into a separate block. This will be the new outer header.
2. Split the loop latch at the indvar.next instruction (this may need improvement, as indicated in a TODO in the code); this will be the new outer loop latch.
3. Adjust the loop links by adjusting the branch instructions so that we move the inner loop outside.
4. Fix the PHI nodes affected by step 3, as the predecessor basic block from which they were reached will have changed.

After this step the loop is interchanged. It gives the correct results for the few sample loops which I checked, but I'm not sure if this is the right way to perform the interchange transform. I suppose I will have to update the dom tree etc.? Could someone please guide me on whether this is the right way to go forward, or whether we need to transform in some other way?

I checked the transform with the following and some other examples; it seems to interchange the loops when legal and profitable. I compared the output with and without the transform, and it is the same in the cases which I have checked.

#include <iostream>
using namespace std;
int N, M;
int A[100][100], B[100][100];
int k;
int main() {
  cin >> N >> M;
  for (int i = 0; i < N; i++)
    for (int j = 0; j < M; j++)
      cin >> A[i][j];

  for (int j = 0; j < N; j++) {
    for (int i = 0; i < M; i++) {
      if (i % 2)
        A[i + 2][j + 2] = A[i][j] + k;
      else
        A[i][j + 1] = A[i][j] + k;
    }
  }

  for (int i = 0; i < N; i++)
    for (int j = 0; j < M; j++)
      cout << A[i][j] << "\n";

  return 0;
}
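
For readers who want to see what the interchange amounts to at source level, the middle loop nest of the test program can be written in both orders and compared. This is a hand-written sketch, not output of the pass: `Grid`, `makeInput`, `runOriginal` and `runInterchanged` are names invented here, and the bound of 98 is an assumption added so that `A[i + 2][j + 2]` stays inside the 100x100 array.

```cpp
#include <array>
#include <cassert>

// 100x100 grid, matching the global array in the test program.
using Grid = std::array<std::array<int, 100>, 100>;

// Fills the grid with distinct values so that a reordering bug would show up.
Grid makeInput() {
  Grid A{};
  for (int r = 0; r < 100; r++)
    for (int c = 0; c < 100; c++)
      A[r][c] = r * 100 + c;
  return A;
}

// Original order: j outer, i inner. Consecutive inner iterations walk down a
// column of A, which is the cache-unfriendly order for a row-major array.
void runOriginal(Grid &A, int N, int M, int K) {
  assert(N >= 0 && M >= 0 && N <= 98 && M <= 98);
  for (int j = 0; j < N; j++)
    for (int i = 0; i < M; i++) {
      if (i % 2)
        A[i + 2][j + 2] = A[i][j] + K;
      else
        A[i][j + 1] = A[i][j] + K;
    }
}

// Interchanged order: i outer, j inner. Same loop body, but consecutive inner
// iterations now walk along a row of A.
void runInterchanged(Grid &A, int N, int M, int K) {
  assert(N >= 0 && M >= 0 && N <= 98 && M <= 98);
  for (int i = 0; i < M; i++)
    for (int j = 0; j < N; j++) {
      if (i % 2)
        A[i + 2][j + 2] = A[i][j] + K;
      else
        A[i][j + 1] = A[i][j] + K;
    }
}
```

Both versions should leave the array in the same state: in (j, i) order the only dependences are from the store to `A[i + 2][j + 2]` (distance (2, 2)) and from the store to `A[i][j + 1]` (distance (1, 0)), and both remain lexicographically positive after the two loops are swapped.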

Running loop-interchange-

opt -basicaa -loop-interchange test.ll -o test.bc

Awaiting comments and inputs.

Thanks and Regards
Karthik Bhat

Event Timeline

karthikthecool updated this revision to Diff 19400. Feb 5 2015, 5:25 AM

karthikthecool retitled this revision to [RFC] Loop Interchange Pass.

karthikthecool updated this object.

karthikthecool edited the test plan for this revision.

jmolloy added a subscriber: jmolloy. Feb 5 2015, 8:17 AM

Hi Karthik,

Thanks for working on this! Loop interchange is something I think LLVM really needs. I have some high level comments on your approach.

You mention you only handle 2-deep nested loops. But actually, you can handle any level of nesting by doing pairwise swaps working from innermost nest out.
Your function to determine whether to swap or not is too simplistic. It will only catch a very small number of cases (GEPs from the IV). The obvious way to do this is to grab the SCEV for each load and store, and if it is an AddRecExpr, check the step value.
Your described algorithm for performing the swap seems fine, but the implementation is awfully complicated. Once you have split the latch, I'd have thought you could just split the PHIs from the top of each header block. This would give you a bunch of blocks with branches between them - just rearrange the branches (and perform trivial PHI updates).

Cheers,

James

lib/Transforms/Scalar/LoopInterchange.cpp
64	Loop *Inner = L.getSubLoops().back(); ?
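
James's first point, that any loop order can be reached through pairwise swaps working from the innermost loop outwards, can be illustrated outside LLVM. The sketch below assumes a loop nest is represented only by a vector of loop identifiers, outermost first, and that `Target` is a permutation of it. `planAdjacentSwaps` and `applySwaps` are hypothetical helpers; they only plan the order of adjacent interchanges (a bubble sort) and say nothing about whether each swap is legal.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Returns the positions P at which loops P and P+1 (outermost = 0) should be
// interchanged, in order, to turn Current into Target.
std::vector<std::size_t> planAdjacentSwaps(std::vector<int> Current,
                                           const std::vector<int> &Target) {
  assert(Current.size() == Target.size());
  // Rank of a loop id is its position in the desired order.
  auto Rank = [&](int Id) {
    auto It = std::find(Target.begin(), Target.end(), Id);
    assert(It != Target.end());
    return It - Target.begin();
  };
  std::vector<std::size_t> Swaps;
  const std::size_t N = Current.size();
  // Each pass starts at the innermost pair and moves outwards, bubbling the
  // loop that belongs at depth Pass into place.
  for (std::size_t Pass = 0; Pass + 1 < N; ++Pass)
    for (std::size_t P = N - 1; P > Pass; --P)
      if (Rank(Current[P - 1]) > Rank(Current[P])) {
        std::swap(Current[P - 1], Current[P]);
        Swaps.push_back(P - 1);
      }
  return Swaps;
}

// Replays a swap plan on a nest; useful for checking the plan.
std::vector<int> applySwaps(std::vector<int> Nest,
                            const std::vector<std::size_t> &Swaps) {
  for (std::size_t P : Swaps) {
    assert(P + 1 < Nest.size());
    std::swap(Nest[P], Nest[P + 1]);
  }
  return Nest;
}
```

For a nest `{0, 1, 2}` (say i, j, k) and a target `{2, 0, 1}`, the plan first interchanges the inner pair and then the outer pair. A real pass would run the legality and profitability checks before each individual swap.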

ramshankar123 added a subscriber: ramshankar123. Feb 5 2015, 10:06 AM

Great to see this being worked on! It will help the implicit pocl work-group vectorization after the 2-level loop restriction is lifted. Related to this, be careful with the loop id metadata and especially the parallel loop metadata. The parallel loop metadata can be exploited to skip dependency analysis in some cases (at least when both of the loops are parallel? Needs to be thought through). But if the metadata is moved to the wrong loop during the interchange, it might result in a miscompilation.

include/llvm/Transforms/Scalar.h
136	Another common motivation for loop interchange is the improved utilization of SIMD instructions which your pass incidentally does too?
lib/Transforms/Scalar/LoopInterchange.cpp
50	Seems like a useful utility function, maybe could be moved to Loop?
185	If this restriction can be removed, this pass would be really useful for OpenCL C implicit work-group vectorization in pocl.
191	Typo 'Worlist'
239	Better use an explicit NULL comparison? Not sure what the LLVM convention is here.
267	Typo dependece
271	Typo dependece
273	This part needs comments (what is being checked?).
298	Comment needed (purpose of the function?).
397	Doesn't this require the loops to be perfectly nested? Is it being (implicitly) checked now?
673	Shouldn't this update the loop metadata?

Hi James, Pekka,
Thanks for the comments and for spending your valuable time on this. Please find my comments below.

[James] You mention you only handle 2-deep nested loops. But actually, you can handle any level of nesting by doing pairwise swaps working from innermost nest out.

Yes, I agree we can extend this to handle any level of interchange. But since I'm writing a loop opt for the first time, I was thinking of implementing a level-2 interchange first and then building upon it. Will that be OK?

[James] Your function to determine whether to swap or not is too simplistic. It will only catch a very small number of cases (GEPs from the IV). The obvious way to do this is to grab the SCEV for each load and store, and if it is an AddRecExpr, check the step value.

I will look into this in more detail. It would be great if you could point out an example where the current check would fail but which could have been caught using SCEV. One thing I thought about was a scalar reduction variable in a loop, but can't we catch it using the DependenceAnalysis pass as well? I'm a bit new to loop optimization, so it would be great if you could clarify this a bit more.

[James] Once you have split the latch, I'd have thought you could just split the PHIs from the top of each header block. This would give you a bunch of blocks with branches between them - just rearrange the branches (and perform trivial PHI updates)

I have tried to do the same in the current implementation, in adjustLoopLinks, after splitting the inner loop latch and the inner loop header. I will try to simplify the code a bit more in the upcoming patch update.

Hi Pekka,
Thanks for the review. Please find my comments inline.

Is it OK if I move the updated patch to the llvm-commits mailing list for further review?
Thank you once again for your valuable comments; I will get back with an updated patch shortly.
Regards
Karthik Bhat

include/llvm/Transforms/Scalar.h
136	Yes, I agree. One thing I can think of is adjusting LoopInterchangeProfitability to keep loops of stride 1 as the innermost loop when possible, which could help the loop vectorizer etc.
lib/Transforms/Scalar/LoopInterchange.cpp
50	Sure, we can move this utility to Loop. Will modify in my next patch update.
185	Yes, I agree. The final goal is to make this pass generic for any level. But since I'm a bit new to loop optimizations, I was thinking of doing this in steps: first handling loops of depth 2 and then building on that to support any number of levels.
191	Will update this and share the patch shortly.
273	If we have detected an anti dependency between 2 memory instructions here, we try to get the dependency distance or direction of the dependence for each loop nesting level. We can interchange any 2 levels of the loop nest only if we do not have a positive distance or direction in those levels. This will ensure that rearranging is legal. I referred to https://engineering.purdue.edu/~milind/ece573/2011spring/lecture-14.pdf for the legality of the loop interchange transform.
298	These are current limitations in our transform because of which we are not interchanging a valid nested loop. They will be removed once we completely fix the transform part.
397	Yes, we need to check for perfectly nested loops in legality. I have added the code for it and will share the updated code shortly.
673	Currently we only update the successor node of the branch instruction of the loop latch, so the metadata portion should be intact. I'm not sure if we have to update the metadata portion after moving the outer loop in as the inner loop. It would be great if you could help me out with an example where we might have to update the metadata due to reordering.
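
As background for the legality discussion at line 273, the classic textbook rule can be stated on direction vectors alone. The sketch below is a standalone illustration and not the check implemented in the patch: the string encoding of a direction vector ('<', '=' or '>' per loop level, outermost first) and the names `isLexNonNegative` and `isInterchangeLegal` are assumptions made here, and LLVM's DependenceAnalysis uses its own representation. Under this encoding, swapping two loop levels is legal when no dependence vector becomes lexicographically negative after the swap.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One direction vector per dependence, outermost loop first. Each character
// is '<', '=' or '>' (an encoding assumed for this sketch only).
using DirectionVector = std::string;

// A vector is lexicographically non-negative when its first entry that is
// not '=' is '<'. A vector of only '=' is a loop-independent dependence.
bool isLexNonNegative(const DirectionVector &DV) {
  for (char C : DV) {
    if (C == '<')
      return true;
    if (C == '>')
      return false;
  }
  return true;
}

// Interchanging loop levels A and B permutes the same two entries in every
// direction vector. The interchange is legal if every permuted vector is
// still lexicographically non-negative.
bool isInterchangeLegal(const std::vector<DirectionVector> &Deps,
                        std::size_t A, std::size_t B) {
  for (DirectionVector DV : Deps) { // copy, so the swap stays local
    assert(A < DV.size() && B < DV.size());
    std::swap(DV[A], DV[B]);
    if (!isLexNonNegative(DV))
      return false;
  }
  return true;
}
```

The case that blocks interchange of a two-deep nest is a `"<>"` vector, which becomes `"><"` after the swap, meaning the sink of the dependence would run before its source.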

pekka.jaaskelainen added inline comments. Feb 6 2015, 5:32 AM

lib/Transforms/Scalar/LoopInterchange.cpp
273	OK. Worth adding this as a comment to the code as this is not obvious to a reader new to the loop interchange optimization.
673	OK. I didn't read the transformation part in detail so I cannot immediately tell if it breaks or not. If the loop id still points to the correct original loop, it should be OK.

rahuljain_1989 added a subscriber: rahuljain_1989. Feb 6 2015, 9:56 AM

suyog added a subscriber: suyog. Feb 6 2015, 10:31 AM

hfinkel added a subscriber: Unknown Object (MLST). Feb 6 2015, 2:03 PM

[Karthik] After this step the loop is interchanged. It gives the correct results for few of the sample loops which i checked but I'm not sure if this is the right way to perform interchange transform. I suppose i will have to update the dom tree etc?

The method sounds reasonable (as James pointed out, I think there might be a simpler implementation). You only have to update the domtree if you claim to preserve it (same with LoopInfo, for example).

[Karthik] Is it ok if i move the updated patch to llvm-commits mailing list for further review?

Yes, please do. You should have done this from the very beginning (and subscribed llvm-commits to this review; unfortunately, none of the (fairly extensive) comments here were mirrored to the list).

Also, there are a couple of test cases in the TSVC benchmark (which is in our test suite), where loop interchange would enable vectorization. It would be nice to know that we get those cases with this pass.

Hi Karthik,

[Karthik] I will look into this in more detail. It would be great if you could point out an example where the current check would fail but which could have been caught using SCEV. One thing I thought about was a scalar reduction variable in a loop, but can't we catch it using the DependenceAnalysis pass as well? I'm a bit new to loop optimization, so it would be great if you could clarify this a bit more.

Sure. Consider this:

int a = ...;
for (i : ...) {
  for (j : ...) {
    x[j][i+a];
  }
}

This will produce an add, then a GEP. Your code will fail to match it as it won't see through the add. Also, the add is of an unknown quantity, but SCEV knows that this is loop invariant, so SCEV can help you here.
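
The reason a loop-invariant addend does not matter for profitability can be shown with plain arithmetic. The sketch below does not use SCEV. It models each subscript as an affine function of the two induction variables (the hypothetical `AffineSubscript`) and computes how far the accessed element moves per iteration of each loop, which is roughly the information an add recurrence's step would provide. `elementStrides`, `preferInterchange` and the row length of 100 are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// A subscript of the form CoeffI * i + CoeffJ * j + (loop-invariant term).
// The invariant term, such as `a` in x[j][i + a], is deliberately not stored:
// it shifts the address but does not change the per-iteration step.
struct AffineSubscript {
  std::int64_t CoeffI;
  std::int64_t CoeffJ;
};

// Distance, in elements, that the access moves per iteration of each loop.
struct Strides {
  std::int64_t PerI;
  std::int64_t PerJ;
};

// Strides of x[Row][Col] for a row-major array with RowLength columns.
Strides elementStrides(const AffineSubscript &Row, const AffineSubscript &Col,
                       std::int64_t RowLength) {
  return {Row.CoeffI * RowLength + Col.CoeffI,
          Row.CoeffJ * RowLength + Col.CoeffJ};
}

// With i as the outer loop and j as the inner loop, interchange looks
// attractive when the inner loop takes larger steps than the outer one.
bool preferInterchange(const Strides &S) {
  return std::abs(S.PerJ) > std::abs(S.PerI);
}
```

For `x[j][i + a]` the row subscript is `AffineSubscript{0, 1}` and the column subscript is `AffineSubscript{1, 0}`, so with 100 columns the inner `j` loop steps by 100 elements while the outer `i` loop steps by 1, whatever the value of `a`.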

Hi James, Hal,
Thanks for your inputs. Hal, I will move this to llvm-commits shortly; I was working on a few issues which I found during testing.
James, I tried the example-

int a = 2;
for (i = 0; i < N; ++i) {
  for (j = 0; j < M; ++j) {
    A[j][i+a] = A[j][i]+1;
  }
}

The dependency analysis returns an anti dependency of [-2 0] between the load and store and we interchange successfully.
But if the add is of an unknown quantity, then I assume it is not safe to interchange, as we do not know for sure what dependency exists.

Dependency Analysis seems to be internally using ScalarEvolution.
If there is a loop-independent variable, it is captured using the |< symbol.
e.g.
In the matrix multiplication code-

for(int i=0;i<N;i++)
  for(int j=0;j<M;j++)
    for(int k=0;k<K;k++)
      A[i][j]=A[i][j]+B[i][j]*C[j][k];

we get a dependency vector of -

consistent anti [0 0|<]

i.e. the load is independent of the innermost iterator (k). So in this case we can move it to the outermost position, as the other 2 dependencies are 0 (or =), and the code can then be vectorized.

The reason I was using this module was that it gives me the required direction/distance vector/matrix, which I can use to decide the legality/profitability of interchange (based on "Optimizing Compilers for Modern Architectures: A Dependence-Based Approach" by Ken Kennedy).
I think we can use ScalarEvolution to construct the direction/distance matrix as well, based on your suggestion, but I thought Dependency Analysis already does that for us. I will recheck.

Thanks
Karthik Bhat

Patch was moved to D7499 and committed as r231458.

Revision Contents

Path                                         Size
include/llvm/InitializePasses.h              1 line
include/llvm/LinkAllPasses.h                 1 line
include/llvm/Transforms/Scalar.h             7 lines
lib/Transforms/Scalar/CMakeLists.txt         1 line
lib/Transforms/Scalar/LoopInterchange.cpp    700 lines
lib/Transforms/Scalar/Scalar.cpp             1 line

Diff 19400

include/llvm/InitializePasses.h

  void initializeLiveRegMatrixPass(PassRegistry&);
  void initializeLiveStacksPass(PassRegistry&);
  void initializeLiveVariablesPass(PassRegistry&);
  void initializeLoaderPassPass(PassRegistry&);
  void initializeLocalStackSlotPassPass(PassRegistry&);
  void initializeLoopDeletionPass(PassRegistry&);
  void initializeLoopExtractorPass(PassRegistry&);
  void initializeLoopInfoWrapperPassPass(PassRegistry&);
+ void initializeLoopInterchangePass(PassRegistry&);
  void initializeLoopInstSimplifyPass(PassRegistry&);
  void initializeLoopRotatePass(PassRegistry&);
  void initializeLoopSimplifyPass(PassRegistry&);
  void initializeLoopStrengthReducePass(PassRegistry&);
  void initializeGlobalMergePass(PassRegistry&);
  void initializeLoopRerollPass(PassRegistry&);
  void initializeLoopUnrollPass(PassRegistry&);
  void initializeLoopUnswitchPass(PassRegistry&);

include/llvm/LinkAllPasses.h

  ForcePassLinking() {
  (void) llvm::createInstructionCombiningPass();
  (void) llvm::createInternalizePass();
  (void) llvm::createJumpInstrTableInfoPass();
  (void) llvm::createJumpInstrTablesPass();
  (void) llvm::createLCSSAPass();
  (void) llvm::createLICMPass();
  (void) llvm::createLazyValueInfoPass();
  (void) llvm::createLoopExtractorPass();
+ (void) llvm::createLoopInterchangePass();
  (void) llvm::createLoopSimplifyPass();
  (void) llvm::createLoopStrengthReducePass();
  (void) llvm::createLoopRerollPass();
  (void) llvm::createLoopUnrollPass();
  (void) llvm::createLoopUnswitchPass();
  (void) llvm::createLoopIdiomPass();
  (void) llvm::createLoopRotatePass();
  (void) llvm::createLowerExpectIntrinsicPass();

include/llvm/Transforms/Scalar.h

  FunctionPass *createInstructionCombiningPass();

  //===----------------------------------------------------------------------===//
  //
  // LICM - This pass is a loop invariant code motion and memory promotion pass.
  //
  Pass *createLICMPass();

+ //===----------------------------------------------------------------------===//
+ //
+ // LoopInterchange - This pass is interchanges loops to give better cache hits.
  [Inline comment] pekka.jaaskelainen: Another common motivation for loop interchange is the improved utilization of SIMD instructions which your pass incidentally does too?
  [Inline comment] karthikthecool: Yes, I agree. One thing I can think of is adjusting LoopInterchangeProfitability to keep loops of stride 1 as the innermost loop when possible, which could help the loop vectorizer etc.
+ //
+ Pass *createLoopInterchangePass();

  //===----------------------------------------------------------------------===//
  //
  // LoopStrengthReduce - This pass is strength reduces GEP instructions that use
  // a loop's canonical induction variable as one of their indices.
  //
  Pass *createLoopStrengthReducePass();

  Pass *createGlobalMergePass(const TargetMachine *TM = nullptr);

lib/Transforms/Scalar/CMakeLists.txt

  add_llvm_library(LLVMScalarOpts
  InductiveRangeCheckElimination.cpp
  IndVarSimplify.cpp
  JumpThreading.cpp
  LICM.cpp
  LoadCombine.cpp
  LoopDeletion.cpp
  LoopIdiomRecognize.cpp
  LoopInstSimplify.cpp
+ LoopInterchange.cpp
  LoopRerollPass.cpp
  LoopRotation.cpp
  LoopStrengthReduce.cpp
  LoopUnrollPass.cpp
  LoopUnswitch.cpp
  LowerAtomic.cpp
  LowerExpectIntrinsic.cpp
  MemCpyOptimizer.cpp

lib/Transforms/Scalar/LoopInterchange.cpp

				//===- LoopInterchange.cpp - Loop interchange pass
				//--------------------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// This Pass handles loop interchange transform. This interchanges the inner
				// and outer loop if interchanging can result in better cache hits.
				//
				//===----------------------------------------------------------------------===//

				#include "llvm/ADT/SmallVector.h"
				#include "llvm/Analysis/AliasAnalysis.h"
				#include "llvm/Analysis/AliasSetTracker.h"
				#include "llvm/Analysis/AssumptionCache.h"
				#include "llvm/Analysis/BlockFrequencyInfo.h"
				#include "llvm/Analysis/CodeMetrics.h"
				#include "llvm/Analysis/DependenceAnalysis.h"
				#include "llvm/Analysis/LoopInfo.h"
				#include "llvm/Analysis/LoopIterator.h"
				#include "llvm/Analysis/LoopPass.h"
				#include "llvm/Analysis/ScalarEvolution.h"
				#include "llvm/Analysis/ScalarEvolutionExpander.h"
				#include "llvm/Analysis/ScalarEvolutionExpressions.h"
				#include "llvm/Analysis/TargetTransformInfo.h"
				#include "llvm/Analysis/ValueTracking.h"
				#include "llvm/Transforms/Scalar.h"
				#include "llvm/IR/Function.h"
				#include "llvm/IR/IRBuilder.h"
				#include "llvm/IR/IntrinsicInst.h"
				#include "llvm/IR/InstIterator.h"
				#include "llvm/IR/Dominators.h"
				#include "llvm/Pass.h"
				#include "llvm/Support/Debug.h"
				#include "llvm/Transforms/Utils/SSAUpdater.h"
				#include "llvm/Support/raw_ostream.h"

				using namespace llvm;

				#define DEBUG_TYPE "loop-interchange"

				namespace {

				typedef std::pair<Loop *, Loop *> LoopPair;

				unsigned getInnerLoopCount(Loop &L, unsigned level) {
				[Inline comment] pekka.jaaskelainen: Seems like a useful utility function, maybe could be moved to Loop?
				[Inline comment] karthikthecool: Sure, we can move this utility to Loop. Will modify in my next patch update.
				int lev = 0;
				if (L.empty())
				return level + 1;
				for (Loop *InnerL : L)
				lev = getInnerLoopCount(*InnerL, level + 1);
				return lev;
				}

				static void populateWorklist(Loop &L, SmallVector<LoopPair, 8> &V) {
				// TODO: Currently only handled for loops depth of 2.
				// Also handle loop depth of 2 more appropriatly.
				if (getInnerLoopCount(L, 0) == 2) {
				Loop *Inner;
				for (Loop *InnerL : L)
				[Inline comment] jmolloy: Loop *Inner = L.getSubLoops().back(); ?
				Inner = InnerL;
				V.push_back(std::make_pair(&L, Inner));
				}
				}

				/// LoopInterchangeLegality checks if it is legal to interchange the loop.
				class LoopInterchangeLegality {
				public:
				LoopInterchangeLegality(Loop *Outer, Loop *Inner, ScalarEvolution *SE,
				DependenceAnalysis *DA)
				: OuterLoop(Outer), InnerLoop(Inner), SE(SE), DA(DA) {}

				/// Check if the loops can be interchanged.
				bool canInterchangeLoops();

				bool currentLimitations();

				private:
				bool checkDependence(Loop *Outer, DependenceAnalysis *DA);
				bool tightlyNested(Loop *Outer, Loop *Inner);

				Loop *OuterLoop;
				Loop *InnerLoop;

				/// Scev analysis.
				ScalarEvolution *SE;
				/// Dependence analysis.
				DependenceAnalysis *DA;
				};

				/// LoopInterchangeProfitability checks if it is profitable to interchange the
				/// loop.
				class LoopInterchangeProfitability {
				public:
				LoopInterchangeProfitability(Loop *Outer, Loop *Inner, ScalarEvolution *SE)
				: OuterLoop(Outer), InnerLoop(Inner), SE(SE) {}

				/// Check if the loop interchange is profitable
				bool isProfitable();

				private:
				int getInstrOrderCost(PHINode *IV);

				Loop *OuterLoop;
				Loop *InnerLoop;

				/// Scev analysis.
				ScalarEvolution *SE;
				};

				/// LoopInterchangeTransform interchanges the loop
				class LoopInterchangeTransform {
				public:
				LoopInterchangeTransform(Loop *Outer, Loop *Inner, ScalarEvolution *SE,
				LoopInfo *LI, DominatorTree *DT)
				: outerLoop(Outer), innerLoop(Inner), SE(SE), LI(LI), DT(DT) {
				initialize();
				}

				/// Interchange OuterLoop and InnerLoop.
				bool transform();
				void initialize();

				private:
				void splitInnerLoopLatch(DominatorTree *, LoopInfo *, Instruction *);
				void splitInnerLoopHeader(DominatorTree *, LoopInfo *);
				bool adjustLoopLinks(DominatorTree *DT);
				bool adjustOuterLoopPreheader();
				bool adjustInnerLoopPreheader();

				Loop *outerLoop;
				Loop *innerLoop;
				BasicBlock *innerLoopHeader;
				BasicBlock *outerLoopHeader;
				BasicBlock *innerLoopLatch;
				BasicBlock *outerLoopLatch;
				BasicBlock *outerLoopPreHeader;
				BasicBlock *innerLoopPreHeader;
				PHINode *innerIndexVar;
				PHINode *outerIndexVar;
				BasicBlock *outerLoopSuccessor;
				// Instruction* innerIndexVarInc;
				BasicBlock *innerLoopLatchPred;
				BasicBlock *innerLoopHeaderSucc;
				std::vector<std::pair<Loop *, Loop *>> interchangedLoops;
				/// Scev analysis.
				ScalarEvolution *SE;
				LoopInfo *LI;
				DominatorTree *DT;
				};

				// Main LoopInterchange Pass
				struct LoopInterchange : public FunctionPass {
				static char ID;
				ScalarEvolution *SE;
				LoopInfo *LI;
				DependenceAnalysis *DA;
				DominatorTree *DT;
				LoopInterchange()
				: FunctionPass(ID), SE(nullptr), LI(nullptr), DA(nullptr), DT(nullptr) {
				initializeLoopInterchangePass(*PassRegistry::getPassRegistry());
				}

				void getAnalysisUsage(AnalysisUsage &AU) const override {
				AU.addRequiredID(LCSSAID);
				AU.addRequired<ScalarEvolution>();
				AU.addRequired<AliasAnalysis>();
				AU.addRequired<DominatorTreeWrapperPass>();
				AU.addRequired<LoopInfoWrapperPass>();
				AU.addRequired<DependenceAnalysis>();
				}

				bool runOnFunction(Function &F) override {
				SE = &getAnalysis<ScalarEvolution>();
				LI = &getAnalysis<LoopInfoWrapperPass>().getLoopInfo();
				DA = &getAnalysis<DependenceAnalysis>();
				auto *DTWP = getAnalysisIfAvailable<DominatorTreeWrapperPass>();
				DT = DTWP ? &DTWP->getDomTree() : nullptr;
				// Build up a worklist of loop pairs to analyze.
				// [TODO] Currently only supports loop with level 2.
				// Handle for loops greater than level 2.
				[Inline comment] pekka.jaaskelainen: If this restriction can be removed, this pass would be really useful for OpenCL C implicit work-group vectorization in pocl.
				[Inline comment] karthikthecool: Yes, I agree. The final goal is to make this pass generic for any level. But since I'm a bit new to loop optimizations, I was thinking of doing this in steps: first handling loops of depth 2 and then building on that to support any number of levels.
				SmallVector<LoopPair, 8> Worklist;

				for (Loop *L : *LI)
				populateWorklist(*L, Worklist);

				DEBUG(dbgs() << "Worlist size = " << Worklist.size() << "\n");
				[Inline comment] pekka.jaaskelainen: Typo 'Worlist'
				[Inline comment] karthikthecool: Will update this and share the patch shortly.

				bool Changed = false;
				while (!Worklist.empty())
				Changed |= processLoop(Worklist.pop_back_val());

				return Changed;
				}

				bool processLoop(LoopPair P) {

				// Check if it is legal to interchange loop
				LoopInterchangeLegality LIL(P.first, P.second, SE, DA);
				if (!LIL.canInterchangeLoops()) {
				DEBUG(dbgs() << "Not interchanging Loops. Cannot prove legality\n");
				return false;
				}

				DEBUG(dbgs() << "Loops are legal to interchange\n");

				LoopInterchangeProfitability LIP(P.first, P.second, SE);
				if (!LIP.isProfitable()) {
				DEBUG(dbgs() << "Interchanging Loops not profitable\n");
				return false;
				}

				LoopInterchangeTransform LIT(P.first, P.second, SE, LI, DT);
				LIT.transform();
				DEBUG(dbgs() << "Loops interchanged\n");
				return true;
				}
				};

				} // end of namespace

				bool LoopInterchangeLegality::checkDependence(Loop *Outer,
				DependenceAnalysis *DA) {

				typedef SmallVector<Value *, 16> ValueVector;
				// Holds Load and Store instructions.
				ValueVector MemInstr;
				// For each block.
				for (Loop::block_iterator bb = Outer->block_begin(), be = Outer->block_end();
				bb != be; ++bb) {
				// Scan the BB and collect legal loads and stores.
				for (BasicBlock::iterator it = (*bb)->begin(), e = (*bb)->end(); it != e;
				++it) {
				Instruction *I = dyn_cast<Instruction>(it);
				if (!I)
				[Inline comment] pekka.jaaskelainen: Better use an explicit NULL comparison? Not sure what the LLVM convention is here.
				return false;
				LoadInst *Ld = dyn_cast<LoadInst>(it);
				StoreInst *St = dyn_cast<StoreInst>(it);
				if (!St && !Ld)
				continue;
				if (Ld && !Ld->isSimple())
				return false;
				if (St && !St->isSimple())
				return false;
				MemInstr.push_back(I);
				}
				}

				DEBUG(dbgs() << "Found " << MemInstr.size()
				<< " Loads and stores to analyze\n");

				ValueVector::iterator I, IE, J, JE;
				for (I = MemInstr.begin(), IE = MemInstr.end(); I != IE; ++I) {
				for (J = I, JE = MemInstr.end(); J != JE; ++J) {
				Instruction *Src = dyn_cast<Instruction>(*I);
				Instruction *Des = dyn_cast<Instruction>(*J);
				if (Src == Des)
				continue;
				if (auto D = DA->depends(Src, Des, true)) {
				// TODO: Fix his handle only anti/output dep for now.
				if (D->isFlow()) {
				// TODO: Flow dependency can be interchanged??
				DEBUG(dbgs() << "Flow dependece not handled");
				pekka.jaaskelainenUnsubmitted Not Done Reply Inline Actions Typo dependece pekka.jaaskelainen: Typo dependece
				return false;
				}
				if (D->isAnti()) {
				DEBUG(dbgs() << "Found Anti dependece \n");
				pekka.jaaskelainenUnsubmitted Not Done Reply Inline Actions Typo dependece pekka.jaaskelainen: Typo dependece
				unsigned Levels = D->getLevels();
				for (unsigned II = 1; II <= Levels; ++II) {
				pekka.jaaskelainenUnsubmitted Not Done Reply Inline Actions This part needs comments (what is being checked?). pekka.jaaskelainen: This part needs comments (what is being checked?).
				karthikthecoolAuthorUnsubmitted Not Done Reply Inline Actions If we have detected an anti dependency between 2 memory instructions here we try to get the dependency distance or direction of the dependence for each loop nesting level. We can interchange any 2 levels of the loop nest only if we do not have positive distance or direction in those levels. This will ensure that rearranging is legal. I reffered https://engineering.purdue.edu/~milind/ece573/2011spring/lecture-14.pdf for legality of loop interchange transform. karthikthecool: If we have detected an anti dependency between 2 memory instructions here we try to get the…
				pekka.jaaskelainenUnsubmitted Not Done Reply Inline Actions OK. Worth adding this as a comment to the code as this is not obvious to a reader new to the loop interchange optimization. pekka.jaaskelainen: OK. Worth adding this as a comment to the code as this is not obvious to a reader new to the…
				const SCEV *Distance = D->getDistance(II);
				const SCEVConstant *SCEVConst =
				dyn_cast_or_null<SCEVConstant>(Distance);
				if (SCEVConst) {
				const ConstantInt *CI = SCEVConst->getValue();
				if (!CI || (!CI->isNegative() && !CI->isZeroValue()))
				return false;
				} else {
				unsigned Direction = D->getDirection(II);
				if (Direction == Dependence::DVEntry::LT ||
				Direction == Dependence::DVEntry::LE ||
				Direction == Dependence::DVEntry::EQ)
				continue;
				return false;
				}
				}
				}
				}
				}
				}

				return true;
				}
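
The legality rule discussed in the inline comments of checkDependence can be illustrated without LLVM. The sketch below is not the patch's code: it applies the textbook direction-vector test, under which an interchange is legal when every dependence vector stays lexicographically non-negative after the two loop levels are swapped. The names `Dir`, `isLexicographicallyNonNegative`, and `isInterchangeLegal` are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Direction of a dependence at one loop level, outermost level first.
enum class Dir { LT, EQ, GT }; // '<', '=', '>'

// A dependence vector is valid (the source runs before the sink) when its
// first non-'=' entry is '<', or when every entry is '='.
bool isLexicographicallyNonNegative(const std::vector<Dir> &DV) {
  for (Dir D : DV) {
    if (D == Dir::LT)
      return true;
    if (D == Dir::GT)
      return false;
  }
  return true;
}

// Hypothetical helper: swapping loop levels A and B is legal if every
// dependence vector is still lexicographically non-negative afterwards.
// Deps is taken by value so the caller's vectors are left untouched.
bool isInterchangeLegal(std::vector<std::vector<Dir>> Deps, std::size_t A,
                        std::size_t B) {
  for (std::vector<Dir> &DV : Deps) {
    assert(A < DV.size() && B < DV.size() && "loop level out of range");
    std::swap(DV[A], DV[B]);
    if (!isLexicographicallyNonNegative(DV))
      return false;
  }
  return true;
}
```

For a two-level nest, a dependence with direction (<, >) becomes (>, <) after the swap and is rejected, while (<, <) and (=, <) remain valid.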

				bool LoopInterchangeLegality::currentLimitations() {
				[Inline comment] pekka.jaaskelainen: Comment needed (purpose of the function?).
				[Inline comment] karthikthecool (author): These are current limitations in our transform because of which we do not interchange an otherwise valid nested loop. This function will be removed once we completely fix the transform part.

				BasicBlock *innerLoopHeader = InnerLoop->getHeader();
				BasicBlock *outerLoopLatch = OuterLoop->getLoopLatch();
				BasicBlock *innerLoopLatch = InnerLoop->getLoopLatch();
				BasicBlock *outerLoopHeader = OuterLoop->getHeader();
				BasicBlock *ExitBlock;
				unsigned userCount = 0;
				int PhiCount = 0;
				PHINode *PHI;

				for (auto I = innerLoopHeader->begin(), E = innerLoopHeader->end(); I != E;
				++I) {
				if (isa<PHINode>(I)) {
				PHI = dyn_cast<PHINode>(I);
				PhiCount++;
				}
				if (PhiCount > 1)
				return true;
				}
				// TODO: Current limitation: since we split the inner loop latch at the point
				// where the induction variable is incremented (induction.next), we cannot
				// have more than 1 user of induction.next, since it would result in broken
				// code after the split.
				// e.g.
				// for(i=0;i<N;i++) {
				// for(j = 0;j<M;j++) {
				// A[j+1][i+2] = A[j][i]+k;
				// }
				// }
				Instruction *innerIndexVarInc = nullptr;
				innerIndexVarInc = dyn_cast<Instruction>(PHI->getIncomingValue(1));
				for (auto UI = innerIndexVarInc->user_begin(),
				UE = innerIndexVarInc->user_end();
				UI != UE; ++UI) {
				Instruction *II = dyn_cast<Instruction>(*UI);
				BasicBlock *BB = II->getParent();
				if (BB == innerLoopLatch)
				userCount++;
				if (userCount > 1)
				return true;
				}

				// TODO: Current limitation: LCSSA PHI nodes not handled yet in transform
				// return failure.
				BranchInst *BI = dyn_cast<BranchInst>(outerLoopLatch->getTerminator());
				if (!BI)
				return true;

				if (BI->getSuccessor(0) == outerLoopHeader)
				ExitBlock = BI->getSuccessor(1);
				else
				ExitBlock = BI->getSuccessor(0);

				// We have an lcssa phi node return as current limitation
				if (isa<PHINode>(ExitBlock->begin()))
				return true;

				return false;
				}
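
The example nest from the comment in currentLimitations can be written out as a standalone program to show what interchanging it means at the source level. This is a minimal sketch, not part of the patch: the array sizes, the initial values, and the names `Matrix`, `makeInput`, `runOriginal`, and `runInterchanged` are assumptions made for illustration. The only dependence in this nest has distance (2, 1) in (i, j), so both loop orders should produce the same array.

```cpp
#include <cassert>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Builds an (M+1) x (N+2) matrix with distinct starting values, large enough
// for the accesses A[j+1][i+2] with 0 <= i < N and 0 <= j < M.
Matrix makeInput(int N, int M) {
  assert(N >= 0 && M >= 0 && "loop bounds must be non-negative");
  Matrix A(M + 1, std::vector<int>(N + 2));
  for (int R = 0; R <= M; ++R)
    for (int C = 0; C < N + 2; ++C)
      A[R][C] = R * 100 + C;
  return A;
}

// Original order from the comment: i is the outer loop, j the inner loop.
Matrix runOriginal(int N, int M, int K) {
  Matrix A = makeInput(N, M);
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < M; ++j)
      A[j + 1][i + 2] = A[j][i] + K;
  return A;
}

// Interchanged order: j is the outer loop, i the inner loop, so the inner
// loop now walks along a row of A (the last subscript varies fastest).
Matrix runInterchanged(int N, int M, int K) {
  Matrix A = makeInput(N, M);
  for (int j = 0; j < M; ++j)
    for (int i = 0; i < N; ++i)
      A[j + 1][i + 2] = A[j][i] + K;
  return A;
}
```

Comparing `runOriginal(N, M, K)` with `runInterchanged(N, M, K)` for a few sizes is a quick way to sanity-check that the interchange preserves the result for this particular nest.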

				bool LoopInterchangeLegality::canInterchangeLoops() {

				// We must have a loop in canonical form. Loops with indirectbr in them cannot
				// be canonicalized.
				if (!OuterLoop->getLoopPreheader() || !InnerLoop->getLoopPreheader()) {
				DEBUG(dbgs() << "loop control flow is not understood");
				return false;
				}

				// ScalarEvolution needs to be able to find the exit count.
				const SCEV *ExitCountOuter = SE->getBackedgeTakenCount(OuterLoop);
				const SCEV *ExitCountInner = SE->getBackedgeTakenCount(InnerLoop);
				if (ExitCountOuter == SE->getCouldNotCompute() ||
				ExitCountInner == SE->getCouldNotCompute()) {
				DEBUG(dbgs() << "Could not determine number of loop iterations\n");
				return false;
				}

				// We must have a single backedge.
				if (OuterLoop->getNumBackEdges() != 1 || InnerLoop->getNumBackEdges() != 1) {
				DEBUG(dbgs() << "loop control flow is not understood by vectorizer");
				return false;
				}

				// We must have a single exiting block.
				if (!OuterLoop->getExitingBlock() || !InnerLoop->getExitingBlock()) {
				DEBUG(dbgs() << "loop control flow is not understood by vectorizer");
				return false;
				}

				// TODO: The loops could not be interchanged due to current limitations in the
				// transform module.
				if (currentLimitations()) {
				DEBUG(dbgs() << "Not legal because of current transform limitation\n");
				return false;
				}

				// TODO: Do Additional checks to see if the Loop is valid before jumping into
				[Inline comment] pekka.jaaskelainen: Doesn't this require the loops to be perfectly nested? Is it being (implicitly) checked now?
				[Inline comment] karthikthecool (author): Yes, we need to check for perfectly nested loops in legality. I have added the code for it and will share the updated code shortly.
				// checkDependence.

				return checkDependence(OuterLoop, DA);
				}

				int LoopInterchangeProfitability::getInstrOrderCost(PHINode *IV) {
				unsigned goodOrder, badOrder;
				badOrder = goodOrder = 0;
				for (auto IB = IV->user_begin(), IE = IV->user_end(); IB != IE; ++IB) {
				Instruction *UseInstr = cast<Instruction>(*IB);
				if (isa<GetElementPtrInst>(UseInstr)) {
				GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(UseInstr);
				if (GEP->getOperand(GEP->getNumOperands() - 1) == IV)
				goodOrder += 1;
				else
				badOrder += 1;
				}
				}
				return goodOrder - badOrder;
				}

				bool LoopInterchangeProfitability::isProfitable() {
				// TODO: Add better profitability checks,
				// e.g.
				// 1) If the outer loop is small and the inner loop is large, don't interchange.
				// 2) If reordering results in the inner loop having a stride of 1, etc.

				int Cost = 0;
				for (BasicBlock::iterator I = InnerLoop->getHeader()->begin();
				isa<PHINode>(I); ++I) {
				Cost += getInstrOrderCost(cast<PHINode>(I));
				}
				DEBUG(dbgs() << "Cost = " << Cost << "\n");
				if (Cost < 0)
				return true;
				return false;
				}
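
The access-order heuristic used by getInstrOrderCost and isProfitable can be modelled outside LLVM as follows. This is a simplified sketch under stated assumptions, not the patch's implementation: arrays are assumed to be row-major, each access is reduced to the induction variable appearing in each subscript, and the names `Access`, `orderCost`, and `shouldInterchange` are hypothetical. An access counts as "good" when the inner loop's induction variable is its last subscript and "bad" when that variable appears only in an earlier subscript.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One array access, written as the induction variable used in each
// subscript, outermost dimension first. A[j][i] is {'j', 'i'}.
using Access = std::vector<char>;

// Returns (good accesses) - (bad accesses) for the given inner-loop
// induction variable. Accesses that do not use InnerIV are ignored.
int orderCost(const std::vector<Access> &Accesses, char InnerIV) {
  int Cost = 0;
  for (const Access &A : Accesses) {
    assert(!A.empty() && "an access needs at least one subscript");
    if (std::find(A.begin(), A.end(), InnerIV) == A.end())
      continue; // InnerIV does not index this access.
    Cost += (A.back() == InnerIV) ? 1 : -1;
  }
  return Cost;
}

// Mirrors the decision in isProfitable: interchange only when bad accesses
// outnumber good ones, i.e. when the cost is negative.
bool shouldInterchange(const std::vector<Access> &Accesses, char InnerIV) {
  return orderCost(Accesses, InnerIV) < 0;
}
```

With `j` as the inner induction variable, an access such as `A[j][i]` yields a cost of -1 and suggests interchanging, whereas `A[i][j]` yields +1 and leaves the nest alone.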

				bool LoopInterchangeTransform::transform() {
				DEBUG(dbgs() << "transform\n");
				bool transformed = false;

				for (BasicBlock::iterator I = innerLoop->getHeader()->begin();
				isa<PHINode>(I); ++I) {
				innerIndexVar = dyn_cast<PHINode>(I);
				break;
				}

				Instruction *innerIndexVarInc = nullptr;
				innerIndexVarInc = dyn_cast<Instruction>(innerIndexVar->getIncomingValue(1));
				if (!innerIndexVarInc)
				return false;

				// Split at the place where the induction variable is incremented/decremented.
				// TODO: This splitting logic may not work always. Fix this.
				splitInnerLoopLatch(DT, LI, innerIndexVarInc);
				DEBUG(dbgs() << "splitInnerLoopLatch Done\n");
				splitInnerLoopHeader(DT, LI);
				DEBUG(dbgs() << "splitInnerLoopHeader Done\n");

				transformed |= adjustLoopLinks(DT);
				if (!transformed)
				DEBUG(dbgs() << "adjustLoopLinks Failed\n");
				return true;
				}

				void LoopInterchangeTransform::initialize() {
				innerLoopHeader = innerLoop->getHeader();
				outerLoopHeader = outerLoop->getHeader();
				innerLoopLatch = innerLoop->getLoopLatch();
				outerLoopLatch = outerLoop->getLoopLatch();
				outerLoopPreHeader = outerLoop->getLoopPreheader();
				innerLoopPreHeader = innerLoop->getLoopPreheader();
				innerIndexVar = innerLoop->getCanonicalInductionVariable();
				outerIndexVar = outerLoop->getCanonicalInductionVariable();
				}

				void LoopInterchangeTransform::splitInnerLoopLatch(DominatorTree *DT,
				LoopInfo *LI,
				Instruction *inc) {
				BasicBlock::iterator I = innerLoopLatch->begin();
				BasicBlock::iterator E = innerLoopLatch->end();
				for (; I != E; ++I) {
				if (inc == I)
				break;
				}

				// Split the inner loop latch out.
				innerLoopLatchPred = innerLoopLatch;
				innerLoopLatch = innerLoopLatch->splitBasicBlock(I);
				innerLoop->addBasicBlockToLoop(innerLoopLatch, *LI);
				}

				void LoopInterchangeTransform::splitInnerLoopHeader(DominatorTree *DT,
				LoopInfo *LI) {

				// Split the inner loop header out.
				innerLoopHeaderSucc =
				innerLoopHeader->splitBasicBlock(innerLoopHeader->getFirstNonPHI());

				DEBUG(dbgs() << "Output of splitInnerLoopHeader innerLoopHeaderSucc & "
				"innerLoopHeader \n");
				}

				bool LoopInterchangeTransform::adjustOuterLoopPreheader() {
				// Adjust the outerLoop preheader to jump to inner loop preheader.
				BranchInst *outerLoopHeaderBI =
				dyn_cast<BranchInst>(outerLoopHeader->getTerminator());
				BranchInst *outerLoopPreHeaderBI =
				dyn_cast<BranchInst>(outerLoopPreHeader->getTerminator());
				BasicBlock *TrueBlock;
				BasicBlock *FalseBlock;
				if (outerLoopHeaderBI->isUnconditional()) {
				BranchInst::Create(innerLoopPreHeader, outerLoopPreHeaderBI);
				} else {
				if (outerLoopHeaderBI->getSuccessor(0) == innerLoopPreHeader) {
				TrueBlock = innerLoopPreHeader;
				BranchInst *ExitInst =
				dyn_cast<BranchInst>(outerLoopLatch->getTerminator());
				if (!ExitInst)
				return false;
				if (ExitInst->isUnconditional()) {
				FalseBlock = dyn_cast<BasicBlock>(ExitInst->getSuccessor(0));
				} else {
				if (ExitInst->getSuccessor(0) == outerLoopHeader) {
				FalseBlock = dyn_cast<BasicBlock>(ExitInst->getSuccessor(1));
				} else if (ExitInst->getSuccessor(1) == outerLoopHeader) {
				FalseBlock = dyn_cast<BasicBlock>(ExitInst->getSuccessor(0));
				} else {
				return false;
				}
				}
				} else if (outerLoopHeaderBI->getSuccessor(1) == innerLoopPreHeader) {
				FalseBlock = innerLoopPreHeader;
				BranchInst *ExitInst =
				dyn_cast<BranchInst>(outerLoopLatch->getTerminator());
				if (!ExitInst)
				return false;
				if (ExitInst->isUnconditional()) {
				BasicBlock *B = dyn_cast<BasicBlock>(ExitInst->getSuccessor(0));
				outerLoopHeaderBI->setSuccessor(0, B);
				} else {
				if (ExitInst->getSuccessor(0) == outerLoopHeader) {
				TrueBlock = dyn_cast<BasicBlock>(ExitInst->getSuccessor(1));
				} else if (ExitInst->getSuccessor(1) == outerLoopHeader) {
				TrueBlock = dyn_cast<BasicBlock>(ExitInst->getSuccessor(0));
				} else {
				return false;
				}
				}
				}
				BranchInst::Create(TrueBlock, FalseBlock, outerLoopHeaderBI->getCondition(),
				outerLoopPreHeaderBI);
				}
				outerLoopPreHeaderBI->eraseFromParent();
				return true;
				}

				bool LoopInterchangeTransform::adjustInnerLoopPreheader() {

				SmallVector<PHINode *, 8> LoopPhis;
				SmallVector<Instruction *, 8> Instr;
				BranchInst *innerPreheaderBI =
				dyn_cast<BranchInst>(innerLoopPreHeader->getTerminator());
				BranchInst *innerHeaderBI =
				dyn_cast<BranchInst>(innerLoopHeader->getTerminator());
				BranchInst *outerHeaderBI =
				dyn_cast<BranchInst>(outerLoopHeader->getTerminator());
				BranchInst *innerLoopLatchPredBI =
				dyn_cast<BranchInst>(innerLoopLatchPred->getTerminator());
				BranchInst *outerLoopLatchBI =
				dyn_cast<BranchInst>(outerLoopLatch->getTerminator());
				BranchInst *innerLoopLatchBI =
				dyn_cast<BranchInst>(innerLoopLatch->getTerminator());
				BranchInst *innerLoopHeaderSuccBI =
				dyn_cast<BranchInst>(innerLoopHeaderSucc->getTerminator());

				BasicBlock *LoopNestExit;
				if (!innerPreheaderBI || !innerHeaderBI || !outerHeaderBI ||
				!innerLoopLatchPredBI || !outerLoopLatchBI || !innerLoopLatchBI ||
				!innerLoopHeaderSuccBI)
				return false;
				BranchInst::Create(innerLoopHeader, innerPreheaderBI);
				innerPreheaderBI->eraseFromParent();

				BranchInst::Create(outerLoopHeader, innerHeaderBI);
				innerHeaderBI->eraseFromParent();

				// Update outerLoopHeader PHI nodes
				// Collect all Phi nodes and instructions in the outer loop header.
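				// Only the leading PHI nodes are collected; cast<PHINode> would assert on
				// any other instruction, such as the block terminator.
				for (auto I = outerLoopHeader->begin(); isa<PHINode>(I); ++I) {
				LoopPhis.push_back(cast<PHINode>(I));
				}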

				// Adjust Phi nodes for the outer loop header which now becomes inner loop
				// header
				while (!LoopPhis.empty()) {
				PHINode *CurrIV = LoopPhis.pop_back_val();
				unsigned numIncomingBlocks = CurrIV->getNumOperands();
				for (unsigned i = 0; i < numIncomingBlocks; ++i) {
				if (CurrIV->getIncomingBlock(i) == outerLoopPreHeader) {
				CurrIV->setIncomingBlock(i, innerLoopHeader);
				}
				}
				}

				// TODO: Check correctness of this. What if preheaders data is used in phi
				// nodes in inner header?
				for (auto I = innerLoopPreHeader->begin(), E = innerLoopPreHeader->end();
				I != E; ++I) {
				Instruction *Ins = I;
				if (isa<BranchInst>(Ins))
				break;
				Instr.push_back(I);
				}

				for (auto I = Instr.begin(), E = Instr.end(); I != E; ++I) {
				Instruction *Ins = *I;
				Ins->moveBefore(outerLoopHeader->getTerminator());
				}

				if (outerHeaderBI->getSuccessor(0) == innerLoopPreHeader) {
				outerHeaderBI->setSuccessor(0, innerLoopHeaderSucc);
				if (!outerHeaderBI->isUnconditional())
				outerHeaderBI->setSuccessor(1, innerLoopLatch);
				} else {
				if (outerHeaderBI->getSuccessor(1) == innerLoopPreHeader) {
				outerHeaderBI->setSuccessor(1, innerLoopHeaderSucc);
				outerHeaderBI->setSuccessor(0, innerLoopLatch);
				} else {
				assert(0);
				}
				}

				if (innerLoopHeaderSuccBI->getSuccessor(0) == innerLoopLatch) {
				innerLoopHeaderSuccBI->setSuccessor(0, outerLoopLatch);
				} else if (!innerLoopHeaderSuccBI->isUnconditional() &&
				innerLoopHeaderSuccBI->getSuccessor(1) == innerLoopLatch) {
				innerLoopHeaderSuccBI->setSuccessor(1, outerLoopLatch);
				}

				// TODO: Update phi nodes if any in innerLoopLatch

				if (innerLoopLatchPredBI->getSuccessor(0) == innerLoopLatch) {
				innerLoopLatchPredBI->setSuccessor(0, outerLoopLatch);
				} else if (innerLoopLatchPredBI->getSuccessor(1) == innerLoopLatch) {
				innerLoopLatchPredBI->setSuccessor(1, outerLoopLatch);
				} else {
				assert(0);
				}

				// TODO: Update phi nodes if any in outerLoopLatch

				// Connect outerLoopLatch to inner loop latch.
				if (outerLoopLatchBI->getSuccessor(0) == outerLoopHeader) {
				LoopNestExit = outerLoopLatchBI->getSuccessor(1);
				outerLoopLatchBI->setSuccessor(1, innerLoopLatch);
				} else if (outerLoopLatchBI->getSuccessor(1) == outerLoopHeader) {
				LoopNestExit = outerLoopLatchBI->getSuccessor(0);
				outerLoopLatchBI->setSuccessor(0, innerLoopLatch);
				} else {
				assert(0);
				}

				// TODO: Handle LCSSA Phi nodes

				if (innerLoopLatchBI->getSuccessor(0) == innerLoopHeader) {
				innerLoopLatchBI->setSuccessor(1, LoopNestExit);
				} else if (innerLoopLatchBI->getSuccessor(1) == innerLoopHeader) {
				innerLoopLatchBI->setSuccessor(0, LoopNestExit);
				} else {
				assert(0);
				}

				return true;
				[Inline comment] pekka.jaaskelainen: Shouldn't this update the loop metadata?
				[Inline comment] karthikthecool (author): Currently we only update the successor node of the branch instruction of the loop latch, so the metadata portion should be intact. I'm not sure if we have to update the metadata after moving the outer loop to become the inner loop. It would be great if you could help me out with an example where we might have to update the metadata due to reordering.
				[Inline comment] pekka.jaaskelainen: OK. I didn't read the transformation part in detail, so I cannot immediately tell if it breaks or not. If the loop id still points to the correct original loop, it should be OK.
				}

				bool LoopInterchangeTransform::adjustLoopLinks(DominatorTree *DT) {

				DEBUG(dbgs() << "adjustLoopLinks called\n");
				// Adjust the outerLoop preheader to jump to inner loop preheader.
				adjustOuterLoopPreheader();
				adjustInnerLoopPreheader();

				return true;
				}

				char LoopInterchange::ID = 0;
				INITIALIZE_PASS_BEGIN(LoopInterchange, "loop-interchange",
				"Interchanges loops for cache reuse", false, false)
				INITIALIZE_AG_DEPENDENCY(AliasAnalysis)
				INITIALIZE_PASS_DEPENDENCY(DependenceAnalysis)
				INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
				INITIALIZE_PASS_DEPENDENCY(ScalarEvolution)
				INITIALIZE_PASS_DEPENDENCY(LoopSimplify)
				INITIALIZE_PASS_DEPENDENCY(LCSSA)
				INITIALIZE_PASS_DEPENDENCY(LoopInfoWrapperPass)

				INITIALIZE_PASS_END(LoopInterchange, "loop-interchange",
				"Interchanges loops for cache reuse", false, false)

				Pass *llvm::createLoopInterchangePass() { return new LoopInterchange(); }

lib/Transforms/Scalar/Scalar.cpp

[...]
  void llvm::initializeScalarOpts(PassRegistry &Registry) {
    initializeEarlyCSELegacyPassPass(Registry);
    initializeFlattenCFGPassPass(Registry);
    initializeInductiveRangeCheckEliminationPass(Registry);
    initializeIndVarSimplifyPass(Registry);
    initializeJumpThreadingPass(Registry);
    initializeLICMPass(Registry);
    initializeLoopDeletionPass(Registry);
    initializeLoopInstSimplifyPass(Registry);
+   initializeLoopInterchangePass(Registry);
    initializeLoopRotatePass(Registry);
    initializeLoopStrengthReducePass(Registry);
    initializeLoopRerollPass(Registry);
    initializeLoopUnrollPass(Registry);
    initializeLoopUnswitchPass(Registry);
    initializeLoopIdiomRecognizePass(Registry);
    initializeLowerAtomicPass(Registry);
    initializeLowerExpectIntrinsicPass(Registry);
[...]

Revision Contents

Diff 19400

include/llvm/InitializePasses.h

include/llvm/LinkAllPasses.h

include/llvm/Transforms/Scalar.h

lib/Transforms/Scalar/CMakeLists.txt

lib/Transforms/Scalar/LoopInterchange.cpp

lib/Transforms/Scalar/Scalar.cpp
