Motivating example:

```c
for (j = 0; j < n; j++) {
  for (k = 0; k < l; k++) {
    z[j] += x[k] * y[n*k + j];
  }
}
```

In the code above we should be able to use LoopInterchange to interchange the j and k loops, as that allows the innermost loop to be vectorized. However, in the current state LICM promotes z[j] to a scalar, effectively changing the code to:

```c
for (j = 0; j < n; j++) {
  int tmp = z[j];
  for (k = 0; k < l; k++) {
    tmp += x[k] * y[n*k + j];
  }
  z[j] = tmp;
}
```

After LICM the loops are no longer tightly nested, so LoopInterchange cannot work. The patch looks for removablePHIs in the inner loop header and sinks the loads and hoists the stores of LICM-promoted variables when doing so helps loop interchange. Used the lit test from fhahn's patch here: https://reviews.llvm.org/D53027?vs=on&id=168822#toc
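For reference, a hedged sketch of the nest this patch is meant to enable (the function wrapper and element types are illustrative additions, not part of the original example):

```cpp
// Illustrative only: the j/k nest from the summary after interchange.
// The innermost j loop now walks y with stride 1 and updates z[j]
// without a loop-carried scalar, so it is vectorizable.
void interchanged(int n, int l, int *z, const int *x, const int *y) {
  for (int k = 0; k < l; k++)
    for (int j = 0; j < n; j++)
      z[j] += x[k] * y[n * k + j];
}
```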
Unit Tests
Time | Test
---|---
70 ms | x64 debian > Flang.Lower/OpenACC::acc-reduction.f90
Event Timeline
@fhahn I used the lit test from your patch, here: https://reviews.llvm.org/D53027?vs=on&id=168822#toc
Adding more reviewers. (Better for someone who's spent more time with this pass recently to look.)
Can you summarize the algorithm in the description, in particular what a "removablePHI" is and what distinguishes an inner-only reduction from an inner reduction?
Line | llvm/test/Transforms/LoopInterchange/inner-only-reductions.ll
---|---
149 | [nit] unrelated change
In addition to Michael's comment, I'd like to add several comments:
- This patch "undoes" LICM within loop interchange in order to form a tightly nested loopnest. This sounds like a phase ordering problem, i.e., you could just place loop interchange before LICM and then you'd be able to interchange the loopnest. If it is a phase ordering problem then I'm not sure it makes sense to undo pass A within pass B, because things could quickly get messy if we choose to develop in this way.
- Also, have you considered the "loopnest" version of loop passes? For example, if you use the loopnest invariant code motion (LNICM) pass instead of the LICM pass in the pipeline, z[j] will not be hoisted into int tmp = z[j] (in the example from your summary), because LNICM only hoists a variable if it is invariant across the entire loopnest.
- In D53027, support for inner-only reductions was removed because of miscompilation of certain cases, as well as profitability concerns, since interchange would move loads and stores from the outer loop into the inner loop. Have you thought about addressing these problems?
One possible miscompilation, as mentioned in https://reviews.llvm.org/D53027#1272786: interchanging the following code would miscompile it, given that y is a global variable.

```c
for (unsigned i = 0; i < N; i++) {
  y = 0;
  for (unsigned j = 0; j < N; j++) {
    A[i][j] += 1;
    y += A[i][j];
  }
}
```
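To spell out the hazard (my own sketch of a naive interchange, not output from the pass): in the original nest y is reset at the start of every i iteration, so it finishes as the sum of the last row of A; after interchange there is no per-i point at which to reset y, so any placement of y = 0 yields a different final value.

```cpp
// Hypothetical naively interchanged nest (illustrative): with "y = 0"
// hoisted out entirely, the global y accumulates over the whole matrix
// instead of over the last row, changing the observable result.
y = 0;
for (unsigned j = 0; j < N; j++) {
  for (unsigned i = 0; i < N; i++) {
    A[i][j] += 1;
    y += A[i][j];
  }
}
```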
@Meinersbur Thank you for the review.
By an inner-only reduction, I mean that the outer loop is not involved in the reduction at all. For example:

```c
for (j = 0; j < n; j++) {
  for (k = 0; k < l; k++) {
    z[j] += x[k] * y[n*k + j];
  }
}
```
In the code above, the reduction on each z[j] is performed only by the inner k loop; the outer loop is not involved in the reduction at all. Loop interchange can handle such cases if the load and store of z[j] remain within the inner loop. But due to LICM, the load and store are moved into the outer loop, creating a PHI in the inner loop that is not part of a reduction across the outer loop. To handle this, we sink the loads and hoist the stores back into the inner loop. This eliminates the inner-loop PHI and also makes the two loops tightly nested, allowing the loops to be interchanged.
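For contrast, a minimal sketch (reusing the names from the example above) of a reduction that is not inner-only, i.e. one that is also carried across the outer loop:

```cpp
// Illustrative: s accumulates across BOTH loops, so the inner-header PHI
// is part of a reduction over the outer loop as well. LoopInterchange
// already models this case; the patch targets the z[j] case above, where
// the outer loop carries no part of the reduction.
int s = 0;
for (int j = 0; j < n; j++)
  for (int k = 0; k < l; k++)
    s += x[k] * y[n * k + j];
```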
In the patch we keep track of such Load-LoadPHI-StorePHI-Store chains. In the places where we check whether we can handle a PHI, we match whether it is a "removablePHI", i.e. a LoadPHI or a StorePHI that can be removed by moving the loads and stores inwards. In the end, if we decide that loop interchange is profitable, we move the loads and stores inwards and let LoopInterchange do the rest.
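Roughly, matching such a chain could look like the sketch below; the helper name and the exact checks are illustrative assumptions on my part, not the patch's actual code, and real code would need more guards (multiple exits, non-LCSSA forms, aliasing):

```cpp
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Hypothetical helper: does InnerPHI look like the header PHI of a
// Load-LoadPHI-StorePHI-Store chain left behind by LICM promotion?
static bool looksLikePromotedChain(PHINode *InnerPHI, Loop *Inner) {
  BasicBlock *Preheader = Inner->getLoopPreheader();
  BasicBlock *Latch = Inner->getLoopLatch();
  BasicBlock *Exit = Inner->getExitBlock();
  if (!Preheader || !Latch || !Exit || InnerPHI->getNumIncomingValues() != 2)
    return false;

  // The PHI's start value must be a load hoisted out of the inner loop.
  auto *Ld = dyn_cast<LoadInst>(InnerPHI->getIncomingValueForBlock(Preheader));
  if (!Ld || Inner->contains(Ld))
    return false;

  // The recurrence on the backedge must be live-out through an LCSSA PHI
  // whose only user is a store back to the same pointer, sunk by LICM.
  Value *Recur = InnerPHI->getIncomingValueForBlock(Latch);
  for (User *U : Recur->users()) {
    auto *ExitPHI = dyn_cast<PHINode>(U);
    if (!ExitPHI || ExitPHI->getParent() != Exit || !ExitPHI->hasOneUser())
      continue;
    auto *St = dyn_cast<StoreInst>(*ExitPHI->user_begin());
    if (St && !Inner->contains(St) &&
        St->getPointerOperand() == Ld->getPointerOperand())
      return true;
  }
  return false;
}
```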
@congzhe Thank you for the review.
- If we just force a run of LoopInterchangePass before LICM then I do see the interchanged loop, but if the programmer writes the program in a different manner it might not work. For example:

```c
for (j = 0; j < n; j++) {
  for (k = 0; k < l; k++) {
    z[j] += x[k] * y[n*k + j];
  }
}
```

would work, but this would not:

```c
for (j = 0; j < n; j++) {
  int tmp = z[j];
  for (k = 0; k < l; k++) {
    tmp += x[k] * y[n*k + j];
  }
  z[j] = tmp;
}
```
- Argument 1 holds here too: if the programmer writes the program a little differently, the interchange won't happen.
- I don't think there should be miscompiles, as we are undoing LICM only in very specific cases. With regards to profitability, we still go through the same profitability analysis. The mentioned code does not miscompile. Here are the debug logs:

```
Processing LoopList of size = 2
Found 2 Loads and Stores to analyze
Found anti dependency between Src and Dst
 Src: %1 = load i32, ptr %arrayidx5, align 4, !tbaa !5
 Dst: store i32 %add, ptr %arrayidx5, align 4, !tbaa !5
Processing InnerLoopId = 1 and OuterLoopId = 0
Inner loop PHI is not part of reductions across the outer loop.
Only inner loops with induction or reduction PHI nodes are supported currently.
Not legal because of current transform limitation
Not interchanging loops. Cannot prove legality.
```
I haven't done an extensive analysis of the performance characteristics of this change. Can you suggest a way to check whether this patch causes any degradations? The pass is OFF by default unless one enables it explicitly.
Line | llvm/lib/Transforms/Scalar/LoopInterchange.cpp
---|---
275–280 | Can you document this data structure and its fields?
385 | I think "RemovablePHI" is the wrong name. The PHIs are not just removed, but demoted. Suggestion: "SinkablePHIs" ("sink" as in "sink into loop body").
561 | In the loop optimization group call we discussed whether the CacheCost analysis will have the wrong result because it only sees the promoted reduction register, not the additional loads/stores in the innermost loop. After thinking about it, CacheCost is defined in terms of the number of touched cache lines: this does not change for the outer loop, and the inner loop only accesses the same cache line repeatedly, so I don't think it matters.
725 | [nit] unrelated change
871 | There are other instructions that can access memory, e.g. CallInst (as in memset(...)). tightlyNested might already check for such instructions; if so, please add a comment about this here.
872 | DI might not be necessary there. Since the idea is to undo what LICM does, you can use the same-strength analysis as LICM. LICM uses AliasSetTracker and MemorySSA, i.e. just asking AliasAnalysis/MemorySSA whether the pointers cannot alias (which DI does internally as well) should be sufficient.
880 | ++ modifies the iterator. Might have consequences if getIterator() returns a reference. (See the sketch after this table.)
909 | If I am not mistaken, there is no previous check that all accesses are simple. That is, it will crash in debug builds if someone uses e.g. std::atomic. I think it should also return false in this case.
914–916 | [style]
934 | Leftover code
970 | It seems unnecessary to clone the instruction and create a new one. Why not use the existing instruction and move it into the loop?
979 | [nit] Leftover code
982–984 | Shouldn't the LoopExitPHI be removed as well?
1085 | Unless you removed it yourself, shouldn't all instructions have a parent?
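To illustrate the iterator concern at line 880 (a minimal sketch; `I` stands for whatever instruction the code starts from):

```cpp
#include <cassert>
#include <iterator>
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Instruction.h"
using namespace llvm;

void iteratorExample(Instruction *I) {
  // std::next copies the iterator, leaving the original value available.
  BasicBlock::iterator Next = std::next(I->getIterator());
  // By contrast, ++ advances the iterator object itself.
  BasicBlock::iterator It = I->getIterator();
  ++It; // It now equals Next; the pre-increment value is gone.
  assert(It == Next);
}
```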
@Meinersbur Sorry I was on vacation for a couple of months. I've made the changes you recommended. PTAL
Line | llvm/lib/Transforms/Scalar/LoopInterchange.cpp
---|---
871 | You are right. tightlyNested will later check for the presence of other instructions that may read or write memory, and we do not proceed with loop interchange in that case.
909 | At this time, populateDependencyMatrix has already ensured that all the loads and stores are simple.
970 | Not sure why I was cloning earlier. Fixed it.
982–984 | Yep. It should have been "if there are no users". Fixed it.
1085 | Seems like a mistake I made earlier. Fixed.
@Meinersbur Ping! Is it possible to get this patch in before the release/17.x branch is created?