This is an archive of the discontinued LLVM Phabricator instance.

Differential D135808

[LoopInterchange] Correcting the profitability check
ClosedPublic

Authored by ram-NK on Oct 12 2022, 12:40 PM.

Details

Reviewers

fhahn
congzhe
Meinersbur
bmahjour

Group Reviewers

Restricted Project

Commits

rGee7188c8b2ab: [LoopInterchange] Correcting the profitability check

Summary

If the inner loop is loop independent or does not carry any dependency, loop interchange is not profitable. Likewise, if the outer loop is not loop independent, loop interchange is not profitable. If the inner loop has a dependence and the outer loop is loop independent, it is profitable to interchange to enable inner-loop parallelism. This patch corrects the dependency checking inside isProfitableForVectorization(). It also addresses the endless interchange problem: if cache analysis can decide the loop order, isProfitable() does not invoke isProfitableForVectorization() to decide on a better loop order.
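As a rough illustration of the rule in the summary, the sketch below scans a direction matrix with one row per dependence and one column per loop. It is not the code from the patch: `DirMatrix`, `isLoopIndependent` and `profitableForVectorizationSketch` are hypothetical names, and treating both '=' and 'I' as "loop independent" is an assumption based on the later review comments.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for the pass's dependence matrix: one row per
// dependence, one direction character per loop level.
using DirMatrix = std::vector<std::vector<char>>;

// Assumption: '=' and 'I' both mean "not carried by this loop".
static bool isLoopIndependent(char Dir) { return Dir == '=' || Dir == 'I'; }

// Returns true only if every dependence is carried by the inner loop and is
// independent of the outer loop, so that after interchange the new inner loop
// carries no dependence.
std::optional<bool> profitableForVectorizationSketch(const DirMatrix &Deps,
                                                     std::size_t InnerId,
                                                     std::size_t OuterId) {
  for (const auto &Row : Deps) {
    assert(InnerId < Row.size() && OuterId < Row.size());
    // The inner loop carries nothing: it can likely be vectorized already.
    if (isLoopIndependent(Row[InnerId]))
      return false;
    // The outer loop carries a dependence: moving it inward would not help.
    if (!isLoopIndependent(Row[OuterId]))
      return false;
  }
  // With no dependences at all, interchanging will not improve anything.
  return !Deps.empty();
}
```

With the outer loop in column 0 and the inner loop in column 1, a single row [=, <] is reported as profitable, while [<, =] and an empty matrix are not.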

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ram-NK created this revision.Oct 12 2022, 12:40 PM

Herald added a project: Restricted Project.Oct 12 2022, 12:40 PM

Herald added a subscriber: hiraditya.

ram-NK requested review of this revision.Oct 12 2022, 12:40 PM

Herald added a project: Restricted Project.Oct 12 2022, 12:40 PM

Herald added subscribers: llvm-commits, pcwang-thead.

ram-NK edited the summary of this revision. (Show Details)Oct 12 2022, 12:51 PM

ram-NK added a reviewer: bmahjour.

Harbormaster completed remote builds in B191800: Diff 467235.Oct 12 2022, 1:33 PM

Before this patch, loop interchange was considered profitable if the outer loop dependency direction was "=" and the inner loop dependency direction was "S" or "I". Only these two dependency cases were treated as profitable. But for vectorization, a ">" or "<" dependency in the outer loop is more profitable when interchanged. After the patch, [=,<] and [=,>] will be interchanged for vectorization.

bmahjour added a project: Restricted Project.Oct 17 2022, 6:10 AM

The proper profitability analysis for loop interchange uses CacheCost, although there are cases where it may be unable to analyze certain accesses (eg due to delinearization issues). It would be good to understand why we don't go through the CacheCost path in your use case.

The legacy heuristic doesn't look right to me, although your changes make the logic more aligned with the comments in the code.

llvm/test/Transforms/LoopInterchange/pr43797-lcssa-for-multiple-outer-loop-blocks.ll
11 ↗	(On Diff #467235)	I'm worried about losing functional coverage by avoiding interchange here due to profitability. Can you play with `-loop-interchange-threshold` or make slight changes to the memory accesses to make the test case profitable?

In D135808#3861367, @ram-NK wrote:

Loop interchange was considered profitable if the outer loop dependency direction was "=" and the inner loop dependency direction was "S" or "I". Only these two dependency cases were treated as profitable. But for vectorization, a ">" or "<" dependency in the outer loop is more profitable when interchanged. After the patch, [=,<] and [=,>] will be interchanged for vectorization.

I thought what you meant is that after this patch, [<, =] and [>, =] (not [=,<] and [=,>]) will be interchanged? Because after interchange the dependency vector would become [=, <] and [=, >] respectively, which could improve potential parallelization and enable finer-grained parallelism, i.e., outer loop parallelism instead of inner loop parallelism. I think this is what isProfitableForVectorization() is supposed to be.

I wonder if it makes sense to you @bmahjour ?
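The effect of interchange on a direction vector can be sketched as a column swap. This is a minimal illustration, assuming one row per dependence and one direction character per loop; `DirMatrix` and `interchangeColumns` are hypothetical names, not the pass's API.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical dependence matrix: one row per dependence, one column per loop.
using DirMatrix = std::vector<std::vector<char>>;

// Interchanging two loops swaps the corresponding columns in every row.
void interchangeColumns(DirMatrix &Deps, std::size_t A, std::size_t B) {
  for (auto &Row : Deps) {
    assert(A < Row.size() && B < Row.size());
    std::swap(Row[A], Row[B]);
  }
}
```

Applied to [<, =] and [>, =] with columns 0 and 1, this yields [=, <] and [=, >], the vectors named in the comment above.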

I think profitability determination solely based on dependency matrix is fundamentally flawed. One obvious example is this

void f1() {
  for (int i = 0; i < n; i++)
    for (int j = 0; j < m; j++) {
      B[j][i] = A[j][i];
    }  
}

Assuming A and B don't alias, there are no dependencies, nevertheless interchange is profitable.
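To make the f1 case concrete, the sketch below writes out both loop orders for a row-major array. The fixed sizes, the `Grid` alias and the function names are assumptions for illustration; only the loop structure comes from the example above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t N = 4; // trip count of the i loop (assumed)
constexpr std::size_t M = 3; // trip count of the j loop (assumed)
using Grid = std::array<std::array<int, N>, M>; // indexed as Grid[j][i]

// Original order: the inner j loop walks down a column, so consecutive
// iterations touch elements N ints apart.
void copyOriginal(const Grid &A, Grid &B) {
  for (std::size_t i = 0; i < N; ++i)
    for (std::size_t j = 0; j < M; ++j)
      B[j][i] = A[j][i];
}

// Interchanged order: the inner i loop walks along a row, so consecutive
// iterations touch adjacent elements.
void copyInterchanged(const Grid &A, Grid &B) {
  for (std::size_t j = 0; j < M; ++j)
    for (std::size_t i = 0; i < N; ++i)
      B[j][i] = A[j][i];
}
```

Both versions produce the same result because no iteration depends on another; only the memory access pattern differs.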

Another example is:

void f2() {
  // > =
  for (int i = 0; i < n; i++)
    for (int j = 0; j < m; j++) {
      ... = A[i][j];
      A[i-1][j] = ...;
    }
}

The dependence is carried by the outer loop, yet it's not profitable to interchange (since it would make both locality and parallelism worse).

The following example is profitable to interchange, but won't be recognized as profitable after this patch:

void f3() {
  // = =
  for (int i = 0; i < n; i++)
    for (int j = 0; j < m; j++) {
      ... = A[j][i];
      A[j][i] = ...;
    }
}

I'm worried about losing functional coverage by avoiding interchange here due to profitability. Can you play with -loop-interchange-threshold or make slight changes to the memory accesses to make the test case profitable?

Reverted the changes in pr43797-lcssa-for-multiple-outer-loop-blocks.ll. In this case, the CostMap is empty and the calculated legacy cost is 0. To retain the loop interchange functionality, I lowered the loop interchange threshold to -1000 (added -loop-interchange-threshold=-1000).
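The effect of the threshold can be checked with simple arithmetic. The sketch assumes the legacy comparison has the form `Cost < -LoopInterchangeCostThreshold`, as quoted elsewhere in this review; `legacyModelAllows` is a hypothetical helper, not a function in the pass.

```cpp
#include <cassert>

// Assumed form of the legacy check: interchange when Cost < -Threshold.
bool legacyModelAllows(int Cost, int Threshold) { return Cost < -Threshold; }
```

With a cost of 0, the default threshold of 0 gives `0 < 0`, which is false, while a threshold of -1000 gives `0 < 1000`, which is true.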

Harbormaster completed remote builds in B192793: Diff 468607.Oct 18 2022, 12:05 PM

In loop interchange, profitability is determined first by the cache cost model and then by isProfitableForVectorization(). The loops in f1() and f3() will be interchanged based on the CacheCost check, which runs first, so this patch does not affect the f1() and f3() cases.

For the f2() case, neither CacheCost nor the legacy cost model shows profitability. If the loops in f2() are interchanged, outer-loop parallelization becomes possible, which could improve parallelism.

Could you add a test case that is not considered profitable where before it was not?

In D135808#3865187, @bmahjour wrote:

I think profitability determination solely based on dependency matrix is fundamentally flawed.

I understand the idea is that when the locality model says that interchanged and non-interchanged have the same profitability according to cache locality analysis, only then we fall back to a heuristic that determines whether maybe the innermost loop of the interchange loop nest can be vectorized when the one before vectorization could not.

Unfortunately LoopInterchangeProfitability::isProfitable is not structured like that. It falls back to isProfitableForVectorization whenever cache analysis does not want to interchange the loops. This includes the case where the old loop nest has better locality than the interchanged one would have. This could result in an endless application of LoopInterchange if we ran it until no more changes are made:

Cache analysis thinks non-interchanged nest has more locality => fall back to isProfitableForVectorization which returns true => do the interchange
Cache analysis thinks the interchange has more locality => do the interchange (with two interchanges we arrive at the original loop)
Cache analysis thinks non-interchanged nest has more locality => fall back to isProfitableForVectorization which returns true => do the interchange again
...
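The ping-pong described above can be reproduced with a toy model. Everything in this sketch is hypothetical: `ToyNest`, the two toy heuristics and the decision functions only mimic the control flow under discussion, assuming cache analysis strictly prefers the original order and the vectorization heuristic always answers "interchange".

```cpp
#include <cassert>
#include <optional>

// Toy model: Order is 0 for the original nest and 1 for the interchanged one.
struct ToyNest {
  int Order = 0;
};

// Toy cache analysis with a strict preference for order 0. It always has an
// opinion here, so it never returns nullopt.
std::optional<bool> toyCacheWantsInterchange(const ToyNest &Nest) {
  return Nest.Order != 0;
}

// Toy vectorization heuristic that always asks for an interchange.
bool toyVectorizationWantsInterchange(const ToyNest &) { return true; }

// Old structure: fall back whenever cache analysis does not say "yes".
bool oldDecision(const ToyNest &Nest) {
  if (toyCacheWantsInterchange(Nest).value_or(false))
    return true;
  return toyVectorizationWantsInterchange(Nest);
}

// Structure discussed in the review: fall back only when undecided.
bool newDecision(const ToyNest &Nest) {
  if (auto Decision = toyCacheWantsInterchange(Nest))
    return *Decision;
  return toyVectorizationWantsInterchange(Nest);
}

// Applies the decision repeatedly and counts how many interchanges happen.
template <typename DecisionFn>
int countInterchanges(DecisionFn Decide, int Runs) {
  ToyNest Nest;
  int Count = 0;
  for (int I = 0; I < Runs; ++I) {
    if (Decide(Nest)) {
      Nest.Order = 1 - Nest.Order;
      ++Count;
    }
  }
  return Count;
}
```

In this toy setup the old structure interchanges on every run, while the new structure never leaves the order that cache analysis prefers.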

In D135808#3862874, @congzhe wrote:

I thought what you meant is that after this patch, [<, =] and [>, =] (not [=,<] and [=,>]) will be interchanged? Because after interchange the dependency vector would become [=, <] and [=, >] respectively, which could improve potential parallelization and enable finer-grained parallelism, i.e., outer loop parallelism instead of inner loop parallelism. I think this is what isProfitableForVectorization() is supposed to be.

The current LoopVectorize pass only supports innermost loops, hence you would want the dependencies carried by the outer loops so the LoopVectorize pass does not have to consider them.

Btw, why does LoopInterchange use a DepMatrix when the dependencies for all instructions in the loop could be just summarized in a single vector?

Meinersbur added inline comments.Oct 19 2022, 12:20 AM

llvm/lib/Transforms/Scalar/LoopInterchange.cpp
1100–1101	Does your patch cover this TODO?
1100–1101	Shouldn't the condition on `Row[InnerLoopId]` and `Row[OuterLoopId]` be the exact opposite? That is, it is profitable if the innermost loop has loop-carried dependencies while the outer has not?
1125–1137	Should this only be considered if `InnerLoopId` is actually an innermost loop (The only kind LoopVectorize can currently process)?

ram-NK updated this revision to Diff 469339.Oct 20 2022, 1:47 PM

ram-NK marked an inline comment as done.

ram-NK edited the summary of this revision. (Show Details)

ram-NK marked 2 inline comments as done.Oct 20 2022, 2:01 PM

ram-NK added inline comments.

llvm/lib/Transforms/Scalar/LoopInterchange.cpp
1100–1101	Corrected the dependency checking and the comments.
1100–1101	Corrected the dependency check. If the inner loop has a loop-carried dependency and the outer loop is loop independent, loop interchange is considered profitable for vectorization.

ram-NK marked 2 inline comments as done.Oct 20 2022, 2:04 PM

ram-NK added inline comments.

llvm/test/Transforms/LoopInterchange/pr43797-lcssa-for-multiple-outer-loop-blocks.ll
11 ↗	(On Diff #467235)	These changes are not needed after correcting the dependency check in loop interchange.

The following changes were made as per the comments. If the inner loop is loop independent or does not carry any dependency, loop interchange is not considered profitable. Likewise, if the outer loop is not loop independent, loop interchange is not considered profitable. If the inner loop has a dependence and the outer loop is loop independent, it is profitable to interchange to enable inner-loop parallelism.

Harbormaster completed remote builds in B193318: Diff 469339.Oct 20 2022, 2:43 PM

ram-NK edited the summary of this revision. (Show Details)Oct 20 2022, 3:12 PM

ram-NK updated this revision to Diff 469613.Oct 21 2022, 8:05 AM

Harbormaster completed remote builds in B193525: Diff 469613.Oct 21 2022, 8:44 AM

@Meinersbur and @bmahjour, could you review the code I modified as per your comments when you get a chance?

This looks better now, but the problem of "endless interchange" is still not addressed. Per discussion in the loop opt call, we should only use CacheCost when it was able to calculate a valid cost and the loops can be sorted based on "strictly less than" ordering relationship. Only when CacheCost result is indeterminate (eg. two loops have equal cost) or when it is unable to calculate the cost (eg. due to delinearization issues), we should fall back to the isProfitableForVectorization() function.
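One way to read the rule above is as a three-valued decision. The sketch below is an assumption-laden illustration: `decideFromCacheCost` is a hypothetical helper, the costs are plain integers rather than the analysis's own cost type, and the direction of the comparison (a more expensive loop belongs further out) is assumed rather than taken from this review.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical inputs: the cost of each loop as computed by a cache model, or
// nullopt when the model could not analyze the loop.
// Assumption: interchange is wanted when the current inner loop is strictly
// more expensive than the current outer loop.
std::optional<bool> decideFromCacheCost(std::optional<std::int64_t> InnerCost,
                                        std::optional<std::int64_t> OuterCost) {
  if (!InnerCost || !OuterCost)
    return std::nullopt; // cost unavailable: let another heuristic decide
  if (*InnerCost == *OuterCost)
    return std::nullopt; // tie: no strict ordering, let another heuristic decide
  return *InnerCost > *OuterCost;
}
```

A definite true or false ends the decision, so a later heuristic cannot undo it; only the two nullopt cases reach the fallback.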

llvm/lib/Transforms/Scalar/LoopInterchange.cpp
1105	not not -> not

Corrected the "endless interchange" possibility. Now isProfitableForVectorization() is invoked only when the two loops have equal cost or when the cost cannot be calculated.

ram-NK marked an inline comment as done.Nov 7 2022, 9:31 AM

ram-NK added inline comments.

llvm/lib/Transforms/Scalar/LoopInterchange.cpp
1105	corrected

Harbormaster completed remote builds in B196518: Diff 473714.Nov 7 2022, 10:50 AM

congzhe added inline comments.Nov 7 2022, 9:03 PM

llvm/lib/Transforms/Scalar/LoopInterchange.cpp
1141

ram-NK updated this revision to Diff 474010.Nov 8 2022, 7:54 AM

ram-NK marked an inline comment as done.

@congzhe and @bmahjour comments are addressed.

llvm/lib/Transforms/Scalar/LoopInterchange.cpp
1141	Corrected the condition. Profitability for vectorization is checked only if cache analysis fails for either loop of the nest or the two loops have equal locality.

Harbormaster completed remote builds in B196719: Diff 474010.Nov 8 2022, 9:02 AM

Macro testcase:

llvm/lib/Transforms/Scalar/LoopInterchange.cpp
1138–1139	Consider updating this message. Suggestion: "Interchanging loops not considered to improve cache locality nor vectorization."
1141	This is exactly the else branch of the cache analysis logic. It does not consider the fallback `Cost < -LoopInterchangeCostThreshold` legacy cost model. Please avoid the code duplication and needing to look up the `CostMap` again.

Meinersbur added inline comments.Nov 16 2022, 9:37 AM

llvm/lib/Transforms/Scalar/LoopInterchange.cpp

1115–1121

The structure suggested in the LoopWG call.

The more general pattern/refactoring would be:

std::optional<bool> shouldInterchange = isProfitableAccordingLoopCacheAnalysis(..);
if (shouldInterchange.has_value())
  return shouldInterchange.value();

shouldInterchange = isProfitableAccordingInstrOrderCost(..);
if (shouldInterchange.has_value())
  return shouldInterchange.value();

shouldInterchange = isProfitableForVectorization(..);
if (shouldInterchange.has_value())
  return shouldInterchange.value();

emitOptimizationRemark("Don't know");

However, this changes when the emitOptimizationRemark is called. If we would not want to change this, it would be (which corresponds to the current structure but with refactoring):

std::optional<bool> shouldInterchange = isProfitableAccordingLoopCacheAnalysis(..);
if (!shouldInterchange.has_value()) {
  shouldInterchange = isProfitableAccordingInstrOrderCost(..);
  if (!shouldInterchange.has_value()) {
     shouldInterchange = isProfitableForVectorization(..);
  }
}

if (!shouldInterchange.has_value())
  emitOptimizationRemark("Don't know");
else if (!shouldInterchange.value())
  emitOptimizationRemark("Profitability heuristic indicates this loop is good as-is");

although I would prefer the former over the nested if-else chain and instead emit different optimization remarks for each of the heuristic that indicates that/why the loops should (NOT) be interchanged.
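The first pattern can be written as a small, self-contained sketch. The names are placeholders rather than the pass's API: `Heuristic` and `firstDefiniteAnswer` are hypothetical, and returning false when no heuristic has an opinion is an assumption standing in for the "Don't know" remark.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <vector>

// Each heuristic returns true or false when it has an opinion and nullopt
// when it cannot decide.
using Heuristic = std::function<std::optional<bool>()>;

// Asks the heuristics in priority order and returns the first definite
// answer. If none has an opinion, the loops are left as they are.
bool firstDefiniteAnswer(const std::vector<Heuristic> &Chain) {
  for (const auto &H : Chain)
    if (std::optional<bool> Decision = H())
      return Decision.value();
  return false;
}
```

Because a definite false stops the chain just as a definite true does, a lower-priority heuristic never overrides a higher-priority one.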

@Meinersbur, I will update these comments and testcase accordingly.

As per Michael's comments, the code inside the isProfitable function has been cleaned up.

Added a test case. It runs two loop interchange passes to make sure that there is no endless interchange. Before this patch, endless loop interchange was possible. There is no endless interchange after this patch.

@Meinersbur comments are addressed.

I would suggest to revise the summary and/or the title to describe that this patch now also addresses the endless interchange problem, in addition to correction in isProfitableForVectorization().

llvm/test/Transforms/LoopInterchange/profitability.ll
179	I would suggest to apply `--check-prefix=PROFIT` only to `interchange_05()` and not to other existing tests. Because this opt command line runs interchange twice and shows that there would have been endless interchange with `interchange_05()` before this patch but there is no endless interchange after this patch, which applies only to `interchange_05()` and is not related to other tests.

congzhe added inline comments.Dec 1 2022, 8:13 AM

llvm/lib/Transforms/Scalar/LoopInterchange.cpp
1133	Is it possible to make `LoopInterchangeCostMeaningfulnessThreshold` an opt flag so we can assign its value more flexibly? Possibly rename it to `LegacyCostModelThreshold` or whatever is more appropriate. static cl::opt<int> LegacyCostModelThreshold( "legacy-cost-threshold", cl::init(10), cl::Hidden, cl::desc("The threshold for the legacy cost model to be considered."));

@congzhe Corrected as per the comments.

Harbormaster completed remote builds in B200555: Diff 479329.Dec 1 2022, 12:33 PM

Added comments in Test case 05 for better understanding.
Corrected the comment in Test case 01. The loop interchange decision is made from the Cache cost analysis and is not from vectorization.

Harbormaster completed remote builds in B200769: Diff 479630.Dec 2 2022, 8:27 AM

ram-NK updated this revision to Diff 479744.Dec 2 2022, 1:55 PM

Harbormaster completed remote builds in B200849: Diff 479744.Dec 2 2022, 4:45 PM

ram-NK updated this revision to Diff 482045.Dec 12 2022, 2:43 AM

Harbormaster completed remote builds in B202527: Diff 482045.Dec 12 2022, 3:25 AM

congzhe added inline comments.Dec 12 2022, 3:28 PM

llvm/lib/Transforms/Scalar/LoopInterchange.cpp
1101–1104	nit: `isProfitableAccordingToLoopCacheAnalysis`, or `isProfitablePerLoopCacheAnalysis`
1126–1137	nit: `isProfitableAccordingToInstrOrderCost`, or `isProfitablePerInstrOrderCost`
1168	It could be worth adding some comments for this function that describe what it does now, and how it prevents endless interchange from happening.
llvm/test/Transforms/LoopInterchange/profitability.ll
175	nit: `may leads`->`may lead` `before`-> `before D135808` `now`-> `after D135808`

@congzhe, All comments are addressed.

Harbormaster completed remote builds in B202992: Diff 482681.Dec 13 2022, 7:08 PM

congzhe added a reviewer: Restricted Project.Dec 13 2022, 8:13 PM

Matt added a subscriber: Matt.Dec 14 2022, 8:32 AM

bmahjour added inline comments.Dec 15 2022, 12:29 PM

llvm/lib/Transforms/Scalar/LoopInterchange.cpp
64	I don't see why we need two options to control the legacy cost model.
1117	We should return `nullopt` when/if `InnerIndex == OuterIndex`
1146	since the idea is to call this function only when CC has failed or been indecisive, we don't need to pass CC to this function and check it here.
llvm/test/Transforms/LoopInterchange/profitability.ll
171–176	reword: This tests to make sure, that multiple invocations of loop interchange will not undo previous interchange and will converge to a particular order determined by the profitability analysis.

@bmahjour I will update the comments accordingly.

ram-NK added inline comments.Dec 21 2022, 11:30 AM

llvm/lib/Transforms/Scalar/LoopInterchange.cpp
64	This new parameter was added as per @Meinersbur's inline comments. It is treated as the upper limit of InstrOrderCost.
1117	The CostMap assigns a unique number to each loop, so the condition "InnerIndex == OuterIndex" will never be satisfied. But the LoopCost may be the same, so a check for CC->getLoopCost(InnerLoop) == CC->getLoopCost(OuterLoop) can be added to return nullopt.

ram-NK updated this revision to Diff 485617.Dec 29 2022, 8:07 AM

Corrected perserve-lcssa.ll: the instruction order cost is 0, so interchange is not profitable.
Corrected the remark message in loop-interchange-optimization-remarks.ll.
All comments are addressed.

Harbormaster completed remote builds in B205161: Diff 485617.Dec 29 2022, 8:55 AM

Corrected the code formatting.

Harbormaster completed remote builds in B205223: Diff 485706.Dec 30 2022, 10:20 AM

Hi @bmahjour, the changes have been made as per your comments. Could you find time to review them?
Hi @Meinersbur, some changes were made on top of your suggested comments (https://reviews.llvm.org/D135808?id=474010#inline-1332682)

LGTM with some minor comments.

llvm/lib/Transforms/Scalar/LoopInterchange.cpp
1117	ah, ok...in that case please add `assert(InnerIndex != OuterIndex && "CostMap should assign unique numbers to each loop")`
1151	ideally this should be `if (Row[OuterLoopId] != '=' && Row[OuterLoopId] != 'I')`

This revision is now accepted and ready to land.Jan 10 2023, 2:34 PM

Herald added a subscriber: StephenFan. · View Herald TranscriptJan 10 2023, 2:34 PM

LGTM.

Sorry I didn't find time to review it.

llvm/test/Transforms/LoopInterchange/perserve-lcssa.ll
172	The test fails for me with the latest version because of opaque pointers emitted by opt. Regeneration of the test should fix this.

All comments are updated.

Harbormaster completed remote builds in B207535: Diff 488854.Jan 12 2023, 9:29 PM

This revision was landed with ongoing or failed builds.Jan 16 2023, 11:36 AM

Closed by commit rGee7188c8b2ab: [LoopInterchange] Correcting the profitability check (authored by ram-NK, committed by congzhe).

This revision was automatically updated to reflect the committed changes.

congzhe added a commit: rGee7188c8b2ab: [LoopInterchange] Correcting the profitability check.

Revision Contents

Path	Size
llvm/lib/Transforms/Scalar/LoopInterchange.cpp	145 lines
llvm/test/Transforms/LoopInterchange/loop-interchange-optimization-remarks.ll	2 lines
llvm/test/Transforms/LoopInterchange/perserve-lcssa.ll	35 lines
llvm/test/Transforms/LoopInterchange/pr57148.ll	2 lines
llvm/test/Transforms/LoopInterchange/profitability.ll	63 lines

Diff 489614

llvm/lib/Transforms/Scalar/LoopInterchange.cpp

#define DEBUG_TYPE "loop-interchange"

STATISTIC(LoopsInterchanged, "Number of loops interchanged");

static cl::opt<int> LoopInterchangeCostThreshold(
    "loop-interchange-threshold", cl::init(0), cl::Hidden,
    cl::desc("Interchange if you gain more than this number"));

namespace {

using LoopVector = SmallVector<Loop *, 8>;

// TODO: Check if we can use a sparse matrix here.
using CharMatrix = std::vector<std::vector<char>>;

} // end anonymous namespace

[... 233 lines not shown ...] public:
  LoopInterchangeProfitability(Loop *Outer, Loop *Inner, ScalarEvolution *SE,
                               OptimizationRemarkEmitter *ORE)
      : OuterLoop(Outer), InnerLoop(Inner), SE(SE), ORE(ORE) {}

  /// Check if the loop interchange is profitable.
  bool isProfitable(const Loop *InnerLoop, const Loop *OuterLoop,
                    unsigned InnerLoopId, unsigned OuterLoopId,
                    CharMatrix &DepMatrix,
-                   const DenseMap<const Loop *, unsigned> &CostMap);
+                   const DenseMap<const Loop *, unsigned> &CostMap,
+                   std::unique_ptr<CacheCost> &CC);

private:
  int getInstrOrderCost();
+ std::optional<bool> isProfitablePerLoopCacheAnalysis(
+     const DenseMap<const Loop *, unsigned> &CostMap,
+     std::unique_ptr<CacheCost> &CC);
+ std::optional<bool> isProfitablePerInstrOrderCost();
+ std::optional<bool> isProfitableForVectorization(unsigned InnerLoopId,
+                                                  unsigned OuterLoopId,
+                                                  CharMatrix &DepMatrix);
  Loop *OuterLoop;
  Loop *InnerLoop;

  /// Scev analysis.
  ScalarEvolution *SE;

  /// Interface to emit optimization remarks.
  OptimizationRemarkEmitter *ORE;

[... 180 lines not shown ...] bool processLoop(Loop *InnerLoop, Loop *OuterLoop, unsigned InnerLoopId,
    LoopInterchangeLegality LIL(OuterLoop, InnerLoop, SE, ORE);
    if (!LIL.canInterchangeLoops(InnerLoopId, OuterLoopId, DependencyMatrix)) {
      LLVM_DEBUG(dbgs() << "Not interchanging loops. Cannot prove legality.\n");
      return false;
    }
    LLVM_DEBUG(dbgs() << "Loops are legal to interchange\n");
    LoopInterchangeProfitability LIP(OuterLoop, InnerLoop, SE, ORE);
    if (!LIP.isProfitable(InnerLoop, OuterLoop, InnerLoopId, OuterLoopId,
-                         DependencyMatrix, CostMap)) {
+                         DependencyMatrix, CostMap, CC)) {
      LLVM_DEBUG(dbgs() << "Interchanging loops not profitable.\n");
      return false;
    }

    ORE->emit([&]() {
      return OptimizationRemark(DEBUG_TYPE, "Interchanged",
                                InnerLoop->getStartLoc(),
                                InnerLoop->getHeader())

[... 561 lines not shown ...] for (Instruction &Ins : *BB) {
  }
  return GoodOrder - BadOrder;
}

-static bool isProfitableForVectorization(unsigned InnerLoopId,
-                                         unsigned OuterLoopId,
-                                         CharMatrix &DepMatrix) {
-  // TODO: Improve this heuristic to catch more cases.
-  // If the inner loop is loop independent or doesn't carry any dependency it is
-  // profitable to move this to outer position.
-  for (auto &Row : DepMatrix) {
-    if (Row[InnerLoopId] != 'S' && Row[InnerLoopId] != 'I')
-      return false;
-    // TODO: We need to improve this heuristic.
-    if (Row[OuterLoopId] != '=')
-      return false;
-  }
-  // If outer loop has dependence and inner loop is loop independent then it is
-  // profitable to interchange to enable parallelism.
-  // If there are no dependences, interchanging will not improve anything.
-  return !DepMatrix.empty();
-}
-
-bool LoopInterchangeProfitability::isProfitable(
-    const Loop *InnerLoop, const Loop *OuterLoop, unsigned InnerLoopId,
-    unsigned OuterLoopId, CharMatrix &DepMatrix,
-    const DenseMap<const Loop *, unsigned> &CostMap) {
-  // TODO: Remove the legacy cost model.
+std::optional<bool>
+LoopInterchangeProfitability::isProfitablePerLoopCacheAnalysis(
+    const DenseMap<const Loop *, unsigned> &CostMap,
+    std::unique_ptr<CacheCost> &CC) {

  // This is the new cost model returned from loop cache analysis.
  // A smaller index means the loop should be placed an outer loop, and vice
  // versa.
  if (CostMap.find(InnerLoop) != CostMap.end() &&
      CostMap.find(OuterLoop) != CostMap.end()) {
    unsigned InnerIndex = 0, OuterIndex = 0;
    InnerIndex = CostMap.find(InnerLoop)->second;
    OuterIndex = CostMap.find(OuterLoop)->second;
    LLVM_DEBUG(dbgs() << "InnerIndex = " << InnerIndex
                      << ", OuterIndex = " << OuterIndex << "\n");
    if (InnerIndex < OuterIndex)
-     return true;
+     return std::optional<bool>(true);
- } else {
+   assert(InnerIndex != OuterIndex && "CostMap should assign unique "
+                                      "numbers to each loop");
+   if (CC->getLoopCost(*OuterLoop) == CC->getLoopCost(*InnerLoop))
+     return std::nullopt;
+   return std::optional<bool>(false);

Meinersbur:

LLVM_DEBUG(dbgs() << "Cost = " << Cost << "\n");

- if (Cost < -LoopInterchangeCostThreshold)

- return true;

+ if (abs(Cost) < LoopInterchangeCostThreshold) { // Is the threshold for meaningfulness reached?

+ if (Cost < 0) // Does the heuristic indicate that the loops should be interchanged?

+ return true;

+ // else { Heuristic indicates that the current loop order is better and would revert it if we had previously interchanged the loops -- hence do not do the loop interchange even if a fallback heuristic would want to }

+ } else {

+ // Check profitability for vectorization

+ }

}

// To prevent endless interchange, only check whether it is profitable for vectorization

The structure suggested in the LoopWG call.

The more general pattern/refactoring would be:

std::optional<bool> shouldInterchange = isProfitableAccordingLoopCacheAnalysis(..);
if (shouldInterchange.has_value())
  return shouldInterchange.value();

shouldInterchange = isProfitableAccordingInstrOrderCost(..);
if (shouldInterchange.has_value())
  return shouldInterchange.value();

shouldInterchange = isProfitableForVectorization(..);
if (shouldInterchange.has_value())
  return shouldInterchange.value();

emitOptimizationRemark("Don't know");

However, this changes when the emitOptimizationRemark is called. If we would not want to change this, it would be (which corresponds to the current structure but with refactoring):

std::optional<bool> shouldInterchange = isProfitableAccordingLoopCacheAnalysis(..);
if (!shouldInterchange.has_value()) {
  shouldInterchange = isProfitableAccordingInstrOrderCost(..);
  if (!shouldInterchange.has_value()) {
     shouldInterchange = isProfitableForVectorization(..);
  }
}

if (!shouldInterchange.has_value())
  emitOptimizationRemark("Don't know");
else if (!shouldInterchange.value())
  emitOptimizationRemark("Profitability heuristic indicates this loop is good as-is");


}

return std::nullopt;

}
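The discussion above settles on three outcomes for the cache-analysis step: interchange, keep the current order, or no verdict. A minimal, self-contained sketch of that decision follows. It does not use LLVM types; `LoopInfoSketch` and `profitablePerCacheSketch` are hypothetical names standing in for the `CostMap` lookup and `CC->getLoopCost()` calls in the patch.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for what the pass reads from CostMap and CacheCost.
struct LoopInfoSketch {
  unsigned Index;  // position in the cache-cost ordering (assumed unique)
  long long Cost;  // cache cost reported for the loop
  bool Known;      // false if cache analysis could not analyze the loop
};

// Returns true or false when cache analysis is decisive and std::nullopt
// when it is not (a loop is missing from the map, or the costs are equal).
std::optional<bool> profitablePerCacheSketch(const LoopInfoSketch &Inner,
                                             const LoopInfoSketch &Outer) {
  if (!Inner.Known || !Outer.Known)
    return std::nullopt;
  // A smaller index means the loop belongs further out.
  if (Inner.Index < Outer.Index)
    return true;
  assert(Inner.Index != Outer.Index && "indices are expected to be unique");
  // Equal costs give no preference, so defer to the next heuristic.
  if (Inner.Cost == Outer.Cost)
    return std::nullopt;
  return false;
}
```

Returning `false` rather than `std::nullopt` in the last case is what stops a later heuristic from undoing an order that cache analysis already prefers.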

std::optional<bool>

LoopInterchangeProfitability::isProfitablePerInstrOrderCost() {

// Legacy cost model: this is rough cost estimation algorithm. It counts the // Legacy cost model: this is rough cost estimation algorithm. It counts the

// good and bad order of induction variables in the instruction and allows // good and bad order of induction variables in the instruction and allows

// reordering if number of bad orders is more than good. // reordering if number of bad orders is more than good.

int Cost = getInstrOrderCost(); int Cost = getInstrOrderCost();

LLVM_DEBUG(dbgs() << "Cost = " << Cost << "\n"); LLVM_DEBUG(dbgs() << "Cost = " << Cost << "\n");

if (Cost < -LoopInterchangeCostThreshold) if (Cost < 0 && Cost < LoopInterchangeCostThreshold)

congzhe: Is it possible to make `LoopInterchangeCostMeaningfulnessThreshold` an opt flag so we can assign its value more flexibly? Possibly rename it to `LegacyCostModelThreshold` or whatever is more appropriate.

static cl::opt<int> LegacyCostModelThreshold(
    "legacy-cost-threshold", cl::init(10), cl::Hidden,
    cl::desc("The threshold for the legacy cost model to be considered."));

return true; return std::optional<bool>(true);

return std::nullopt;

} }

Meinersbur: Should this only be considered if `InnerLoopId` is actually an innermost loop (the only kind LoopVectorize can currently process)?

congzhe: nit: `isProfitableAccordingToInstrOrderCost`, or `isProfitablePerInstrOrderCost`

// It is not profitable as per current cache profitability model. But check if std::optional<bool> LoopInterchangeProfitability::isProfitableForVectorization(

Meinersbur: Consider updating this message. Suggestion: "Interchanging loops not considered to improve cache locality nor vectorization."

// we can move this loop outside to improve parallelism. unsigned InnerLoopId, unsigned OuterLoopId, CharMatrix &DepMatrix) {

if (isProfitableForVectorization(InnerLoopId, OuterLoopId, DepMatrix)) for (auto &Row : DepMatrix) {

congzhe:

// analyze the loopnest (e.g., due to delinearization issues).

- if (CostMap.find(InnerLoop) == CostMap.end() || CostMap.find(OuterLoop) != CostMap.end() ||

+ if (CostMap.find(InnerLoop) == CostMap.end() || CostMap.find(OuterLoop) == CostMap.end() ||

(CC && CC->getLoopCost(*InnerLoop) == CC->getLoopCost(*OuterLoop))) {

ram-NK: Corrected the condition. Profitability for vectorization is now checked only when cache analysis fails to analyze either loop of the nest or when both loops have equal locality.

Meinersbur: This is exactly the else branch of the cache analysis logic. It does not consider the fallback `Cost < -LoopInterchangeCostThreshold` legacy cost model. Please avoid the code duplication and the need to look up the CostMap again.

return true; // If the inner loop is loop independent or doesn't carry any dependency

// it is not profitable to move this to outer position, since we are

// likely able to do inner loop vectorization already.

if (Row[InnerLoopId] == 'I' || Row[InnerLoopId] == '=')

return std::optional<bool>(false);

bmahjour: Since the idea is to call this function only when CC has failed or been indecisive, we don't need to pass CC to this function and check it here.

// If the outer loop is not loop independent it is not profitable to move

// this to inner position, since doing so would not enable inner loop

// parallelism.

if (Row[OuterLoopId] != 'I' && Row[OuterLoopId] != '=')

bmahjour: ideally this should be `if (Row[OuterLoopId] != '=' && Row[OuterLoopId] != 'I')`

return std::optional<bool>(false);

}

// If inner loop has dependence and outer loop is loop independent then it

// is profitable to interchange to enable inner loop parallelism.

// If there are no dependences, interchanging will not improve anything.

return std::optional<bool>(!DepMatrix.empty());

}
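The row-by-row check above can be tried outside LLVM with a plain character matrix. The sketch below mirrors the logic shown in the diff, under the assumption that each row holds one direction character per loop level; `DepMatrixSketch` and `profitableForVectorizationSketch` are illustrative names, not part of the pass.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-in for CharMatrix: one row per dependence, one
// direction character ('=', 'I', '<', '>', 'S', ...) per loop level.
using DepMatrixSketch = std::vector<std::vector<char>>;

std::optional<bool>
profitableForVectorizationSketch(unsigned InnerLoopId, unsigned OuterLoopId,
                                 const DepMatrixSketch &DepMatrix) {
  for (const auto &Row : DepMatrix) {
    assert(InnerLoopId < Row.size() && OuterLoopId < Row.size());
    // The inner loop carries no dependence: it can likely be vectorized
    // where it is, so moving it outward gains nothing.
    if (Row[InnerLoopId] == 'I' || Row[InnerLoopId] == '=')
      return false;
    // The outer loop carries a dependence: moving it inward would not
    // give a parallel inner loop.
    if (Row[OuterLoopId] != 'I' && Row[OuterLoopId] != '=')
      return false;
  }
  // With no dependences at all, interchanging improves nothing.
  return !DepMatrix.empty();
}
```

With the outer loop at index 0 and the inner loop at index 1, a matrix whose only row is `['=', '<']` yields `true`, while `['<', '=']` or an empty matrix yields `false`.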

bool LoopInterchangeProfitability::isProfitable(

const Loop *InnerLoop, const Loop *OuterLoop, unsigned InnerLoopId,

unsigned OuterLoopId, CharMatrix &DepMatrix,

const DenseMap<const Loop *, unsigned> &CostMap,

std::unique_ptr<CacheCost> &CC) {

// isProfitable() is structured to avoid endless loop interchange.

// If loop cache analysis could decide the profitability then,

// profitability check will stop and return the analysis result.

// If cache analysis failed to analyze the loopnest (e.g.,

congzhe: It could be worth adding some comments for this function that describe what it does now, and how it prevents endless interchange from happening.

// due to delinearization issues) then only check whether it is

// profitable for InstrOrderCost. Likewise, if InstrOrderCost failed to

// analyze the profitability then only, isProfitableForVectorization

// will decide.

std::optional<bool> shouldInterchange =

isProfitablePerLoopCacheAnalysis(CostMap, CC);

if (!shouldInterchange.has_value()) {

shouldInterchange = isProfitablePerInstrOrderCost();

if (!shouldInterchange.has_value())

shouldInterchange =

isProfitableForVectorization(InnerLoopId, OuterLoopId, DepMatrix);

}

if (!shouldInterchange.has_value()) {

ORE->emit([&]() {

return OptimizationRemarkMissed(DEBUG_TYPE, "InterchangeNotProfitable",

InnerLoop->getStartLoc(),

InnerLoop->getHeader())

<< "Insufficient information to calculate the cost of loop for "

"interchange.";

});

return false;

} else if (!shouldInterchange.value()) {

ORE->emit([&]() { ORE->emit([&]() {

return OptimizationRemarkMissed(DEBUG_TYPE, "InterchangeNotProfitable", return OptimizationRemarkMissed(DEBUG_TYPE, "InterchangeNotProfitable",

InnerLoop->getStartLoc(), InnerLoop->getStartLoc(),

InnerLoop->getHeader()) InnerLoop->getHeader())

<< "Interchanging loops is too costly and it does not improve " << "Interchanging loops is not considered to improve cache "

"parallelism."; "locality nor vectorization.";

}); });

return false; return false;

} }

return true;

}
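The ordering in `isProfitable()` is a "first decisive heuristic wins" chain. A generic sketch of that pattern, assuming each heuristic is a callable returning `std::optional<bool>`, is shown here; `Heuristic` and `firstDecisive` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <vector>

// Each heuristic answers true (interchange), false (keep the current order)
// or std::nullopt (cannot decide).
using Heuristic = std::function<std::optional<bool>()>;

// Runs the heuristics in priority order and returns the first verdict.
// Later heuristics are not consulted once one of them has decided, so a
// lower-priority model cannot reverse a higher-priority decision.
std::optional<bool> firstDecisive(const std::vector<Heuristic> &Heuristics) {
  for (const auto &H : Heuristics) {
    assert(H && "heuristic must be callable");
    if (std::optional<bool> Verdict = H())
      return Verdict;
  }
  return std::nullopt;
}
```

Because a `false` verdict also ends the chain, running the pass again on an already interchanged nest should reach the same answer from the same heuristic. That is the property the review describes as preventing endless interchange.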

void LoopInterchangeTransform::removeChildLoop(Loop *OuterLoop, void LoopInterchangeTransform::removeChildLoop(Loop *OuterLoop,

Loop *InnerLoop) { Loop *InnerLoop) {

for (Loop *L : *OuterLoop) for (Loop *L : *OuterLoop)

if (L == InnerLoop) { if (L == InnerLoop) {

OuterLoop->removeChildLoop(L); OuterLoop->removeChildLoop(L);

return; return;

} }


llvm/test/Transforms/LoopInterchange/loop-interchange-optimization-remarks.ll

	; CHECK-NEXT: - String: Cannot interchange loops due to dependences.			; CHECK-NEXT: - String: Cannot interchange loops due to dependences.
	; CHECK-NEXT: ...			; CHECK-NEXT: ...

	; DELIN: --- !Missed			; DELIN: --- !Missed
	; DELIN-NEXT: Pass: loop-interchange			; DELIN-NEXT: Pass: loop-interchange
	; DELIN-NEXT: Name: InterchangeNotProfitable			; DELIN-NEXT: Name: InterchangeNotProfitable
	; DELIN-NEXT: Function: test01			; DELIN-NEXT: Function: test01
	; DELIN-NEXT: Args:			; DELIN-NEXT: Args:
	; DELIN-NEXT: - String: Interchanging loops is too costly and it does not improve parallelism.			; DELIN-NEXT: - String: Interchanging loops is not considered to improve cache locality nor vectorization.
	; DELIN-NEXT: ...			; DELIN-NEXT: ...

	;;--------------------------------------Test case 02------------------------------------			;;--------------------------------------Test case 02------------------------------------
	;; [FIXME] This loop though valid is currently not interchanged due to the			;; [FIXME] This loop though valid is currently not interchanged due to the
	;; limitation that we cannot split the inner loop latch due to multiple use of inner induction			;; limitation that we cannot split the inner loop latch due to multiple use of inner induction
	;; variable.(used to increment the loop counter and to access A[j+1][i+1]			;; variable.(used to increment the loop counter and to access A[j+1][i+1]
	;; for(int i=0;i<N-1;i++)			;; for(int i=0;i<N-1;i++)
	;; for(int j=1;j<N-1;j++)			;; for(int j=1;j<N-1;j++)

llvm/test/Transforms/LoopInterchange/perserve-lcssa.ll


exit: ; preds = %outer.latch

store i64 %r1, ptr @b, align 4

store i64 %v4.lcssa, ptr @a, align 4

ret void

}

; Make sure we do not crash for loops without reachable exits.

define void @no_reachable_exits() {

; Check we interchanged.

; Check we do not crash.

; CHECK-LABEL: @no_reachable_exits() {

; CHECK-LABEL: @no_reachable_exits(

; CHECK-NEXT: bb:

; CHECK-NEXT: br label %inner.ph

; CHECK-NEXT: br label [[OUTER_PH:%.*]]

; CHECK-LABEL: outer.ph:

; CHECK: outer.ph:

; CHECK-NEXT: br label %outer.header

; CHECK-NEXT: br label [[OUTER_HEADER:%.*]]

; CHECK-LABEL: inner.ph:

; CHECK: outer.header:

; CHECK-NEXT: br label %inner.body

; CHECK-NEXT: [[TMP2:%.*]] = phi i32 [ 0, [[OUTER_PH]] ], [ [[TMP8:%.*]], [[OUTER_LATCH:%.*]] ]

; CHECK-LABEL: inner.body:

; CHECK-NEXT: br i1 undef, label [[INNER_PH:%.*]], label [[OUTER_LATCH]]

; CHECK-NEXT: %tmp31 = phi i32 [ 0, %inner.ph ], [ %[[IVNEXT:[0-9]]], %inner.body.split ]

; CHECK: inner.ph:

; CHECK-NEXT: br label %outer.ph

; CHECK-NEXT: br label [[INNER_BODY:%.*]]

; CHECK-LABEL: inner.body.split:

; CHECK: inner.body:

; CHECK-NEXT: %[[IVNEXT]] = add nsw i32 %tmp31, 1

; CHECK-NEXT: [[TMP31:%.*]] = phi i32 [ 0, [[INNER_PH]] ], [ [[TMP6:%.*]], [[INNER_BODY]] ]

; CHECK-NEXT: br i1 false, label %inner.body, label %exit

; CHECK-NEXT: [[TMP5:%.*]] = load ptr, ptr undef, align 8

Meinersbur:

; CHECK-NEXT: [[TMP31:%.*]] = phi i32 [ 0, [[INNER_PH]] ], [ [[TMP6:%.*]], [[INNER_BODY]] ]

- ; CHECK-NEXT: [[TMP5:%.*]] = load i32*, i32** undef, align 8

+ ; CHECK-NEXT: [[TMP5:%.*]] = load ptr, ptr undef, align 8

; CHECK-NEXT: [[TMP6]] = add nsw i32 [[TMP31]], 1

The test fails for me with the latest version because of opaque pointers emitted by opt. Regenerating the test should fix this.

; CHECK-NEXT: [[TMP6]] = add nsw i32 [[TMP31]], 1

; CHECK-NEXT: br i1 false, label [[INNER_BODY]], label [[OUTER_LATCH_LOOPEXIT:%.*]]

; CHECK: outer.latch.loopexit:

; CHECK-NEXT: br label [[OUTER_LATCH]]

; CHECK: outer.latch:

; CHECK-NEXT: [[TMP8]] = add nsw i32 [[TMP2]], 1

; CHECK-NEXT: br i1 false, label [[OUTER_HEADER]], label [[EXIT:%.*]]

; CHECK: exit:

; CHECK-NEXT: unreachable

bb:

br label %outer.ph

outer.ph: ; preds = %bb

br label %outer.header


llvm/test/Transforms/LoopInterchange/pr57148.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -passes=loop-interchange -cache-line-size=4 -verify-dom-info -verify-loop-info -verify-scev -verify-loop-lcssa -S \| FileCheck %s			; RUN: opt < %s -passes=loop-interchange -cache-line-size=4 -loop-interchange-threshold=-100 -verify-dom-info -verify-loop-info -verify-scev -verify-loop-lcssa -S \| FileCheck %s

	; Make sure the loops are in LCSSA form after loop interchange,			; Make sure the loops are in LCSSA form after loop interchange,
	; and loop interchange does not hit assertion errors and crash.			; and loop interchange does not hit assertion errors and crash.

	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	@b = external global [512 x [4 x i32]], align 1			@b = external global [512 x [4 x i32]], align 1
	@c = external global [2 x [4 x i32]], align 1			@c = external global [2 x [4 x i32]], align 1

llvm/test/Transforms/LoopInterchange/profitability.ll

	; RUN: opt < %s -passes=loop-interchange -cache-line-size=64 -pass-remarks-output=%t -verify-dom-info -verify-loop-info \			; RUN: opt < %s -passes=loop-interchange -cache-line-size=64 -pass-remarks-output=%t -verify-dom-info -verify-loop-info \
	; RUN: -pass-remarks=loop-interchange -pass-remarks-missed=loop-interchange			; RUN: -pass-remarks=loop-interchange -pass-remarks-missed=loop-interchange
	; RUN: FileCheck -input-file %t %s			; RUN: FileCheck -input-file %t %s

				; RUN: opt < %s -passes=loop-interchange,loop-interchange -cache-line-size=64 \
				; RUN: -pass-remarks-output=%t -pass-remarks='loop-interchange' -S
				; RUN: cat %t \| FileCheck --check-prefix=PROFIT %s

	;; We test profitability model in these test cases.			;; We test profitability model in these test cases.

	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	@A = common global [100 x [100 x i32]] zeroinitializer			@A = common global [100 x [100 x i32]] zeroinitializer
	@B = common global [100 x [100 x i32]] zeroinitializer			@B = common global [100 x [100 x i32]] zeroinitializer

	;;---------------------------------------Test case 01---------------------------------			;;---------------------------------------Test case 01---------------------------------
	;; Loops interchange will result in code vectorization and hence profitable. Check for interchange.			;; Loops interchange will result in better cache locality and hence profitable. Check for interchange.
	;; for(int i=1;i<100;i++)			;; for(int i=1;i<100;i++)
	;; for(int j=1;j<100;j++)			;; for(int j=1;j<100;j++)
	;; A[j][i] = A[j - 1][i] + B[j][i];			;; A[j][i] = A[j - 1][i] + B[j][i];

	; CHECK: Name: Interchanged			; CHECK: Name: Interchanged
	; CHECK-NEXT: Function: interchange_01			; CHECK-NEXT: Function: interchange_01

	define void @interchange_01() {			define void @interchange_01() {
	for.inc10:			for.inc10:
	%indvars.iv.next22 = add nuw nsw i64 %indvars.iv21, 1			%indvars.iv.next22 = add nuw nsw i64 %indvars.iv21, 1
	%exitcond23 = icmp eq i64 %indvars.iv.next22, 100			%exitcond23 = icmp eq i64 %indvars.iv.next22, 100
	br i1 %exitcond23, label %for.end12, label %for.cond1.preheader			br i1 %exitcond23, label %for.end12, label %for.cond1.preheader

	for.end12:			for.end12:
	ret void			ret void
	}			}

				;;---------------------------------------Test case 05---------------------------------
				;; This test is to make sure, that multiple invocations of loop interchange will not
				;; undo previous interchange and will converge to a particular order determined by the
				;; profitability analysis.
				;; for(int i=1;i<100;i++)
				;; for(int j=1;j<100;j++)
				congzhe: nit: `may leads` -> `may lead`, `before` -> `before D135808`, `now` -> `after D135808`
				;; A[j][0] = A[j][0] + B[j][i];
				bmahjour: reword: This tests to make sure, that multiple invocations of loop interchange will not undo previous interchange and will converge to a particular order determined by the profitability analysis.

				; CHECK: Name: Interchanged
				; CHECK-NEXT: Function: interchange_05
				congzhe: I would suggest applying `--check-prefix=PROFIT` only to `interchange_05()` and not to the other existing tests. This opt command line runs interchange twice and shows that there would have been endless interchange with `interchange_05()` before this patch but there is none after it, which applies only to `interchange_05()` and is not related to the other tests.

				; PROFIT-LABEL: --- !Passed
				; PROFIT-NEXT: Pass: loop-interchange
				; PROFIT-NEXT: Name: Interchanged
				; PROFIT-LABEL: Function: interchange_05
				; PROFIT-NEXT: Args:
				; PROFIT-NEXT: - String: Loop interchanged with enclosing loop.
				; PROFIT-NEXT: ...
				; PROFIT: --- !Missed
				; PROFIT-NEXT: Pass: loop-interchange
				; PROFIT-NEXT: Name: InterchangeNotProfitable
				; PROFIT-NEXT: Function: interchange_05
				; PROFIT-NEXT: Args:
				; PROFIT-NEXT: - String: Interchanging loops is not considered to improve cache locality nor vectorization.
				; PROFIT-NEXT: ...
				define void @interchange_05() {
				entry:
				br label %for2.preheader

				for2.preheader:
				%i30 = phi i64 [ 1, %entry ], [ %i.next31, %for1.inc14 ]
				br label %for2

				for2:
				%j = phi i64 [ %i.next, %for2 ], [ 1, %for2.preheader ]
				%j.prev = add nsw i64 %j, -1
				%arrayidx5 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @A, i64 0, i64 %j, i64 0
				%lv1 = load i32, i32* %arrayidx5
				%arrayidx9 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @B, i64 0, i64 %j, i64 %i30
				%lv2 = load i32, i32* %arrayidx9
				%add = add nsw i32 %lv1, %lv2
				%arrayidx13 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @A, i64 0, i64 %j, i64 0
				store i32 %add, i32* %arrayidx13
				%i.next = add nuw nsw i64 %j, 1
				%exitcond = icmp eq i64 %j, 99
				br i1 %exitcond, label %for1.inc14, label %for2

				for1.inc14:
				%i.next31 = add nuw nsw i64 %i30, 1
				%exitcond33 = icmp eq i64 %i30, 99
				br i1 %exitcond33, label %for.end16, label %for2.preheader

				for.end16:
				ret void
				}
