Page MenuHomePhabricator

Please use GitHub pull requests for new patches. Phabricator shutdown timeline

[LoopInterchange] Correcting the profitability check
ClosedPublic

Authored by ram-NK on Oct 12 2022, 12:40 PM.

Details

Summary

If inner loop is loop independent or doesn't carry any dependency then, loop interchange is not profitable. Also if outer loop is not loop independent then, loop interchange is not profitable. if inner loop has dependence and outer loop is loop independent then, it is profitable to interchange to enable inner loop parallelism. Corrected the dependency checking inside isProfitableForVectorization(). Also addresses the endless interchange problem. If Cache analysis could decide the loop order then, isProfitable will not invoke isProfitableForVectorization for decidinding better loop interchange.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Meinersbur added inline comments.Oct 19 2022, 12:20 AM
llvm/lib/Transforms/Scalar/LoopInterchange.cpp
1100–1101

Does you patch cover this TODO?

1100–1101

Shouldn't the condition on Row[InnerLoopId] and Row[OuterLoopId] be exact opposite? That is, it is profitable if the innermost loop has loop-carried dependencies while the outer has not?

1125–1137

Should this only be considered if InnerLoopId is actually an innermost loop (The only kind LoopVectorize can currently process)?

ram-NK updated this revision to Diff 469339.Oct 20 2022, 1:47 PM
ram-NK marked an inline comment as done.
ram-NK edited the summary of this revision. (Show Details)
ram-NK marked 2 inline comments as done.Oct 20 2022, 2:01 PM
ram-NK added inline comments.
llvm/lib/Transforms/Scalar/LoopInterchange.cpp
1100–1101

Corrected the dependency checking and corrected the comments.

1100–1101

Corrected the dependency check. If inner loop has loop carried dependency and outer loop is loop independent then, loop interchange is considered as profitable for vectorization.

ram-NK marked 2 inline comments as done.Oct 20 2022, 2:04 PM
ram-NK added inline comments.
llvm/test/Transforms/LoopInterchange/pr43797-lcssa-for-multiple-outer-loop-blocks.ll
11 ↗(On Diff #467235)

These changes are not needed after correcting the dependency check in loop interchange.

ram-NK added a comment.EditedOct 20 2022, 2:11 PM

Following changes are made as per the comments. If inner loop is loop independent or doesn't carry any dependency then, loop interchange is not considered as profitable. Also if outer loop is not loop independent then, loop interchange is not considered as profitable. If inner loop has dependence and outer loop is loop independent then, it is profitable to interchange to enable inner loop parallelism.

ram-NK edited the summary of this revision. (Show Details)Oct 20 2022, 3:12 PM
ram-NK updated this revision to Diff 469613.Oct 21 2022, 8:05 AM
ram-NK added a comment.EditedOct 28 2022, 8:36 AM

@Meinersbur and @bmahjour, Could you get the chance to review the code I modified as per your comment?

This looks better now, but the problem of "endless interchange" is still not addressed. Per discussion in the loop opt call, we should only use CacheCost when it was able to calculate a valid cost and the loops can be sorted based on "strictly less than" ordering relationship. Only when CacheCost result is indeterminate (eg. two loops have equal cost) or when it is unable to calculate the cost (eg. due to delinearization issues), we should fall back to the isProfitableForVectorization() function.

llvm/lib/Transforms/Scalar/LoopInterchange.cpp
1105

not not -> not

ram-NK updated this revision to Diff 473714.Nov 7 2022, 9:31 AM

This looks better now, but the problem of "endless interchange" is still not addressed. Per discussion in the loop opt call, we should only use CacheCost when it was able to calculate a valid cost and the loops can be sorted based on "strictly less than" ordering relationship. Only when CacheCost result is indeterminate (eg. two loops have equal cost) or when it is unable to calculate the cost (eg. due to delinearization issues), we should fall back to the isProfitableForVectorization() function.

Corrected the "endless interchange" possibility. Now isProfitableForVectorization() only invokes when two loops have equal cost or it is unable to calculate the cost.

ram-NK marked an inline comment as done.Nov 7 2022, 9:31 AM
ram-NK added inline comments.
llvm/lib/Transforms/Scalar/LoopInterchange.cpp
1105

corrected

congzhe added inline comments.Nov 7 2022, 9:03 PM
llvm/lib/Transforms/Scalar/LoopInterchange.cpp
1141
ram-NK updated this revision to Diff 474010.Nov 8 2022, 7:54 AM
ram-NK marked an inline comment as done.
ram-NK marked 2 inline comments as done.Nov 8 2022, 8:08 AM

@congzhe and @bmahjour comments are addressed.

llvm/lib/Transforms/Scalar/LoopInterchange.cpp
1141

Corrected the condition. any of the loop is failed to determine the loop nest and equal locality then only checks the profit of vectorization.

Macro testcase:

llvm/lib/Transforms/Scalar/LoopInterchange.cpp
1138–1139

Consider updating this message. Suggestion: "Interchanging loops not considered to improve cache locality nor vectorization."

1141

This is exactly the else branch of the cache analysis logic. It does not consider the fallback Cost < -LoopInterchangeCostThreshold legacy cost model. Please avoid the code duplication and needing to loop up the CostMap again.

Meinersbur added inline comments.Nov 16 2022, 9:37 AM
llvm/lib/Transforms/Scalar/LoopInterchange.cpp
1115–1121

The structure suggested in the LoopWG call.

The more general pattern/refactoring would be:

std::optional<bool> shouldInterchange = isProfitableAccordingLoopCacheAnalysis(..);
if (shouldInterchange.has_value()) 
  return shouldInterchange.get_value();

shouldInterchange = isProfitableAccordingInstrOrderCost(..);
if (shouldInterchange.has_value()) 
  return shouldInterchange.get_value();

shouldInterchange = isProfitableForVectorization(..);
if (shouldInterchange.has_value()) 
  return shouldInterchange.get_value();

emitOptimizationRemark("Don't know");

However, this changes when the emitOptimizationRemark is called. If we would not want to change this, it would be (which corresponds to the current structure but with refactoring):

std::optional<bool> shouldInterchange = isProfitableAccordingLoopCacheAnalysis(..);
if (!shouldInterchange.has_value()) {
  shouldInterchange = isProfitableAccordingInstrOrderCost(..);
  if (!shouldInterchange.has_value()) {
     shouldInterchange = isProfitableForVectorization(..);
  }
}

if (!shouldInterchange.has_value())
  emitOptimizationRemark("Don't know");
else if (!shouldInterchange.get_value())
  emitOptimizationRemark("Profitability heuristic indicates this loop is good as-is");

although I would prefer the former over the nested if-else chain and instead emit different optimization remarks for each of the heuristic that indicates that/why the loops should (NOT) be interchanged.

ram-NK marked an inline comment as done.Nov 22 2022, 12:37 PM

@Meinersbur, I will update these comments and testcase accordingly.

ram-NK updated this revision to Diff 479293.Dec 1 2022, 7:48 AM

As per the Michael 's comments. Code inside isProfitable function is cleaned.

Added test case. With two loop interchange passes make sure that, there is no endless interchange. Before this patch there is endless loop interchange possibility. There is no endless interchange after this patch.

ram-NK marked 3 inline comments as done.Dec 1 2022, 7:51 AM

@Meinersbur comments are addressed.

I would suggest to revise the summary and/or the title to describe that this patch now also addresses the endless interchange problem, in addition to correction in isProfitableForVectorization().

llvm/test/Transforms/LoopInterchange/profitability.ll
179

I would suggest to apply --check-prefix=PROFIT only to interchange_05() and not to other existing tests. Because this opt command line runs interchange twice and shows that there would have been endless interchange with interchange_05() before this patch but there is no endless interchange after this patch, which applies only to interchange_05() and is not related to other tests.

congzhe added inline comments.Dec 1 2022, 8:13 AM
llvm/lib/Transforms/Scalar/LoopInterchange.cpp
1133

Is it possible to make LoopInterchangeCostMeaningfulnessThreshold an opt flag so we can assign its value more flexibly? Possibly rename it to LegacyCostModelThreshold or whatever is more appropriate.

static cl::opt<int> LegacyCostModelThreshold(
    "legacy-cost-threshold", cl::init(10), cl::Hidden,
    cl::desc("The threshold for the legacy cost model to be considered."));
ram-NK updated this revision to Diff 479329.Dec 1 2022, 9:13 AM
ram-NK retitled this revision from [LoopInterchange] Correcting the profitability checking for vectorization to [LoopInterchange] Correcting the profitability check.
ram-NK edited the summary of this revision. (Show Details)
ram-NK marked 2 inline comments as done.

@congzhe Corrected as per the comments.

ram-NK updated this revision to Diff 479630.Dec 2 2022, 7:17 AM

Added comments in Test case 05 for better understanding.
Corrected the comment in Test case 01. The loop interchange decision is made from the Cache cost analysis and is not from vectorization.

ram-NK updated this revision to Diff 479744.Dec 2 2022, 1:55 PM
ram-NK updated this revision to Diff 482045.Dec 12 2022, 2:43 AM
congzhe added inline comments.Dec 12 2022, 3:28 PM
llvm/lib/Transforms/Scalar/LoopInterchange.cpp
1101–1104

nit: isProfitableAccordingToLoopCacheAnalysis, or isProfitablePerLoopCacheAnalysis

1126–1137

nit: isProfitableAccordingToInstrOrderCost, or isProfitablePerInstrOrderCost

1168

It could be worth adding some comments for this function that describe what it does now, and how it prevents endless interchange from happening.

llvm/test/Transforms/LoopInterchange/profitability.ll
175

nit:

may leads->may lead
before-> before D135808
now-> after D135808

ram-NK updated this revision to Diff 482681.Dec 13 2022, 5:58 PM
ram-NK marked 4 inline comments as done.

@congzhe, All comments are addressed.

congzhe added a reviewer: Restricted Project.Dec 13 2022, 8:13 PM
Matt added a subscriber: Matt.Dec 14 2022, 8:32 AM
bmahjour added inline comments.Dec 15 2022, 12:29 PM
llvm/lib/Transforms/Scalar/LoopInterchange.cpp
64

I don't see why we need two options to control the legacy cost model.

1117

We should return nullopt when/if InnerIndex == OuterIndex

1146

since the idea is to call this function only when CC has failed or been indecisive, we don't need to pass CC to this function and check it here.

llvm/test/Transforms/LoopInterchange/profitability.ll
171–176

reword:

This tests to make sure, that multiple invocations of loop interchange will not undo previous interchange and will converge to a particular order determined by the profitability analysis.

@bmahjour I will update the comments accordingly.

ram-NK added inline comments.Dec 21 2022, 11:30 AM
llvm/lib/Transforms/Scalar/LoopInterchange.cpp
64

This new parameter added as per @Meinersbur comments in Inline. This is considered as the upper limit of InstrOrderCost.

1117

The CostMap is assigned with unique number to each loop. So this condition "InnerIndex == OuterIndex" will not satisfy. But LoopCost may be same, CC->getLoopCost(*InnerLoop) == CC->getLoopCost(*OuterLoop) checking can be added for returning nullopt.

ram-NK updated this revision to Diff 485617.Dec 29 2022, 8:07 AM
ram-NK marked 4 inline comments as done.EditedDec 29 2022, 8:16 AM

perserve-lcssa.ll corrected, Instruction order cost is 0, Not profitable to interchange.
loop-interchange-optimization-remarks.ll remark message corrected.
All comments are updated.

ram-NK updated this revision to Diff 485706.Dec 30 2022, 9:00 AM

corrected the code formating

ram-NK added a comment.Jan 6 2023, 8:27 AM

Hi @bmahjour, Changes are made as per your comments. Could you get the time to review it.
Hi @Meinersbur, Some changes are done over your suggested comments (https://reviews.llvm.org/D135808?id=474010#inline-1332682)

bmahjour accepted this revision.Jan 10 2023, 2:34 PM

LGTM with some minor comments.

llvm/lib/Transforms/Scalar/LoopInterchange.cpp
1117

ah, ok...in that case please add assert(InnerIndex != OuterIndex && "CostMap should assign unique numbers to each loop")

1151

ideally this should be

if (Row[OuterLoopId] != '=' && Row[OuterLoopId] != 'I')

This revision is now accepted and ready to land.Jan 10 2023, 2:34 PM
Meinersbur accepted this revision.Jan 11 2023, 7:56 AM

LGTM.

Sorry I didn't find time to review it.

llvm/test/Transforms/LoopInterchange/perserve-lcssa.ll
172

The test fails for me with the latest version because of opaque pointers emitted by opt.

Regeneration of the test should fix this.

ram-NK updated this revision to Diff 488854.Jan 12 2023, 8:14 PM
ram-NK marked 3 inline comments as done.

All comments are updated.

This revision was landed with ongoing or failed builds.Jan 16 2023, 11:36 AM
This revision was automatically updated to reflect the committed changes.