This patch proposes to replace the current loop interchange cost model with a new one, namely the one returned by loop cache analysis.
Motivation
Given a loop nest, loop cache analysis returns a vector of loops [loop0, loop1, loop2, ...], where loop0 should be placed as the outermost loop, loop1 one level inside it, loop2 one level further in, and so on. Loop cache analysis is not only more comprehensive than the current cost model, it is also a "one-shot" query: we query it once for the entire loop interchange pass, whereas the current cost model is queried every time we check whether interchanging two loops is profitable. This saves compile time, especially after D120386, where we perform more interchanges to reach the globally optimal loop access pattern.
Changes made to test cases
There are some changes to the lit tests, most of them minor. One change that applies to all tests: I changed the target triple from "x86_64" to "aarch64" and added "-mcpu=tsv110" to the RUN lines. Loop cache analysis needs the cache line size, which it obtains from "TTI.getCacheLineSize()"; however, the x86 subtargets do not implement "getCacheLineSize()", so "TTI.getCacheLineSize()" just returns 0. With this change, "TTI.getCacheLineSize()" returns a valid number and loop cache analysis can proceed as normal.
*Update:* the target triple is now changed to powerpc, as per the review comments.
interchange-no-deps.ll: removed the test function "no_bad_order()", which only exercised the legacy cost model (the operands of gep instructions, to be specific) and does not represent a situation relevant to loop interchange: its memory accesses do not depend on the outer loop, so the outer loop should have been deleted, and this is not IR we would encounter in real situations. The new and legacy cost models give different results for this function, so I removed it.
interchanged-loop-nest-3.ll, not-interchanged-loop-nest-3.ll, not-interchanged-tightly-nested.ll: the IR was not entirely correct, since the target triple was "x86_64" but the getelementptr indices were 32 bits. The indices should be 64 bits, because pointer arithmetic on that target is 64-bit, so I changed them from i32 to i64; otherwise loop cache analysis triggers an SCEV assertion failure saying "scev operand types mismatch".
A note: we have not completely removed the legacy cost model; it is kept behind an opt flag. If we used only the new cost model, some lit tests would fail, because loop cache analysis relies on delinearization, and delinearization needs some enhancement: it currently cannot delinearize the accesses in some tests, in which case loop cache analysis just bails out. I will put the enhancement of delinearization into my next steps.
I am not sure we need the option; I slightly prefer removing it, because having an "EnableLegacy" option that defaults to "true" might incorrectly imply that we still use the old cost model, when in fact we use the new one in the majority of cases.