This is an archive of the discontinued LLVM Phabricator instance.

[LoopPassManager] Ensure to construct loop nests with the outermost loop
ClosedPublic

Authored by congzhe on Aug 18 2022, 9:02 PM.

Details

Summary

This patch is to resolve the bug reported and discussed in https://reviews.llvm.org/D124926#3718761, https://reviews.llvm.org/D124926#3719876.

The bug is that when we run loop interchange twice with the new pass manager using -passes="loop(loop-interchange,loop-interchange)" on the IR attached in https://reviews.llvm.org/D124926#3718761, it hangs forever and consumes more and more memory. The IR is added as a new lit test file in this patch.

The underlying reason, as described in https://reviews.llvm.org/D124926#3719876, is that loop interchange is a loopnest pass under the new pass manager, but the loop nest is not constructed correctly by the loop pass manager after completing the first loop interchange pass and before running the second interchange pass. The loop in the IR is a triply nested loop. But after completing the first interchange pass, the loop nest constructed is a doubly nested loop which is incorrect and caused the trouble.

The reason that the loop nest is constructed incorrectly is that the outermost loop has changed after the first interchange, and what was the original outermost loop is not the current outermost loop anymore. For this bug (https://reviews.llvm.org/D124926#3718761) the original outermost loop L has actually become the middle loop. However the loop nest is still constructed based on L, that is why the loop nest is constructed as a doubly nested loop.

What this patch does is that, in the loop pass manager before running each pass, we always let L point to the current outermost loop, because loop nests should be constructed based on the outermost loop and it is only valid to run a loopnest pass when L is the outermost loop. Please refer to lines 89 to 94 in LoopPassManager.cpp in this patch.

Diff Detail

Event Timeline

congzhe created this revision.Aug 18 2022, 9:02 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 18 2022, 9:02 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
congzhe requested review of this revision.Aug 18 2022, 9:02 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 18 2022, 9:02 PM
congzhe edited the summary of this revision. (Show Details)Aug 18 2022, 9:06 PM
congzhe added reviewers: Whitney, bmahjour, Meinersbur, uabelho, Restricted Project.
congzhe added a project: Restricted Project.
congzhe edited the summary of this revision. (Show Details)Aug 18 2022, 9:10 PM

I have no idea if this is the right fix, but I've verified that it solves the problem that we saw.
Thanks!

bmahjour added inline comments.Aug 22 2022, 9:52 AM
llvm/lib/Transforms/Scalar/LoopPassManager.cpp
71

can we copy the address of L at the start of the function, so we don't have to change the reference to a pointer in the function's interface?

LGTM other than the comment from Bardia.

aeubanks added inline comments.
llvm/lib/Transforms/Scalar/LoopPassManager.cpp
93–94

this feels like a hack and might hide future issues around accidentally swapping loops around

augmenting LPMUpdater to have a method to set the loopnest and having loop-interchange use that feels like a more proper and explicit solution

thoughts?

LGTM other than the comment from Bardia.

Thanks for the review :)

llvm/lib/Transforms/Scalar/LoopPassManager.cpp
71

Thanks Bardia for your comment! I've updated the patch according to your comment.

93–94

Thanks for commenting! Following your comments I've updated the patch such that if "LoopNestAnalysis" is not preserved by a pass (which essentially results in IsLoopNestPtrValid=false), we re-construct the loop nest based on the current outermost loop. IMHO this logic is the same as your proposal that if a pass invalidates the loopnest, we'll set the loopnest again.

congzhe updated this revision to Diff 454630.Aug 22 2022, 3:21 PM
aeubanks added inline comments.Aug 24 2022, 1:46 PM
llvm/lib/Transforms/Scalar/LoopPassManager.cpp
93–94

first, the existing code seems to be overly complicated, https://reviews.llvm.org/D132581 cleans up the loop nest analysis code

but the bigger thing is that this still seems too magical. the ideology behind the loop pass manager (and the CGSCC pass manager) is that passes should explicitly tell the pass manager what happens to loops. changing L throughout the execution isn't really intended, we should be bailing out and letting the function->loop adaptor visit whatever we need to visit next via LPMUpdater::Worklist

IIUC, loop-interchange can arbitrarily change the nesting of a loop nest. the question is, when that happens, what should the loop pass manager do? I'd imagine that if loop-interchange makes a change, we'd want to bail out after it's done, then completely revisit all the loops in the loop nest inside-out to optimize them in their new nesting. but I'd like to hear your thoughts on this (and if I understand loop-interchange correctly).

I had something like https://reviews.llvm.org/D132599 in mind (currently it only revisits the new outermost loop, but something along those lines API-wise), but it seems to still hang on your repro. but that seems like an issue with loop-interchange not being idempotent, i.e. if you keep running it on a loop, it keeps generating IR which seems bad

congzhe added inline comments.Aug 25 2022, 3:17 PM
llvm/lib/Transforms/Scalar/LoopPassManager.cpp
93–94

Thanks for your input! I applied D132581 and D132599 and did some quick debugging. The reason that opt keeps generating IR is that we add the current outermost loop to the worklist via LPMUpdater at the end of loop interchange, so the worklist is never empty and we keep poping loops from it and run optimizations on it forever. The behavior occurs for both the new and legacy pass manager.

Regarding loop interchange behavior: under the new pass manager it is a loopnest pass, so we are not concerned about the inner or middle loops at all from loop pass manager's perspective. We won't visit all the loops in the loop nest inside-out but we only visit the outermost loops and construct loop nests based on outermost loops. It is my understanding that after the loop interchange pass makes a change (which can be not one but multiple interchanges running one iteration of the pass), we would need to construct the loop nest again based on the current outermost loop, before running the next loop interchange pass. I'd appreciate it if you could let me know if it sounds clear to you.

aeubanks added inline comments.Aug 29 2022, 10:48 AM
llvm/lib/Transforms/Scalar/LoopPassManager.cpp
93–94

D132599 only adds the loop nest back to the worklist if it made a change. It sounds like if you keep running loop-interchange on a loop nest, it can keep optimizing it forever, rather than eventually stopping. That seems bad.

In terms of visiting loops, I understand that that this is a loop nest pass and only runs on top level loops. But my question is, given that there are other loop passes that aren't loop nest passes in the same LPM, if we have loop-pass-A,loop-interchange,loop-pass-B and loop-interchange makes a change (say on L1(L2) where L1 is the outer loop, L2 is the inner loop -> L2(L1)), how should the LPM deal with the new loop nest? Should it revisit everything starting from the inner-most loop (e.g. the LPM started with L1(L2), visited L2 with loop-pass-A and loop-pass-B, then ran loop-pass-A on L1 and loop-interchange on the whole loop nest, interchanging the loops, now should it restart all the loop passes on L2(L1) starting with L1, then L2, because the structure changed)?

congzhe updated this revision to Diff 457479.Sep 1 2022, 7:50 PM
congzhe added inline comments.
llvm/lib/Transforms/Scalar/LoopPassManager.cpp
93–94

Thanks for this. The reason why loop interchange keeps optimizing the loopnest is that loop interchange can interchange loops either to achieve better locality, or to expose more vectorization opportunities. Following your terminology above, it may happen that for a loopnest L1(L2), loop interchange will interchange it to achieve better locality, resulting in L2(L1). Now if we run another loop interchange pass on it, it may still interchange the loopnest, resulting in L1(L2). This time the pass interchanges the loop because it might expose more vectorization opportunities. This behavior can go on and on.

In terms of visiting loops, I think what the pass manager does currently is as follows, which seems to be appropriate. With your example above, the LPM started with L1(L2), visited L2 with loop-pass-A and loop-pass-B, then ran loop-pass-A on L1 and loop-interchange on the whole loop nest, interchanging the loops which results in L2(L1). Then it would ran run loop-pass-B on L1. It looks appropriate because L1 is the one that we want to optimize with loop-pass-B at this point, even it becase the inner loop after interchange. Nevertheless, we can always "restart all the loop passes on L2(L1) starting with L1, then L2, because the structure changed". The potential drawback is that it causes compile time increase, as well as the infinite run problem we described in our first paragraph.

I do get your point that we may not want to change L in this function runWithLoopNestPasses(). I've provided another solution which does not change L but keeps a pointer to the outermost loop. It is based on your patch D132581. Whenever we need to run loopnest passes, we contruct the loopnest based on the outermost loop. The interface with the LPM::Updater (U.isLoopNestChanged() and U.markLoopNestChanged()) may not be necessary though. It suffices to only add the while loop while (auto *PL = OuterMostLoop->getParentLoop()) after PassPA = runSinglePass(LN, Pass, AM, AR, U, PI).

I'd appreciate it if you could let me know your comments on the most recent version. Thanks a lot.

Will rebase on D132581 once it is landed.

congzhe updated this revision to Diff 457680.EditedSep 2 2022, 1:00 PM

Rebased on D132581.

Subtleties of this update can be discussed upon, e.g., whether it is necessary to interact with the LPM::Updater, etc. Comments are appreciated :)

ping @aeubanks :)
I'm wondering if you have comments on my most recent change?

Sorry, I'd meant to reply but didn't find time, I'll try to be more prompt about responding to this

This is definitely better than the previous version.

IMO it would still be best if continually rerunning a pass on some IR would reach a fixed point. For example, when loop unswitching happens, we revisit the current loop (which means restarting the entire loop pipeline), and if we end up creating a new loop, we also add that new loop to the worklist. There has been care taken to ensure that unswitching cannot run forever. If interchange did this, we'd be able to just revisit the entire loop nest again. For the interchange example, you say that loop-interchange may continually swap between L1(L2) and L2(L1), one of them must be better than the other, why can't we converge on the better nesting? I don't quite understand the "locality vs vectorization" argument, there still must be one nesting that's ultimately optimal in the end. Then we can restart the pipeline on the loop nest, or just the loops that got swapped around.

But if that doesn't make sense, then something along these lines seems ok. I think it might be worth revisiting exactly how loop nest passes work, but I haven't thought too hard about it. But then again, this seems like a loop-interchange-specific issue, not a loop nest issue.

also you'll have to rebase again since my patch was reverted
also seems like we've dropped the test?

congzhe updated this revision to Diff 459900.EditedSep 13 2022, 3:33 PM

Sorry, I'd meant to reply but didn't find time, I'll try to be more prompt about responding to this

This is definitely better than the previous version.

IMO it would still be best if continually rerunning a pass on some IR would reach a fixed point. For example, when loop unswitching happens, we revisit the current loop (which means restarting the entire loop pipeline), and if we end up creating a new loop, we also add that new loop to the worklist. There has been care taken to ensure that unswitching cannot run forever. If interchange did this, we'd be able to just revisit the entire loop nest again. For the interchange example, you say that loop-interchange may continually swap between L1(L2) and L2(L1), one of them must be better than the other, why can't we converge on the better nesting? I don't quite understand the "locality vs vectorization" argument, there still must be one nesting that's ultimately optimal in the end. Then we can restart the pipeline on the loop nest, or just the loops that got swapped around.

But if that doesn't make sense, then something along these lines seems ok. I think it might be worth revisiting exactly how loop nest passes work, but I haven't thought too hard about it. But then again, this seems like a loop-interchange-specific issue, not a loop nest issue.

also you'll have to rebase again since my patch was reverted
also seems like we've dropped the test?

Thank you very much for the comment! I thought about it more and it looks that IsLoopNestPtrValid &= PassPA->getChecker<LoopNestAnalysis>().preserved() already serves as the interaction of a loopnest pass with the LPMUpdater. Although it does not explicitly interact with the LPMUpdater itself using something like LPMUpdater::isLoopNestChanged(), the fact that PA does not preserve LoopNestAnalysis already indicates that the loop nest is changed. Therefore, I've now added the re-construction of loop nest under if (!IsLoopNestPtrValid). I'd appreciate it if you could take a look.

Regarding reaching a fixed point for loop interchange: I fully agree with you that it would be ideal if we could reach a fixed point instead of running forever. And yes there should be one nesting that's ultimately optimal in the end. It's just with the current cost analysis in loop interchange, sometimes it has not yet been made clear which nesting (L1(L2) versus L2(L1), or locality versus vectorization) would be better. That is something that I think we can improve in the longer term.

I'm wondering if what I described above makes sense to you?

I think I'm ok with this sort of patch being a temporary workaround
could you add a FIXME pointing to the discussion here?

Using PreservedAnalyses to tell the loop pass manager that the loop nest structure has changed is still IMO an abuse of it. The whole point of PreservedAnalyses is that we determine whether or not some cached analysis for a given IR unit should be invalidated. The loop nest analysis shouldn't be modeled as a loop analysis since it's not at the same IR level (that's why https://reviews.llvm.org/D132581 is wrong, caused issues, and was reverted). This is essentially adding an extra bit to the pass's return value to tell the loop pass manager to change how it behaves, which is not what PreservedAnalyses is intended to be used for. But that's exactly why LPMUpdater exists, for this sort of thing. So I'd still like to go back to the LPMUpdater version.

I also think there may be issues with loop analyses not being invalidated in regards to loop-interchange. If an inner loop has a cached analysis, right now I believe interchanging does not invalidate the (original) inner loop's analyses, but we may end up running a loop nest pass on it with its old analyses once it's the outer loop, even though the loop structure has changed. But that can be a separate patch.

congzhe updated this revision to Diff 460807.EditedSep 16 2022, 9:49 AM

I think I'm ok with this sort of patch being a temporary workaround
could you add a FIXME pointing to the discussion here?

Using PreservedAnalyses to tell the loop pass manager that the loop nest structure has changed is still IMO an abuse of it. The whole point of PreservedAnalyses is that we determine whether or not some cached analysis for a given IR unit should be invalidated. The loop nest analysis shouldn't be modeled as a loop analysis since it's not at the same IR level (that's why https://reviews.llvm.org/D132581 is wrong, caused issues, and was reverted). This is essentially adding an extra bit to the pass's return value to tell the loop pass manager to change how it behaves, which is not what PreservedAnalyses is intended to be used for. But that's exactly why LPMUpdater exists, for this sort of thing. So I'd still like to go back to the LPMUpdater version.

I also think there may be issues with loop analyses not being invalidated in regards to loop-interchange. If an inner loop has a cached analysis, right now I believe interchanging does not invalidate the (original) inner loop's analyses, but we may end up running a loop nest pass on it with its old analyses once it's the outer loop, even though the loop structure has changed. But that can be a separate patch.

Thanks for your clarification regarding PreservedAnalyses and LPMUpdater, I see your point. I've updated the patch again trying to incorporate LPMUpdater for this purpose. The reason that I dropped the use of LPMUpdater in my last version is that, since D132581 is reverted, it becomes a bit less straightforward to directly use LPMUpdater instead of "IsLoopNestPtrValid" and I thought it might suffice just to use "IsLoopNestPtrValid" as the indicator whether the loop nest has been changed.

Now I added the use of LPMUpdater back to the patch, and I added a FIXME that says we should not rely on PreservedAnalyses in the long run and we should use LPMUpdater only (I guess once D132581 is re-landed we could remove "IsLoopNestPtrValid"?). I'd appreciate it if you could take a second look :)

aeubanks accepted this revision.Sep 16 2022, 2:22 PM

thanks, lgtm with one nit
sorry for the long back and forth

I do think that the loop nest infrastructure in general needs some work and doesn't fit in with the rest of the new pass manager infrastructure very well. there should have been an RFC (unless I missed it)

llvm/include/llvm/Transforms/Scalar/LoopPassManager.h
365

markLoopNestChanged

This revision is now accepted and ready to land.Sep 16 2022, 2:22 PM

btw perhaps another way of preventing infinite interchanging is to add some metadata to the loop, like unswitching does https://github.com/llvm/llvm-project/blob/7fb96fb5d33ee55fa5b65497c6074086f43babd2/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp#L3145

although perhaps you do want interchanging to potentially change its mind if we ever simplify a loop, assuming that loop passes will only ever simplify so we converge at some point

congzhe updated this revision to Diff 461943.Sep 21 2022, 10:36 AM

thanks, lgtm with one nit
sorry for the long back and forth

I do think that the loop nest infrastructure in general needs some work and doesn't fit in with the rest of the new pass manager infrastructure very well. there should have been an RFC (unless I missed it)

Thank you very much! I've done a minor update to address the minor comment. I'll land it shortly.

Regarding preventing infinite interchanging: thanks for your suggestion, this is definitely helpful and using metadata could be a way to resolving the infinite interchange. Another direction is to improve the cost analysis. I'll think more about it.

This revision was landed with ongoing or failed builds.Sep 21 2022, 9:00 PM
This revision was automatically updated to reflect the committed changes.