In D109958 it was noticed that we could optimise the pipeline and avoid rerunning LoopSimplify/LCSSA for LoopFlatten by moving it to a LoopPassManager. I think the reason why it currently is added to a FunctionPassManager, is because under the legacy-PM a loop pass manager couldn't handle a loop pass deleting/modifying a loop.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
My understanding was that the LoopFlattening was where it was in the pipeline because that gave better results than putting it elsewhere. Any ideas why?
I haven't moved LoopFlattening around in the past, and can't recall we have experimented with that much. Perhaps someone else did.... So I can't answer why, but I am also not sure much thought has gone into it in the pass. When it was function pass, my guess was this was a reasonable place where it was doing its job. My intention is to make it LoopNestPass, and have only moved it a bit to add it to the LPM2. I thought important was that Invar-simplify and simplify-cfg runs before LoopFlatten, which is still the case. I did check the benchmark that we care about, and actually saw a small improvement. No idea why, but it was confirmation enough that this move seems reasonable.
This removes the test changes, I had indeed incorrectly put this patch on top of D109958.
In D109958 I missed the fact that LoopFlatten does not preserve any analyses. This is, I assume, the reason it was not added alongside other Loop passes before and probably also why I'm seeing a crash when adding it as part of LPM2.
The pass should not misbehave on its own in a LoopPassManager because it's not using the Updater to readd loops for reprocessing, otherwise it would be incorrect alone too.
So either LoopFlatten needs to be remain in it's own LPM or taught to preserve analyses (LoopInfo, DominatorTree, etc)
I think there are 2 separate things going on here:
- A recent patch is causing a problem (as reported in D109958), which I am investigating and fixing. I suspect this is the same crash that you're seeing.
- In addition, we have the question of preserved analysis. LoopFlatten removes an inner-loop, so it sounds right to me that LoopFlatten doesn't preserve LoopInfo, DomTree, or readd loops, etc. I was assuming that when a pass doesn't preserve an analysis, the pass manager reruns it when a subsequent pass requires it. But please advise if LoopFlatten is fine where it is here in this patch, or if it needs moving to its own LPM.
Also having talked to @dmgreen , I am now more convinced that having LoopFlatten separate makes sense (because it doesn't preserve much info as it works on pairs of outer/inner loops and removes the inner if it can). So I think we can abandon this patch. I can look into preserving LCSSA though, perhaps that can be kept up to date avoiding rerunning the pass.
FWIW, there are other loop passes that drastically change the loop structure (e.g. SimpleLoopUnswitch), and preserve all analyses, including LoopInfo.
Even if LoopFlatten is in its own LPM, doesn't it still break the LPM contract by not preserving analyses? Wouldn't the function to loop pass adaptor still claim the standard analyses are preserved, even though they actually aren't?
I don't understand why this is not already triggering verification failures.
Ah, I think the reason is that it actually does preserve the analyses, or at least some subset of them: https://github.com/llvm/llvm-project/blob/d550930afcbb84740681c219ab13efd133143f88/llvm/lib/Transforms/Scalar/LoopFlatten.cpp#L645 https://github.com/llvm/llvm-project/blob/d550930afcbb84740681c219ab13efd133143f88/llvm/lib/Transforms/Scalar/LoopFlatten.cpp#L663-L665
Possibly it already preserves everything properly and just needs to report that it does? Plus report the deleted loop to the pass manager?
New pass manager loop pass invalidation isn't exactly my area of expertise. My main issue with moving the pass was that performance was worse. I don't now remember why, but I have a vague memory of widening IV's in the flattening pass, if they have already been partially widened in indvarsimplify.
It may be important to note that LoopFlatten isn't a Loop pass. It's a LoopNest Pass and has been since D102904.
Would it be possible to reevaluate whether that issue still exists? If it is important for performance to have LoopFlatten in a standalone LPM, we should have a PhaseOrdering test that demonstrates that (which would have failed with this patch and made the issue obvious).
I would like to pick this up, have reread the comments, but it's still a bit unclear to me what needs doing at this point.
First, returning to @nikic comment:
Possibly it already preserves everything properly and just needs to report that it does? Plus report the deleted loop to the pass manager?
Yeah, I think so, just wanted to confirm this: it uses forgetLoop from ScalarEvolution and markLoopAsDeleted from the LPMUpdater to invalidate the old loops, and updates the DomTree with deleteEdge.
So does this mean that this patch is fine as it is? Modulo "the report the deleted loop to the pass manage" perhaps, will need to look into that.
Can you please advise @nikic , @asbirlea ?