LCSSA Form documentation.
Details
Diff Detail
Event Timeline
I was thinking if it is a good idea to add @Meinersbur 's bad parts in this video: https://youtu.be/QpvZt9w-Jik?t=499 (at 8:20).
llvm/docs/LoopTerminology.rst | ||
---|---|---|
211–212 | Note that this applies to the new pass manager only (by FunctionToLoopPassAdaptor): This does not happen with the legacy (current default) pass manager. | |
248–251 | For the placement of the PHINode, LCSSA needs to do the deep analysis (finding the dominating exit block) as well. LCSSA makes it a two-step process. I don't think it would be complicated to do it in a single step on-demand when unswitching. There are other advantages. LLVM uses a linked linked list to list all the uses of an instruction. There is a convenience function replaceAllUsesWith (or short: RAUW) does this replacement. For loops however, it had to distinguish between inside/outside users (1), but this is no different that when versioning a non-loop BasicBlock. For the placement of the PHINode, LCSSA needs to do the deep analysis (finding the exit block) as well. I think the primary reason is that loop analysis become more independent (3). Say we perform a loop analysis on all loops in a function that stores reference to llvm::Values used in the loops, such as ScalarEvolution. In a second step, we transform all loops using the analysis' result which does RAUW. Without LCSSA it might replace values in other loops and invalidate the analysis already applied to the. With LCSSA, only value of the PHINode in the exit block is changed, but the PHINode is the same instance used in other loops. However, if we have loop transformations that transform inner as well as outer loops, we still need to handle the case that transforming the inner (respectively outer) loop may invalidate an analysis on the other. Another reason is that with LCSSA, ScalarEvolution::getSCEV is sufficient (2). Otherwise, one needs to use getSCEVAtScope to specify whether we are using the SCEV inside or outside. Polly does not require LCSSA, hence uses mostly getSCEVAtScope. GCC has documentation about the advantages of LCSSA: https://gcc.gnu.org/onlinedocs/gccint/LCSSA.html . The numbers (1-3) correspond to the bullet list. |
llvm/docs/LoopTerminology.rst | ||
---|---|---|
211–212 | Ok, I'm not very familiar with pass managers. So, in the old pass manager, you have to invoke LCSSA automatically or something? | |
248–251 | Thank you for the explanation! Some questions / comments:
That was actually a note from Chris Lattner (I couldn't have thought that haha). What I understood isn't that we can avoid deep analysis completely. Rather, deep analysis will be done for LCSSA. But then, loop optimizations have to preserve LCSSA (which from what I understand should be relatively easy), so when the time comes to do loop unswitching, LCSSA is there and helps you do it quick.
"this replacement" being loop unswitching ?
Why exactly the distinction ? What I understand is that, when doing RAUW on a basic block or a bunch of basic blocks (or just one), no matter if they're a loop or not, if you know which values are live outside them (e.g. using single entry PHI nodes), you make your life easier. Because you replace the inside uses and those on the PHI nodes on the boundary. Compared to if you didn't have those,
Wow, that was very smart.
I'm not sure I got that completely (I'm not familiar with the interface of SCEV in LLVM). I guess because no instruction of the loop is used outside of it, so SCEV for a loop can be limited to the loop. Otherwise, a value may be used wherever and you have to include it in the scope if you are to do SCEV correctly. |
llvm/docs/LoopTerminology.rst | ||
---|---|---|
211–212 | In the legacy pass manger, those passes are added as dependency of the loop passes. The Helper function [[ https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/Utils/LoopUtils.cpp#L141 | getLoopAnalysisUsage ]] add them them as requirements and should be called by all loop passes in getAnalysisUsage. Might be useful to mention that. | |
248–251 | Since I haven't implemented a loop unswitch pass myself, I can't really argue how difficult it it, so I don't request to change it. Interestingly, the source referenced by the note just mentions it as "non-trivial update". It also adds another advantage of LCSSA: Induction variable uses in nested loop. Although SCEV should have no problem with, it's just two AddRecExpr in an SECExpr. Maybe you could add the other advantages as well?
Replacing uses of an instruction result outside the loop by the PHINode that 'combines' the result of the unswitched loops.
It's surely fewer values to replace (one in the PHI changes all uses outside the loop). Whether a value is used outside can be checked using L->contains(Use) IMHO this does not justify LCSSA which searches and replaces all outside uses with the PHI unconditionally.
The SCEV for inside the loop references the induction variable in an AddRec expression. This is only valid to expand inside the loop. When outside the loop, you need the expression representing the exit value of the loop, or SCEVUnkown when the exit value cannot be computed, e.g. because the loop exit condition is not known. |
FYI: Here's the paper by @sebpop explaining loop-closed SSA http://cri.ensmp.fr/classement/doc/E-285.pdf hth.
FYI: Here's the paper by @sebpop explaining loop-closed SSA http://cri.ensmp.fr/classement/doc/E-285.pdf hth.
Thanks! The paper is very interesting but it seems to focus on an "extended" loop-closed SSA form that captures more
info than the single phi nodes (referred as close() nodes in the paper). Most of the reasoning of LCSSA with just phi nodes
is written in the start of part 4.1. I'm not sure if it makes a good argument for LCSSA with phi nodes but to anyone reading
this, please make a note if I miss anything.
llvm/docs/LoopTerminology.rst | ||
---|---|---|
248–251 |
I hadn't seen that, thanks. What is a SECExpr?
Yes. You mean just the one I guess, about the induction variables.
Oh, I got what you mean. Even if you have LCSSA before the unswitching, after it, you still have to go and update all the uses
Are you sure? L->contains() I think doesn't get Use. Given a Value, it seems you have to loop through all the uses, get the User and check if this is inside the loop. Something like what replaceLoopInvariantUses() does.
I agree.
Got it, thanks! |
llvm/docs/LoopTerminology.rst | ||
---|---|---|
181–184 | Maybe remove the if-statement here, it is specific to loop unswitching | |
248–251 |
ScalarEvolution::SCEVExpr. Sorry for the typo.
Why not all of them?
with the new PHINode that combines the 2 PHIs of the versioned loops. SimpleLoopUnswitch does as you described: https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp#L255 for (...) { %val = ... } %lcssa = phi [%val] to if (c) { for (...) { %val1 = ... } %newlcssa1 = phi [%val1] // new PHI } else { for (...) { %val2 = ... } %newlcssa2 = phi [%val2] // new PHI } %lcssa = phi [%newlcssa1, %newlcssa2] // re-use old LCSSA node here, not an LCSSA node any more
You'd use it on the instruction that uses the value.
It is not that much overhead to use L->contains on each use, at least less than maintaining LCSSA. replaceLoopInvariantUses() uses it to only change something for users inside the loop, i.e. ignore LCSSA PHIs. You can do the same just to find outside users. |
llvm/docs/LoopTerminology.rst | ||
---|---|---|
181–184 | Yes, I'll give a simple example, then an example specific to loop unswitching. | |
211–212 | Ok, thanks, I can't really find much info about the pass managers. I'll add that. | |
248–251 |
Here, Zdenek Dvorak mentions 2 advantages I think: 1) Updating it during unrolling/peeling/versioning is trivial, since we do not need to care about the uses outside of the loop. 2) About induction variables I think I have already mentioned 1) with the example of loop unswitching, no? You may mean the other GCC page for LCSSA. Basically, the first 2 there are the same. The third I think is not an advantage. If I understood correctly, it basically says that even with LCSSA, we have to go and do the RAUW for the new phi node.
SimpleLoopUnswitch does as you described: Great, now it's clear!
Ok, got it. |
llvm/docs/LoopTerminology.rst | ||
---|---|---|
248–251 | You may pick which advantages of LCSSA are worth mentioning.
If one only needs to look up outside users, one can also iterate over all the PHIs in the exit blocks. |
llvm/docs/LoopTerminology.rst | ||
---|---|---|
248–251 |
You added that as another advantage ? |
- Added other advantages.
- Rephrased Chris's wording a little bit as it was not very clear to me. Also, the deep analysis, the more I thought of it and as @Meinersbur mentioned,
does not seem to be needed any way. We could use RAUW() to achieve this effect (although it would still be more expensive).
2 questions:
- Is it "loop closed PHI node" or "loop closing PHI node"?
- Does LCSSA pass have to do this deep analysis? Because to me, it seems it can just use RAUW?
Not being an expert of the English language, I;d say "loop-closing PHI node"
- Does LCSSA pass have to do this deep analysis? Because to me, it seems it can just use RAUW?
The most complicated thing is finding the dominator nodes to add the PHI nodes in (especially with multiple loop exits).
If possible, could you avoid "deep analysis", but instead refer the kind of analysis needed?
llvm/docs/LoopTerminology.rst | ||
---|---|---|
261–264 | You you replace "deep analysis" by "data flow analysis"? Also mention this the use-def list is a property of SSA in general. Without mentioning these terms, I had to read this paragraph 3 times before I understood what ou want to say. | |
266 | If you think it's worth mentioning, you could add that it's also easy to find all values that are used outside the loop: iterate over the PHINodes in the exit nodes (instead looking of all use-lists of all instructions in the loop). | |
285–287 | Could you add a sentence introducing SCEV? In this sentence it appears without context. |
Yes. Could you explain the part about dominator nodes ? I understand how dominators help in the identification
of loops and how dominance frontiers help in the placement of normal PHI nodes (i.e. when 2 or more paths converge). But I don't see how dominators
help with the placement of LCSSA phi nodes.
llvm/docs/LoopTerminology.rst | ||
---|---|---|
261–264 | Yes.
Ok, I'll remove the LLVM-specific part (internal data structure for uses etc.) and I'll focus on the things that we know from SSA anyway (def-use chain). | |
266 | I added it above. It's the first benefit. But I didn't add the "instead" part, I'll add it. | |
285–287 | Yes, I should somehow add it before these 2 paragraphs. |
It is simple if the loop has just a single exit, but much more involved with multiple exits. Example CFG:
Header: %val = ... Header->{Latch1,Latch2,Latch3}->Header Header->Latch1->Exit1 Header->Latch2->Exit2 Header->Latch3->Exit3 Exit->Successor1 Successor1->SharedSucc Exit2->SharedSucc SharedSucc->CommonSucc Exit3->CommonSucc CommonSucc: use(%val)
Header dominates all other BBs in this example, hence without LCSSA, this is well-formed. However, with LCSSA we have to insert phis into Exit1,Exit2,Exit3 such that the end result has to look like:
Exit1: %lcssa1 = phi [%val, %Latch1] Exit2: %lcssa2 = phi [%val, %Latch2] Exit3: %lcssa3 = phi [%val, %Latch3] SharedSucc: %sharedval = phi [%lcssa1 , %Exit1], [%lcssa2, %Exit2] CommonSucc: %commonval = phi [%lcssa3 , %Exit3], [%sharedval , %SharedSucc] use(%commonval)
For computing %sharedval and %commonval, DominanceFrontier is needed again, hence LCSSA is not less complicated than SSA itself.
Ok, I get it now, thanks! So with multiple exits one has to compute dominator info again.
- Various typos.
- Added def-use chain info as a footnote since its used twice. Also a footnote about LCSSA construction (I consider it interesting but in a footnote because it may be irrelevant in the main text).
- Added some info about SCEV.
llvm/docs/LoopTerminology.rst | ||
---|---|---|
193 | "like so" is spoken English only | |
298–300 | cf "Howver, ..." I don't think this says what you want to say. Suggestion: "In LCSSA form, loop-variant values and and its value after the last iteration of the loop are represented by two different llvm::Instructions. Hence the llvm::Instruction itself disambiguates the context of evaluation such that it is safe to just use getSCEV()." |
llvm/docs/LoopTerminology.rst | ||
---|---|---|
193 | Yes sorry. I didn't like the previous: "this ... like this:". I'll replace it with "as follows". | |
298–300 | I agree. But I want to connect it with the more abstract term "expression", because the problem is not LLVM specific. So, I wrote this paragraph multiple times trying to balance this. I'll add a little more to connect the abstract term (expression) with its representation in LLVM (Instruction) and you can tell me what you think. |
LGTM. Thanks for the effort!
Don't forget to add "Differential Revision:" to the commit message.
No, problem, thanks for the review and the info!
Don't forget to add "Differential Revision:" to the commit message.
Yes. :)
Maybe remove the if-statement here, it is specific to loop unswitching