This is an archive of the discontinued LLVM Phabricator instance.

[LoopTerminology] LCSSA Form
ClosedPublic

Authored by baziotis on Feb 27 2020, 2:08 AM.

Details

Summary

LCSSA Form documentation.

Diff Detail

Event Timeline

baziotis created this revision.Feb 27 2020, 2:08 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 27 2020, 2:08 AM

I was thinking if it is a good idea to add @Meinersbur 's bad parts in this video: https://youtu.be/QpvZt9w-Jik?t=499 (at 8:20).

Meinersbur added inline comments.Feb 27 2020, 1:05 PM
llvm/docs/LoopTerminology.rst
211–212

Note that this applies to the new pass manager only (by FunctionToLoopPassAdaptor): This does not happen with the legacy (current default) pass manager.

248–251

For the placement of the PHINode, LCSSA needs to do the deep analysis (finding the dominating exit block) as well. LCSSA makes it a two-step process. I don't think it would be complicated to do it in a single step on-demand when unswitching. There are other advantages.

LLVM uses a linked linked list to list all the uses of an instruction. There is a convenience function replaceAllUsesWith (or short: RAUW) does this replacement. For loops however, it had to distinguish between inside/outside users (1), but this is no different that when versioning a non-loop BasicBlock. For the placement of the PHINode, LCSSA needs to do the deep analysis (finding the exit block) as well.

I think the primary reason is that loop analysis become more independent (3). Say we perform a loop analysis on all loops in a function that stores reference to llvm::Values used in the loops, such as ScalarEvolution. In a second step, we transform all loops using the analysis' result which does RAUW. Without LCSSA it might replace values in other loops and invalidate the analysis already applied to the. With LCSSA, only value of the PHINode in the exit block is changed, but the PHINode is the same instance used in other loops. However, if we have loop transformations that transform inner as well as outer loops, we still need to handle the case that transforming the inner (respectively outer) loop may invalidate an analysis on the other.

Another reason is that with LCSSA, ScalarEvolution::getSCEV is sufficient (2). Otherwise, one needs to use getSCEVAtScope to specify whether we are using the SCEV inside or outside. Polly does not require LCSSA, hence uses mostly getSCEVAtScope.

GCC has documentation about the advantages of LCSSA: https://gcc.gnu.org/onlinedocs/gccint/LCSSA.html . The numbers (1-3) correspond to the bullet list.

baziotis marked 2 inline comments as done.Feb 28 2020, 1:18 PM
baziotis added inline comments.
llvm/docs/LoopTerminology.rst
211–212

Ok, I'm not very familiar with pass managers. So, in the old pass manager, you have to invoke LCSSA automatically or something?

248–251

Thank you for the explanation! Some questions / comments:

For the placement of the PHINode, LCSSA needs to do the deep analysis (finding the dominating exit block) as well. LCSSA makes it a two-step process. I don't think it would be complicated to do it in a single step on-demand when unswitching. There are other advantages.

That was actually a note from Chris Lattner (I couldn't have thought that haha). What I understood isn't that we can avoid deep analysis completely. Rather, deep analysis will be done for LCSSA. But then, loop optimizations have to preserve LCSSA (which from what I understand should be relatively easy), so when the time comes to do loop unswitching, LCSSA is there and helps you do it quick.

There is a convenience function replaceAllUsesWith (or short: RAUW) does this replacement.

"this replacement" being loop unswitching ?

For loops however, it had to distinguish between inside/outside users (1), but this is no different that when versioning a non-loop BasicBlock

Why exactly the distinction ? What I understand is that, when doing RAUW on a basic block or a bunch of basic blocks (or just one), no matter if they're a loop or not, if you know which values are live outside them (e.g. using single entry PHI nodes), you make your life easier. Because you replace the inside uses and those on the PHI nodes on the boundary. Compared to if you didn't have those,
where you would have to replace the uses across the whole function.

Without LCSSA it might replace values in other loops and invalidate the analysis already applied to the. With LCSSA, only value of the PHINode in the exit block is changed, but the PHINode is the same instance used in other loops

Wow, that was very smart.

Another reason is that with LCSSA, ScalarEvolution::getSCEV is sufficient (2). Otherwise, one needs to use getSCEVAtScope to specify whether we are using the SCEV inside or outside.

I'm not sure I got that completely (I'm not familiar with the interface of SCEV in LLVM). I guess because no instruction of the loop is used outside of it, so SCEV for a loop can be limited to the loop. Otherwise, a value may be used wherever and you have to include it in the scope if you are to do SCEV correctly.

Meinersbur added inline comments.Mar 1 2020, 11:42 AM
llvm/docs/LoopTerminology.rst
211–212

In the legacy pass manger, those passes are added as dependency of the loop passes. The Helper function [[ https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/Utils/LoopUtils.cpp#L141 | getLoopAnalysisUsage ]] add them them as requirements and should be called by all loop passes in getAnalysisUsage. Might be useful to mention that.

248–251

Since I haven't implemented a loop unswitch pass myself, I can't really argue how difficult it it, so I don't request to change it.

Interestingly, the source referenced by the note just mentions it as "non-trivial update". It also adds another advantage of LCSSA: Induction variable uses in nested loop. Although SCEV should have no problem with, it's just two AddRecExpr in an SECExpr.

Maybe you could add the other advantages as well?

"this replacement" being loop unswitching ?

Replacing uses of an instruction result outside the loop by the PHINode that 'combines' the result of the unswitched loops.

Why exactly the distinction ?

It's surely fewer values to replace (one in the PHI changes all uses outside the loop). Whether a value is used outside can be checked using L->contains(Use) IMHO this does not justify LCSSA which searches and replaces all outside uses with the PHI unconditionally.

I'm not sure I got that completely (I'm not familiar with the interface of SCEV in LLVM).

The SCEV for inside the loop references the induction variable in an AddRec expression. This is only valid to expand inside the loop. When outside the loop, you need the expression representing the exit value of the loop, or SCEVUnkown when the exit value cannot be computed, e.g. because the loop exit condition is not known.

FYI: Here's the paper by @sebpop explaining loop-closed SSA http://cri.ensmp.fr/classement/doc/E-285.pdf hth.

baziotis marked an inline comment as done.Mar 1 2020, 4:51 PM

FYI: Here's the paper by @sebpop explaining loop-closed SSA http://cri.ensmp.fr/classement/doc/E-285.pdf hth.

Thanks! The paper is very interesting but it seems to focus on an "extended" loop-closed SSA form that captures more
info than the single phi nodes (referred as close() nodes in the paper). Most of the reasoning of LCSSA with just phi nodes
is written in the start of part 4.1. I'm not sure if it makes a good argument for LCSSA with phi nodes but to anyone reading
this, please make a note if I miss anything.

llvm/docs/LoopTerminology.rst
248–251

Interestingly, the source referenced by the note just mentions it as "non-trivial update". It also adds another advantage of LCSSA: Induction variable uses in nested loop. Although SCEV should have no problem with, it's just two AddRecExpr in an SECExpr.

I hadn't seen that, thanks. What is a SECExpr?

Maybe you could add the other advantages as well?

Yes. You mean just the one I guess, about the induction variables.

Replacing uses of an instruction result outside the loop by the PHINode that 'combines' the result of the unswitched loops.

Oh, I got what you mean. Even if you have LCSSA before the unswitching, after it, you still have to go and update all the uses
with the new PHINode that combines the 2 PHIs of the versioned loops.

Whether a value is used outside can be checked using L->contains(Use)

Are you sure? L->contains() I think doesn't get Use. Given a Value, it seems you have to loop through all the uses, get the User and check if this is inside the loop. Something like what replaceLoopInvariantUses() does.

IMHO this does not justify LCSSA which searches and replaces all outside uses with the PHI unconditionally.

I agree.

The SCEV for inside the loop references...

Got it, thanks!

Meinersbur added inline comments.Mar 2 2020, 11:34 AM
llvm/docs/LoopTerminology.rst
181–184

Maybe remove the if-statement here, it is specific to loop unswitching

248–251

What is a SECExpr?

ScalarEvolution::SCEVExpr. Sorry for the typo.

Yes. You mean just the one I guess, about the induction variables.

Why not all of them?

Even if you have LCSSA before the unswitching, after it, you still have to go and update all the uses

with the new PHINode that combines the 2 PHIs of the versioned loops.

SimpleLoopUnswitch does as you described: https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp#L255

for (...) {
  %val = ...
} 
%lcssa = phi [%val]

to

if (c) {
  for (...) {
    %val1 = ...
  }
  %newlcssa1 = phi [%val1] // new PHI
} else {
  for (...) {
    %val2 = ...
  }
  %newlcssa2 = phi [%val2] // new PHI
}
%lcssa = phi [%newlcssa1, %newlcssa2] // re-use old LCSSA node  here, not an LCSSA node any more

Are you sure? L->contains() I think doesn't get Use.

You'd use it on the instruction that uses the value.

Given a Value, it seems you have to loop through all the uses, get the User and check if this is inside the loop. Something like what replaceLoopInvariantUses() does.

It is not that much overhead to use L->contains on each use, at least less than maintaining LCSSA. replaceLoopInvariantUses() uses it to only change something for users inside the loop, i.e. ignore LCSSA PHIs. You can do the same just to find outside users.

baziotis marked 3 inline comments as done.Mar 2 2020, 1:07 PM
baziotis added inline comments.
llvm/docs/LoopTerminology.rst
181–184

Yes, I'll give a simple example, then an example specific to loop unswitching.

211–212

Ok, thanks, I can't really find much info about the pass managers. I'll add that.

248–251

Why not all of them?

Here, Zdenek Dvorak mentions 2 advantages I think:

1) Updating it during unrolling/peeling/versioning is trivial, since
     we do not need to care about the uses outside of the loop.
2) About induction variables

I think I have already mentioned 1) with the example of loop unswitching, no?

You may mean the other GCC page for LCSSA. Basically, the first 2 there are the same. The third I think is not an advantage. If I understood correctly, it basically says that even with LCSSA, we have to go and do the RAUW for the new phi node.

with the new PHINode that combines the 2 PHIs of the versioned loops.

SimpleLoopUnswitch does as you described:

Great, now it's clear!

It is not that much overhead to use L->contains on each use, at least less than maintaining LCSSA.

Ok, got it.

Meinersbur added inline comments.Mar 6 2020, 11:20 AM
llvm/docs/LoopTerminology.rst
248–251

You may pick which advantages of LCSSA are worth mentioning.

If I understood correctly, it basically says that even with LCSSA, we have to go and do the RAUW for the new phi node.

If one only needs to look up outside users, one can also iterate over all the PHIs in the exit blocks.

baziotis marked an inline comment as done.Mar 6 2020, 11:55 AM
baziotis added inline comments.
llvm/docs/LoopTerminology.rst
248–251

If I understood correctly, it basically says that even with LCSSA, we have to go and do the RAUW for the new phi node.

If one only needs to look up outside users, one can also iterate over all the PHIs in the exit blocks.

You added that as another advantage ?
Or as a correction to what I said it is the 3rd advantage that GCC mentions?

baziotis updated this revision to Diff 251522.Mar 19 2020, 6:30 PM
  • Added other advantages.
  • Rephrased Chris's wording a little bit as it was not very clear to me. Also, the deep analysis, the more I thought of it and as @Meinersbur mentioned,

does not seem to be needed any way. We could use RAUW() to achieve this effect (although it would still be more expensive).

2 questions:

  1. Is it "loop closed PHI node" or "loop closing PHI node"?
  2. Does LCSSA pass have to do this deep analysis? Because to me, it seems it can just use RAUW?

2 questions:

  1. Is it "loop closed PHI node" or "loop closing PHI node"?

Not being an expert of the English language, I;d say "loop-closing PHI node"

  1. Does LCSSA pass have to do this deep analysis? Because to me, it seems it can just use RAUW?

The most complicated thing is finding the dominator nodes to add the PHI nodes in (especially with multiple loop exits).

If possible, could you avoid "deep analysis", but instead refer the kind of analysis needed?

llvm/docs/LoopTerminology.rst
261–264

You you replace "deep analysis" by "data flow analysis"? Also mention this the use-def list is a property of SSA in general. Without mentioning these terms, I had to read this paragraph 3 times before I understood what ou want to say.

266

If you think it's worth mentioning, you could add that it's also easy to find all values that are used outside the loop: iterate over the PHINodes in the exit nodes (instead looking of all use-lists of all instructions in the loop).

285–287

Could you add a sentence introducing SCEV? In this sentence it appears without context.

baziotis marked 3 inline comments as done.Mar 26 2020, 9:31 AM

2 questions:

  1. Is it "loop closed PHI node" or "loop closing PHI node"?

Not being an expert of the English language, I;d say "loop-closing PHI node"

  1. Does LCSSA pass have to do this deep analysis? Because to me, it seems it can just use RAUW?

The most complicated thing is finding the dominator nodes to add the PHI nodes in (especially with multiple loop exits).

If possible, could you avoid "deep analysis", but instead refer the kind of analysis needed?

Yes. Could you explain the part about dominator nodes ? I understand how dominators help in the identification
of loops and how dominance frontiers help in the placement of normal PHI nodes (i.e. when 2 or more paths converge). But I don't see how dominators
help with the placement of LCSSA phi nodes.

llvm/docs/LoopTerminology.rst
261–264

Yes.

Without mentioning these terms, I had to read this paragraph 3 times before I understood what ou want to say.

Ok, I'll remove the LLVM-specific part (internal data structure for uses etc.) and I'll focus on the things that we know from SSA anyway (def-use chain).

266

I added it above. It's the first benefit. But I didn't add the "instead" part, I'll add it.

285–287

Yes, I should somehow add it before these 2 paragraphs.

Meinersbur added a comment.EditedMar 28 2020, 6:06 PM

Yes. Could you explain the part about dominator nodes ? I understand how dominators help in the identification
of loops and how dominance frontiers help in the placement of normal PHI nodes (i.e. when 2 or more paths converge). But I don't see how dominators
help with the placement of LCSSA phi nodes.

It is simple if the loop has just a single exit, but much more involved with multiple exits. Example CFG:

Header:
  %val = ...

Header->{Latch1,Latch2,Latch3}->Header

Header->Latch1->Exit1
Header->Latch2->Exit2
Header->Latch3->Exit3

Exit->Successor1

Successor1->SharedSucc
Exit2->SharedSucc

SharedSucc->CommonSucc
Exit3->CommonSucc

CommonSucc:
  use(%val)

Header dominates all other BBs in this example, hence without LCSSA, this is well-formed. However, with LCSSA we have to insert phis into Exit1,Exit2,Exit3 such that the end result has to look like:

Exit1:
  %lcssa1 = phi [%val, %Latch1]

Exit2:
  %lcssa2 = phi [%val, %Latch2]

Exit3:
  %lcssa3 = phi [%val, %Latch3]

SharedSucc:
  %sharedval = phi [%lcssa1 , %Exit1], [%lcssa2, %Exit2]

CommonSucc:
  %commonval = phi [%lcssa3 , %Exit3], [%sharedval , %SharedSucc]
  use(%commonval)

For computing %sharedval and %commonval, DominanceFrontier is needed again, hence LCSSA is not less complicated than SSA itself.

Yes. Could you explain the part about dominator nodes ? I understand how dominators help in the identification
of loops and how dominance frontiers help in the placement of normal PHI nodes (i.e. when 2 or more paths converge). But I don't see how dominators
help with the placement of LCSSA phi nodes.

It is simple if the loop has just a single exit, but much more involved with multiple exits. Example CFG:

Header:
  %val = ...

Header->{Latch1,Latch2,Latch3}->Header

Header->Latch1->Exit1
Header->Latch2->Exit2
Header->Latch3->Exit3

Exit->Successor1

Successor1->SharedSucc
Exit2->SharedSucc

SharedSucc->CommonSucc
Exit3->CommonSucc

CommonSucc:
  use(%val)

Header dominates all other BBs in this example, hence without LCSSA, this is well-formed. However, with LCSSA we have to insert phis into Exit1,Exit2,Exit3 such that the end result has to look like:

Exit1:
  %lcssa1 = phi [%val, %Latch1]

Exit2:
  %lcssa2 = phi [%val, %Latch2]

Exit3:
  %lcssa3 = phi [%val, %Latch3]

SharedSucc:
  %sharedval = phi [%lcssa1 , %Exit1], [%lcssa2, %Exit2]

CommonSucc:
  %commonval = phi [%lcssa3 , %Exit3], [%sharedval , %SharedSucc]
  use(%commonval)

For computing %sharedval and %commonval, DominanceFrontier is needed again, hence LCSSA is not less complicated that SSA itself.

Ok, I get it now, thanks! So with multiple exits one has to compute dominator info again.

baziotis updated this revision to Diff 253466.Mar 29 2020, 6:05 PM
  • Various typos.
  • Added def-use chain info as a footnote since its used twice. Also a footnote about LCSSA construction (I consider it interesting but in a footnote because it may be irrelevant in the main text).
  • Added some info about SCEV.
Meinersbur added inline comments.Mar 29 2020, 11:35 PM
llvm/docs/LoopTerminology.rst
193

"like so" is spoken English only

298–300

cf "Howver, ..."

I don't think this says what you want to say.

Suggestion: "In LCSSA form, loop-variant values and and its value after the last iteration of the loop are represented by two different llvm::Instructions. Hence the llvm::Instruction itself disambiguates the context of evaluation such that it is safe to just use getSCEV()."

baziotis marked 2 inline comments as done.Mar 30 2020, 5:43 AM
baziotis added inline comments.
llvm/docs/LoopTerminology.rst
193

Yes sorry. I didn't like the previous: "this ... like this:". I'll replace it with "as follows".

298–300

I agree. But I want to connect it with the more abstract term "expression", because the problem is not LLVM specific. So, I wrote this paragraph multiple times trying to balance this. I'll add a little more to connect the abstract term (expression) with its representation in LLVM (Instruction) and you can tell me what you think.

baziotis updated this revision to Diff 253577.Mar 30 2020, 6:04 AM
  • Addressed SCEV behavior in loop-variant expressions.
Meinersbur accepted this revision.Mar 30 2020, 9:56 PM

LGTM. Thanks for the effort!

Don't forget to add "Differential Revision:" to the commit message.

This revision is now accepted and ready to land.Mar 30 2020, 9:56 PM

LGTM. Thanks for the effort!

No, problem, thanks for the review and the info!

Don't forget to add "Differential Revision:" to the commit message.

Yes. :)

This revision was automatically updated to reflect the committed changes.