This is an archive of the discontinued LLVM Phabricator instance.

llvm/docs/LoopTerminology.rst
211–212	Note that this applies to the new pass manager only (by `FunctionToLoopPassAdaptor`): This does not happen with the legacy (current default) pass manager.
248–251	For the placement of the PHINode, LCSSA needs to do the deep analysis (finding the dominating exit block) as well. LCSSA makes it a two-step process. I don't think it would be complicated to do it in a single step on-demand when unswitching. There are other advantages. LLVM uses a linked linked list to list all the uses of an instruction. There is a convenience function `replaceAllUsesWith` (or short: RAUW) does this replacement. For loops however, it had to distinguish between inside/outside users (1), but this is no different that when versioning a non-loop BasicBlock. For the placement of the PHINode, LCSSA needs to do the deep analysis (finding the exit block) as well. I think the primary reason is that loop analysis become more independent (3). Say we perform a loop analysis on all loops in a function that stores reference to `llvm::Value`s used in the loops, such as ScalarEvolution. In a second step, we transform all loops using the analysis' result which does RAUW. Without LCSSA it might replace values in other loops and invalidate the analysis already applied to the. With LCSSA, only value of the PHINode in the exit block is changed, but the PHINode is the same instance used in other loops. However, if we have loop transformations that transform inner as well as outer loops, we still need to handle the case that transforming the inner (respectively outer) loop may invalidate an analysis on the other. Another reason is that with LCSSA, `ScalarEvolution::getSCEV` is sufficient (2). Otherwise, one needs to use `getSCEVAtScope` to specify whether we are using the SCEV inside or outside. Polly does not require LCSSA, hence uses mostly `getSCEVAtScope`. GCC has documentation about the advantages of LCSSA: https://gcc.gnu.org/onlinedocs/gccint/LCSSA.html . The numbers (1-3) correspond to the bullet list.

baziotis marked 2 inline comments as done.Feb 28 2020, 1:18 PM

baziotis added inline comments.

llvm/docs/LoopTerminology.rst
211–212	Ok, I'm not very familiar with pass managers. So, in the old pass manager, you have to invoke LCSSA automatically or something?
248–251	Thank you for the explanation! Some questions / comments: For the placement of the PHINode, LCSSA needs to do the deep analysis (finding the dominating exit block) as well. LCSSA makes it a two-step process. I don't think it would be complicated to do it in a single step on-demand when unswitching. There are other advantages. That was actually a note from Chris Lattner (I couldn't have thought that haha). What I understood isn't that we can avoid deep analysis completely. Rather, deep analysis will be done for LCSSA. But then, loop optimizations have to preserve LCSSA (which from what I understand should be relatively easy), so when the time comes to do loop unswitching, LCSSA is there and helps you do it quick. There is a convenience function replaceAllUsesWith (or short: RAUW) does this replacement. "this replacement" being loop unswitching ? For loops however, it had to distinguish between inside/outside users (1), but this is no different that when versioning a non-loop BasicBlock Why exactly the distinction ? What I understand is that, when doing RAUW on a basic block or a bunch of basic blocks (or just one), no matter if they're a loop or not, if you know which values are live outside them (e.g. using single entry PHI nodes), you make your life easier. Because you replace the inside uses and those on the PHI nodes on the boundary. Compared to if you didn't have those, where you would have to replace the uses across the whole function. Without LCSSA it might replace values in other loops and invalidate the analysis already applied to the. With LCSSA, only value of the PHINode in the exit block is changed, but the PHINode is the same instance used in other loops Wow, that was very smart. Another reason is that with LCSSA, ScalarEvolution::getSCEV is sufficient (2). Otherwise, one needs to use getSCEVAtScope to specify whether we are using the SCEV inside or outside. I'm not sure I got that completely (I'm not familiar with the interface of SCEV in LLVM). I guess because no instruction of the loop is used outside of it, so SCEV for a loop can be limited to the loop. Otherwise, a value may be used wherever and you have to include it in the scope if you are to do SCEV correctly.

Meinersbur added inline comments.Mar 1 2020, 11:42 AM

llvm/docs/LoopTerminology.rst
211–212	In the legacy pass manger, those passes are added as dependency of the loop passes. The Helper function [[ https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/Utils/LoopUtils.cpp#L141 \| `getLoopAnalysisUsage` ]] add them them as requirements and should be called by all loop passes in `getAnalysisUsage`. Might be useful to mention that.
248–251	Since I haven't implemented a loop unswitch pass myself, I can't really argue how difficult it it, so I don't request to change it. Interestingly, the source referenced by the note just mentions it as "non-trivial update". It also adds another advantage of LCSSA: Induction variable uses in nested loop. Although SCEV should have no problem with, it's just two AddRecExpr in an SECExpr. Maybe you could add the other advantages as well? "this replacement" being loop unswitching ? Replacing uses of an instruction result outside the loop by the PHINode that 'combines' the result of the unswitched loops. Why exactly the distinction ? It's surely fewer values to replace (one in the PHI changes all uses outside the loop). Whether a value is used outside can be checked using `L->contains(Use)` IMHO this does not justify LCSSA which searches and replaces all outside uses with the PHI unconditionally. I'm not sure I got that completely (I'm not familiar with the interface of SCEV in LLVM). The SCEV for inside the loop references the induction variable in an AddRec expression. This is only valid to expand inside the loop. When outside the loop, you need the expression representing the exit value of the loop, or SCEVUnkown when the exit value cannot be computed, e.g. because the loop exit condition is not known.

FYI: Here's the paper by @sebpop explaining loop-closed SSA http://cri.ensmp.fr/classement/doc/E-285.pdf hth.

FYI: Here's the paper by @sebpop explaining loop-closed SSA http://cri.ensmp.fr/classement/doc/E-285.pdf hth.

Thanks! The paper is very interesting but it seems to focus on an "extended" loop-closed SSA form that captures more
info than the single phi nodes (referred as close() nodes in the paper). Most of the reasoning of LCSSA with just phi nodes
is written in the start of part 4.1. I'm not sure if it makes a good argument for LCSSA with phi nodes but to anyone reading
this, please make a note if I miss anything.

llvm/docs/LoopTerminology.rst
248–251	Interestingly, the source referenced by the note just mentions it as "non-trivial update". It also adds another advantage of LCSSA: Induction variable uses in nested loop. Although SCEV should have no problem with, it's just two AddRecExpr in an SECExpr. I hadn't seen that, thanks. What is a `SECExpr`? Maybe you could add the other advantages as well? Yes. You mean just the one I guess, about the induction variables. Replacing uses of an instruction result outside the loop by the PHINode that 'combines' the result of the unswitched loops. Oh, I got what you mean. Even if you have LCSSA before the unswitching, after it, you still have to go and update all the uses with the new PHINode that combines the 2 PHIs of the versioned loops. Whether a value is used outside can be checked using `L->contains(Use)` Are you sure? `L->contains()` I think doesn't get `Use`. Given a `Value`, it seems you have to loop through all the uses, get the User and check if this is inside the loop. Something like what replaceLoopInvariantUses() does. IMHO this does not justify LCSSA which searches and replaces all outside uses with the PHI unconditionally. I agree. The SCEV for inside the loop references... Got it, thanks!

Meinersbur added inline comments.Mar 2 2020, 11:34 AM

llvm/docs/LoopTerminology.rst
181–184	Maybe remove the if-statement here, it is specific to loop unswitching
248–251	What is a SECExpr? `ScalarEvolution::SCEVExpr`. Sorry for the typo. Yes. You mean just the one I guess, about the induction variables. Why not all of them? Even if you have LCSSA before the unswitching, after it, you still have to go and update all the uses with the new PHINode that combines the 2 PHIs of the versioned loops. SimpleLoopUnswitch does as you described: https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp#L255 for (...) { %val = ... } %lcssa = phi [%val] to if (c) { for (...) { %val1 = ... } %newlcssa1 = phi [%val1] // new PHI } else { for (...) { %val2 = ... } %newlcssa2 = phi [%val2] // new PHI } %lcssa = phi [%newlcssa1, %newlcssa2] // re-use old LCSSA node here, not an LCSSA node any more Are you sure? L->contains() I think doesn't get Use. You'd use it on the instruction that uses the value. Given a Value, it seems you have to loop through all the uses, get the User and check if this is inside the loop. Something like what replaceLoopInvariantUses() does. It is not that much overhead to use L->contains on each use, at least less than maintaining LCSSA. `replaceLoopInvariantUses()` uses it to only change something for users inside the loop, i.e. ignore LCSSA PHIs. You can do the same just to find outside users.

baziotis marked 3 inline comments as done.Mar 2 2020, 1:07 PM

baziotis added inline comments.

llvm/docs/LoopTerminology.rst
181–184	Yes, I'll give a simple example, then an example specific to loop unswitching.
211–212	Ok, thanks, I can't really find much info about the pass managers. I'll add that.
248–251	Why not all of them? Here, Zdenek Dvorak mentions 2 advantages I think: 1) Updating it during unrolling/peeling/versioning is trivial, since we do not need to care about the uses outside of the loop. 2) About induction variables I think I have already mentioned 1) with the example of loop unswitching, no? You may mean the other GCC page for LCSSA. Basically, the first 2 there are the same. The third I think is not an advantage. If I understood correctly, it basically says that even with LCSSA, we have to go and do the RAUW for the new phi node. with the new PHINode that combines the 2 PHIs of the versioned loops. SimpleLoopUnswitch does as you described: Great, now it's clear! It is not that much overhead to use L->contains on each use, at least less than maintaining LCSSA. Ok, got it.

Meinersbur added inline comments.Mar 6 2020, 11:20 AM

llvm/docs/LoopTerminology.rst
248–251	You may pick which advantages of LCSSA are worth mentioning. If I understood correctly, it basically says that even with LCSSA, we have to go and do the RAUW for the new phi node. If one only needs to look up outside users, one can also iterate over all the PHIs in the exit blocks.

baziotis marked an inline comment as done.Mar 6 2020, 11:55 AM

baziotis added inline comments.

llvm/docs/LoopTerminology.rst
248–251	If I understood correctly, it basically says that even with LCSSA, we have to go and do the RAUW for the new phi node. If one only needs to look up outside users, one can also iterate over all the PHIs in the exit blocks. You added that as another advantage ? Or as a correction to what I said it is the 3rd advantage that GCC mentions?

Added other advantages.
Rephrased Chris's wording a little bit as it was not very clear to me. Also, the deep analysis, the more I thought of it and as @Meinersbur mentioned,

does not seem to be needed any way. We could use RAUW() to achieve this effect (although it would still be more expensive).

2 questions:

Is it "loop closed PHI node" or "loop closing PHI node"?
Does LCSSA pass have to do this deep analysis? Because to me, it seems it can just use RAUW?

In D75233#1932574, @baziotis wrote:

2 questions:

Is it "loop closed PHI node" or "loop closing PHI node"?

Not being an expert of the English language, I;d say "loop-closing PHI node"

Does LCSSA pass have to do this deep analysis? Because to me, it seems it can just use RAUW?

The most complicated thing is finding the dominator nodes to add the PHI nodes in (especially with multiple loop exits).

If possible, could you avoid "deep analysis", but instead refer the kind of analysis needed?

llvm/docs/LoopTerminology.rst
261–264	You you replace "deep analysis" by "data flow analysis"? Also mention this the use-def list is a property of SSA in general. Without mentioning these terms, I had to read this paragraph 3 times before I understood what ou want to say.
266	If you think it's worth mentioning, you could add that it's also easy to find all values that are used outside the loop: iterate over the PHINodes in the exit nodes (instead looking of all use-lists of all instructions in the loop).
285–287	Could you add a sentence introducing SCEV? In this sentence it appears without context.

In D75233#1943883, @Meinersbur wrote:

In D75233#1932574, @baziotis wrote:

2 questions:

Is it "loop closed PHI node" or "loop closing PHI node"?

Not being an expert of the English language, I;d say "loop-closing PHI node"

Does LCSSA pass have to do this deep analysis? Because to me, it seems it can just use RAUW?

The most complicated thing is finding the dominator nodes to add the PHI nodes in (especially with multiple loop exits).

If possible, could you avoid "deep analysis", but instead refer the kind of analysis needed?

Yes. Could you explain the part about dominator nodes ? I understand how dominators help in the identification
of loops and how dominance frontiers help in the placement of normal PHI nodes (i.e. when 2 or more paths converge). But I don't see how dominators
help with the placement of LCSSA phi nodes.

llvm/docs/LoopTerminology.rst
261–264	Yes. Without mentioning these terms, I had to read this paragraph 3 times before I understood what ou want to say. Ok, I'll remove the LLVM-specific part (internal data structure for uses etc.) and I'll focus on the things that we know from SSA anyway (def-use chain).
266	I added it above. It's the first benefit. But I didn't add the "instead" part, I'll add it.
285–287	Yes, I should somehow add it before these 2 paragraphs.

In D75233#1944176, @baziotis wrote:

Yes. Could you explain the part about dominator nodes ? I understand how dominators help in the identification
of loops and how dominance frontiers help in the placement of normal PHI nodes (i.e. when 2 or more paths converge). But I don't see how dominators
help with the placement of LCSSA phi nodes.

It is simple if the loop has just a single exit, but much more involved with multiple exits. Example CFG:

Header:
  %val = ...

Header->{Latch1,Latch2,Latch3}->Header

Header->Latch1->Exit1
Header->Latch2->Exit2
Header->Latch3->Exit3

Exit->Successor1

Successor1->SharedSucc
Exit2->SharedSucc

SharedSucc->CommonSucc
Exit3->CommonSucc

CommonSucc:
  use(%val)

Header dominates all other BBs in this example, hence without LCSSA, this is well-formed. However, with LCSSA we have to insert phis into Exit1,Exit2,Exit3 such that the end result has to look like:

Exit1:
  %lcssa1 = phi [%val, %Latch1]

Exit2:
  %lcssa2 = phi [%val, %Latch2]

Exit3:
  %lcssa3 = phi [%val, %Latch3]

SharedSucc:
  %sharedval = phi [%lcssa1 , %Exit1], [%lcssa2, %Exit2]

CommonSucc:
  %commonval = phi [%lcssa3 , %Exit3], [%sharedval , %SharedSucc]
  use(%commonval)

For computing %sharedval and %commonval, DominanceFrontier is needed again, hence LCSSA is not less complicated than SSA itself.

In D75233#1948305, @Meinersbur wrote:
In D75233#1944176, @baziotis wrote:

Yes. Could you explain the part about dominator nodes ? I understand how dominators help in the identification
of loops and how dominance frontiers help in the placement of normal PHI nodes (i.e. when 2 or more paths converge). But I don't see how dominators
help with the placement of LCSSA phi nodes.

It is simple if the loop has just a single exit, but much more involved with multiple exits. Example CFG:
Header:
  %val = ...

Header->{Latch1,Latch2,Latch3}->Header

Header->Latch1->Exit1
Header->Latch2->Exit2
Header->Latch3->Exit3

Exit->Successor1

Successor1->SharedSucc
Exit2->SharedSucc

SharedSucc->CommonSucc
Exit3->CommonSucc

CommonSucc:
  use(%val)
Header dominates all other BBs in this example, hence without LCSSA, this is well-formed. However, with LCSSA we have to insert phis into Exit1,Exit2,Exit3 such that the end result has to look like:
Exit1:
  %lcssa1 = phi [%val, %Latch1]

Exit2:
  %lcssa2 = phi [%val, %Latch2]

Exit3:
  %lcssa3 = phi [%val, %Latch3]

SharedSucc:
  %sharedval = phi [%lcssa1 , %Exit1], [%lcssa2, %Exit2]

CommonSucc:
  %commonval = phi [%lcssa3 , %Exit3], [%sharedval , %SharedSucc]
  use(%commonval)
For computing %sharedval and %commonval, DominanceFrontier is needed again, hence LCSSA is not less complicated that SSA itself.

Ok, I get it now, thanks! So with multiple exits one has to compute dominator info again.

Various typos.
Added def-use chain info as a footnote since its used twice. Also a footnote about LCSSA construction (I consider it interesting but in a footnote because it may be irrelevant in the main text).
Added some info about SCEV.

Meinersbur added inline comments.Mar 29 2020, 11:35 PM

llvm/docs/LoopTerminology.rst
193	"like so" is spoken English only
298–300	cf "Howver, ..." I don't think this says what you want to say. Suggestion: "In LCSSA form, loop-variant values and and its value after the last iteration of the loop are represented by two different llvm::Instructions. Hence the llvm::Instruction itself disambiguates the context of evaluation such that it is safe to just use getSCEV()."

baziotis marked 2 inline comments as done.Mar 30 2020, 5:43 AM

baziotis added inline comments.

llvm/docs/LoopTerminology.rst
193	Yes sorry. I didn't like the previous: "this ... like this:". I'll replace it with "as follows".
298–300	I agree. But I want to connect it with the more abstract term "expression", because the problem is not LLVM specific. So, I wrote this paragraph multiple times trying to balance this. I'll add a little more to connect the abstract term (expression) with its representation in LLVM (Instruction) and you can tell me what you think.

Addressed SCEV behavior in loop-variant expressions.

LGTM. Thanks for the effort!

Don't forget to add "Differential Revision:" to the commit message.

This revision is now accepted and ready to land.Mar 30 2020, 9:56 PM

In D75233#1951675, @Meinersbur wrote:

LGTM. Thanks for the effort!

No, problem, thanks for the review and the info!

Don't forget to add "Differential Revision:" to the commit message.

Yes. :)

Closed by commit rG229cda968cb2: [LoopTerminology] LCSSA form (authored by baziotis). · Explain WhyMar 31 2020, 6:04 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path

Size

llvm/

docs/

LoopTerminology.rst

155 lines

Passes.rst

9 lines

Diff 253857

llvm/docs/LoopTerminology.rst

	Show First 20 Lines • Show All 155 Lines • ▼ Show 20 Lines
	When it is successful, the loop has:			When it is successful, the loop has:

	* A preheader.			* A preheader.
	* A single backedge (which implies that there is a single latch).			* A single backedge (which implies that there is a single latch).
	* Dedicated exits. That is, no exit block for the loop			* Dedicated exits. That is, no exit block for the loop
	has a predecessor that is outside the loop. This implies			has a predecessor that is outside the loop. This implies
	that all exit blocks are dominated by the loop header.			that all exit blocks are dominated by the loop header.

				.. _loop-terminology-lcssa:

	Loop Closed SSA (LCSSA)			Loop Closed SSA (LCSSA)
	=======================			=======================

	TBD			A program is in Loop Closed SSA Form if it is in SSA form
				and all values that are defined in a loop are used only inside
				this loop.
				Programs written in LLVM IR are always in SSA form but not necessarily
				in LCSSA. To achieve the latter, single entry PHI nodes are inserted
				at the end of the loops for all values that are live
				across the loop boundary [#lcssa-construction]_.
				In particular, consider the following loop:

				.. code-block:: C

				c = ...;
				for (...) {
				if (c)
				X1 = ...
				else
				MeinersburUnsubmitted Not Done Reply Inline Actions Maybe remove the if-statement here, it is specific to loop unswitching Meinersbur: Maybe remove the if-statement here, it is specific to loop unswitching
				baziotisAuthorUnsubmitted Done Reply Inline Actions Yes, I'll give a simple example, then an example specific to loop unswitching. baziotis: Yes, I'll give a simple example, then an example specific to loop unswitching.
				X2 = ...
				X3 = phi(X1, X2); // X3 defined
				}

				... = X3 + 4; // X3 used, i.e. live
				// outside the loop

				In the inner loop, the X3 is defined inside the loop, but used
				outside of it. In Loop Closed SSA form, this would be represented as follows:
				MeinersburUnsubmitted Not Done Reply Inline Actions "like so" is spoken English only Meinersbur: "like so" is spoken English only
				baziotisAuthorUnsubmitted Done Reply Inline Actions Yes sorry. I didn't like the previous: "this ... like this:". I'll replace it with "as follows". baziotis: Yes sorry. I didn't like the previous: "this ... like this:". I'll replace it with "as follows".

				.. code-block:: C

				c = ...;
				for (...) {
				if (c)
				X1 = ...
				else
				X2 = ...
				X3 = phi(X1, X2);
				}
				X4 = phi(X3);

				... = X4 + 4;

				This is still valid LLVM; the extra phi nodes are purely redundant,
				but all LoopPass'es are required to preserve them.
				This form is ensured by the LCSSA (:ref:`-lcssa <passes-lcssa>`)
				pass and is added automatically by the LoopPassManager when
				MeinersburUnsubmitted Not Done Reply Inline Actions Note that this applies to the new pass manager only (by `FunctionToLoopPassAdaptor`): This does not happen with the legacy (current default) pass manager. Meinersbur: Note that this applies to the new pass manager only (by `FunctionToLoopPassAdaptor`): This does…
				baziotisAuthorUnsubmitted Done Reply Inline Actions Ok, I'm not very familiar with pass managers. So, in the old pass manager, you have to invoke LCSSA automatically or something? baziotis: Ok, I'm not very familiar with pass managers. So, in the old pass manager, you have to invoke…
				MeinersburUnsubmitted Not Done Reply Inline Actions In the legacy pass manger, those passes are added as dependency of the loop passes. The Helper function [[ https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/Utils/LoopUtils.cpp#L141 \| `getLoopAnalysisUsage` ]] add them them as requirements and should be called by all loop passes in `getAnalysisUsage`. Might be useful to mention that. Meinersbur: In the legacy pass manger, those passes are added as dependency of the loop passes. The Helper…
				baziotisAuthorUnsubmitted Done Reply Inline Actions Ok, thanks, I can't really find much info about the pass managers. I'll add that. baziotis: Ok, thanks, I can't really find much info about the pass managers. I'll add that.
				scheduling a LoopPass.
				After the loop optimizations are done, these extra phi nodes
				will be deleted by :ref:`-instcombine <passes-instcombine>`.

				The major benefit of this transformation is that it makes many other
				loop optimizations simpler.

				First of all, a simple observation is that if one needs to see all
				the outside users, they can just iterate over all the (loop closing)
				PHI nodes in the exit blocks (the alternative would be to
				scan the def-use chain [#def-use-chain]_ of all instructions in the loop).

				Then, consider for example
				:ref:`-loop-unswitch <passes-loop-unswitch>` ing the loop above.
				Because it is in LCSSA form, we know that any value defined inside of
				the loop will be used either only inside the loop or in a loop closing
				PHI node. In this case, the only loop closing PHI node is X4.
				This means that we can just copy the loop and change the X4
				accordingly, like so:

				.. code-block:: C

				for (...) {
				c = ...;
				if (c) {
				for (...) {
				if (true)
				X1 = ...
				else
				X2 = ...
				X3 = phi(X1, X2);
				}
				} else {
				for (...) {
				if (false)
				X1' = ...
				else
				X2' = ...
				X3' = phi(X1', X2');
				MeinersburUnsubmitted Not Done Reply Inline Actions For the placement of the PHINode, LCSSA needs to do the deep analysis (finding the dominating exit block) as well. LCSSA makes it a two-step process. I don't think it would be complicated to do it in a single step on-demand when unswitching. There are other advantages. LLVM uses a linked linked list to list all the uses of an instruction. There is a convenience function `replaceAllUsesWith` (or short: RAUW) does this replacement. For loops however, it had to distinguish between inside/outside users (1), but this is no different that when versioning a non-loop BasicBlock. For the placement of the PHINode, LCSSA needs to do the deep analysis (finding the exit block) as well. I think the primary reason is that loop analysis become more independent (3). Say we perform a loop analysis on all loops in a function that stores reference to `llvm::Value`s used in the loops, such as ScalarEvolution. In a second step, we transform all loops using the analysis' result which does RAUW. Without LCSSA it might replace values in other loops and invalidate the analysis already applied to the. With LCSSA, only value of the PHINode in the exit block is changed, but the PHINode is the same instance used in other loops. However, if we have loop transformations that transform inner as well as outer loops, we still need to handle the case that transforming the inner (respectively outer) loop may invalidate an analysis on the other. Another reason is that with LCSSA, `ScalarEvolution::getSCEV` is sufficient (2). Otherwise, one needs to use `getSCEVAtScope` to specify whether we are using the SCEV inside or outside. Polly does not require LCSSA, hence uses mostly `getSCEVAtScope`. GCC has documentation about the advantages of LCSSA: https://gcc.gnu.org/onlinedocs/gccint/LCSSA.html . The numbers (1-3) correspond to the bullet list. Meinersbur: For the placement of the PHINode, LCSSA needs to do the deep analysis (finding the dominating…
				baziotisAuthorUnsubmitted Done Reply Inline Actions Thank you for the explanation! Some questions / comments: For the placement of the PHINode, LCSSA needs to do the deep analysis (finding the dominating exit block) as well. LCSSA makes it a two-step process. I don't think it would be complicated to do it in a single step on-demand when unswitching. There are other advantages. That was actually a note from Chris Lattner (I couldn't have thought that haha). What I understood isn't that we can avoid deep analysis completely. Rather, deep analysis will be done for LCSSA. But then, loop optimizations have to preserve LCSSA (which from what I understand should be relatively easy), so when the time comes to do loop unswitching, LCSSA is there and helps you do it quick. There is a convenience function replaceAllUsesWith (or short: RAUW) does this replacement. "this replacement" being loop unswitching ? For loops however, it had to distinguish between inside/outside users (1), but this is no different that when versioning a non-loop BasicBlock Why exactly the distinction ? What I understand is that, when doing RAUW on a basic block or a bunch of basic blocks (or just one), no matter if they're a loop or not, if you know which values are live outside them (e.g. using single entry PHI nodes), you make your life easier. Because you replace the inside uses and those on the PHI nodes on the boundary. Compared to if you didn't have those, where you would have to replace the uses across the whole function. Without LCSSA it might replace values in other loops and invalidate the analysis already applied to the. With LCSSA, only value of the PHINode in the exit block is changed, but the PHINode is the same instance used in other loops Wow, that was very smart. Another reason is that with LCSSA, ScalarEvolution::getSCEV is sufficient (2). Otherwise, one needs to use getSCEVAtScope to specify whether we are using the SCEV inside or outside. I'm not sure I got that completely (I'm not familiar with the interface of SCEV in LLVM). I guess because no instruction of the loop is used outside of it, so SCEV for a loop can be limited to the loop. Otherwise, a value may be used wherever and you have to include it in the scope if you are to do SCEV correctly. baziotis: Thank you for the explanation! Some questions / comments: > For the placement of the PHINode…
				MeinersburUnsubmitted Not Done Reply Inline Actions Since I haven't implemented a loop unswitch pass myself, I can't really argue how difficult it it, so I don't request to change it. Interestingly, the source referenced by the note just mentions it as "non-trivial update". It also adds another advantage of LCSSA: Induction variable uses in nested loop. Although SCEV should have no problem with, it's just two AddRecExpr in an SECExpr. Maybe you could add the other advantages as well? "this replacement" being loop unswitching ? Replacing uses of an instruction result outside the loop by the PHINode that 'combines' the result of the unswitched loops. Why exactly the distinction ? It's surely fewer values to replace (one in the PHI changes all uses outside the loop). Whether a value is used outside can be checked using `L->contains(Use)` IMHO this does not justify LCSSA which searches and replaces all outside uses with the PHI unconditionally. I'm not sure I got that completely (I'm not familiar with the interface of SCEV in LLVM). The SCEV for inside the loop references the induction variable in an AddRec expression. This is only valid to expand inside the loop. When outside the loop, you need the expression representing the exit value of the loop, or SCEVUnkown when the exit value cannot be computed, e.g. because the loop exit condition is not known. Meinersbur: Since I haven't implemented a loop unswitch pass myself, I can't really argue how difficult it…
				baziotisAuthorUnsubmitted Done Reply Inline Actions Interestingly, the source referenced by the note just mentions it as "non-trivial update". It also adds another advantage of LCSSA: Induction variable uses in nested loop. Although SCEV should have no problem with, it's just two AddRecExpr in an SECExpr. I hadn't seen that, thanks. What is a `SECExpr`? Maybe you could add the other advantages as well? Yes. You mean just the one I guess, about the induction variables. Replacing uses of an instruction result outside the loop by the PHINode that 'combines' the result of the unswitched loops. Oh, I got what you mean. Even if you have LCSSA before the unswitching, after it, you still have to go and update all the uses with the new PHINode that combines the 2 PHIs of the versioned loops. Whether a value is used outside can be checked using `L->contains(Use)` Are you sure? `L->contains()` I think doesn't get `Use`. Given a `Value`, it seems you have to loop through all the uses, get the User and check if this is inside the loop. Something like what replaceLoopInvariantUses() does. IMHO this does not justify LCSSA which searches and replaces all outside uses with the PHI unconditionally. I agree. The SCEV for inside the loop references... Got it, thanks! baziotis: > Interestingly, the source referenced by the note just mentions it as "non-trivial update". It…
				MeinersburUnsubmitted Not Done Reply Inline Actions What is a SECExpr? `ScalarEvolution::SCEVExpr`. Sorry for the typo. Yes. You mean just the one I guess, about the induction variables. Why not all of them? Even if you have LCSSA before the unswitching, after it, you still have to go and update all the uses with the new PHINode that combines the 2 PHIs of the versioned loops. SimpleLoopUnswitch does as you described: https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/Scalar/SimpleLoopUnswitch.cpp#L255 for (...) { %val = ... } %lcssa = phi [%val] to if (c) { for (...) { %val1 = ... } %newlcssa1 = phi [%val1] // new PHI } else { for (...) { %val2 = ... } %newlcssa2 = phi [%val2] // new PHI } %lcssa = phi [%newlcssa1, %newlcssa2] // re-use old LCSSA node here, not an LCSSA node any more Are you sure? L->contains() I think doesn't get Use. You'd use it on the instruction that uses the value. Given a Value, it seems you have to loop through all the uses, get the User and check if this is inside the loop. Something like what replaceLoopInvariantUses() does. It is not that much overhead to use L->contains on each use, at least less than maintaining LCSSA. `replaceLoopInvariantUses()` uses it to only change something for users inside the loop, i.e. ignore LCSSA PHIs. You can do the same just to find outside users. Meinersbur: > What is a SECExpr? `ScalarEvolution::SCEVExpr`. Sorry for the typo. > Yes. You mean just…
				baziotisAuthorUnsubmitted Done Reply Inline Actions Why not all of them? Here, Zdenek Dvorak mentions 2 advantages I think: 1) Updating it during unrolling/peeling/versioning is trivial, since we do not need to care about the uses outside of the loop. 2) About induction variables I think I have already mentioned 1) with the example of loop unswitching, no? You may mean the other GCC page for LCSSA. Basically, the first 2 there are the same. The third I think is not an advantage. If I understood correctly, it basically says that even with LCSSA, we have to go and do the RAUW for the new phi node. with the new PHINode that combines the 2 PHIs of the versioned loops. SimpleLoopUnswitch does as you described: Great, now it's clear! It is not that much overhead to use L->contains on each use, at least less than maintaining LCSSA. Ok, got it. baziotis: > Why not all of them? [[ https://gcc.gnu.org/ml/gcc-patches/2004-03/msg02212.html \| Here ]]…
				MeinersburUnsubmitted Not Done Reply Inline Actions You may pick which advantages of LCSSA are worth mentioning. If I understood correctly, it basically says that even with LCSSA, we have to go and do the RAUW for the new phi node. If one only needs to look up outside users, one can also iterate over all the PHIs in the exit blocks. Meinersbur: You may pick which advantages of LCSSA are worth mentioning. > If I understood correctly, it…
				baziotisAuthorUnsubmitted Done Reply Inline Actions If I understood correctly, it basically says that even with LCSSA, we have to go and do the RAUW for the new phi node. If one only needs to look up outside users, one can also iterate over all the PHIs in the exit blocks. You added that as another advantage ? Or as a correction to what I said it is the 3rd advantage that GCC mentions? baziotis: > > If I understood correctly, it basically says that even with LCSSA, we have to go and do the…
				}
				}
				X4 = phi(X3, X3')

				Now, all uses of X4 will get the updated value (in general,
				if a loop is in LCSSA form, in any loop transformation,
				we only need to update the loop closing PHI nodes for the changes
				to take effect). If we did not have Loop Closed SSA form, it means that X3 could
				possibly be used outside the loop. So, we would have to introduce the
				X4 (which is the new X3) and replace all uses of X3 with that.
				However, we should note that because LLVM keeps a def-use chain
				[#def-use-chain]_ for each Value, we wouldn't need
				to perform data-flow analysis to find and replace all the uses
				MeinersburUnsubmitted Not Done Reply Inline Actions You you replace "deep analysis" by "data flow analysis"? Also mention this the use-def list is a property of SSA in general. Without mentioning these terms, I had to read this paragraph 3 times before I understood what ou want to say. Meinersbur: You you replace "deep analysis" by "data flow analysis"? Also mention this the use-def list is…
				baziotisAuthorUnsubmitted Done Reply Inline Actions Yes. Without mentioning these terms, I had to read this paragraph 3 times before I understood what ou want to say. Ok, I'll remove the LLVM-specific part (internal data structure for uses etc.) and I'll focus on the things that we know from SSA anyway (def-use chain). baziotis: Yes. > Without mentioning these terms, I had to read this paragraph 3 times before I…
				(there is even a utility function, replaceAllUsesWith(),
				that performs this transformation by iterating the def-use chain).
				MeinersburUnsubmitted Not Done Reply Inline Actions If you think it's worth mentioning, you could add that it's also easy to find all values that are used outside the loop: iterate over the PHINodes in the exit nodes (instead looking of all use-lists of all instructions in the loop). Meinersbur: If you think it's worth mentioning, you could add that it's also easy to find all values that…
				baziotisAuthorUnsubmitted Done Reply Inline Actions I added it above. It's the first benefit. But I didn't add the "instead" part, I'll add it. baziotis: I added it above. It's the first benefit. But I didn't add the "instead" part, I'll add it.

				Another important advantage is that the behavior of all uses
				of an induction variable is the same. Without this, you need to
				distinguish the case when the variable is used outside of
				the loop it is defined in, for example:

				.. code-block:: C

				for (i = 0; i < 100; i++) {
				for (j = 0; j < 100; j++) {
				k = i + j;
				use(k); // use 1
				}
				use(k); // use 2
				}

				Looking from the outer loop with the normal SSA form, the first use of k
				is not well-behaved, while the second one is an induction variable with
				base 100 and step 1. Although, in practice, and in the LLVM context,
				such cases can be handled effectively by SCEV. Scalar Evolution
				(:ref:`scalar-evolution <passes-scalar-evolution>`) or SCEV, is a
				MeinersburUnsubmitted Not Done Reply Inline Actions Could you add a sentence introducing SCEV? In this sentence it appears without context. Meinersbur: Could you add a sentence introducing SCEV? In this sentence it appears without context.
				baziotisAuthorUnsubmitted Done Reply Inline Actions Yes, I should somehow add it before these 2 paragraphs. baziotis: Yes, I should somehow add it before these 2 paragraphs.
				(analysis) pass that analyzes and categorizes the evolution of scalar
				expressions in loops.

				In general, it's easier to use SCEV in loops that are in LCSSA form.
				The evolution of a scalar (loop-variant) expression that
				SCEV can analyze is, by definition, relative to a loop.
				An expression is represented in LLVM by an
				`llvm::Instruction <https://llvm.org/doxygen/classllvm_1_1Instruction.html>`.
				If the expression is inside two (or more) loops (which can only
				happen if the loops are nested, like in the example above) and you want
				to get an analysis of its evolution (from SCEV),
				you have to also specify relative to what Loop you want it.
				Specifically, you have to use
				MeinersburUnsubmitted Not Done Reply Inline Actions cf "Howver, ..." I don't think this says what you want to say. Suggestion: "In LCSSA form, loop-variant values and and its value after the last iteration of the loop are represented by two different llvm::Instructions. Hence the llvm::Instruction itself disambiguates the context of evaluation such that it is safe to just use getSCEV()." Meinersbur: cf "Howver, ..." I don't think this says what you want to say. Suggestion: "In LCSSA form…
				baziotisAuthorUnsubmitted Done Reply Inline Actions I agree. But I want to connect it with the more abstract term "expression", because the problem is not LLVM specific. So, I wrote this paragraph multiple times trying to balance this. I'll add a little more to connect the abstract term (expression) with its representation in LLVM (Instruction) and you can tell me what you think. baziotis: I agree. But I want to connect it with the more abstract term "expression", because the problem…
				`getSCEVAtScope() <https://llvm.org/doxygen/classllvm_1_1ScalarEvolution.html#a21d6ee82eed29080d911dbb548a8bb68>`_.

				However, if all loops are in LCSSA form, each expression is actually
				represented by two different llvm::Instructions. One inside the loop
				and one outside, which is the loop-closing PHI node and represents
				the value of the expression after the last iteration (effectively,
				we break each loop-variant expression into two expressions and so, every
				expression is at most in one loop). You can now just use
				`getSCEV() <https://llvm.org/doxygen/classllvm_1_1ScalarEvolution.html#a30bd18ac905eacf3601bc6a553a9ff49>`_.
				and which of these two llvm::Instructions you pass to it disambiguates
				the context / scope / relative loop.

				.. rubric:: Footnotes

				.. [#lcssa-construction] To insert these loop-closing PHI nodes, one has to
				(re-)compute dominance frontiers (if the loop has multiple exits).

				.. [#def-use-chain] A property of SSA is that there exists a def-use chain
				for each definition, which is a list of all the uses of this definition.
				LLVM implements this property by keeping a list of all the uses of a Value
				in an internal data structure.

	"More Canonical" Loops			"More Canonical" Loops
	======================			======================

	.. _loop-terminology-loop-rotate:			.. _loop-terminology-loop-rotate:

	Rotated Loops			Rotated Loops
	-------------			-------------
	▲ Show 20 Lines • Show All 219 Lines • Show Last 20 Lines

llvm/docs/Passes.rst

Show First 20 Lines • Show All 325 Lines • ▼ Show 20 Lines

``-regions``: Detect single entry single exit regions		``-regions``: Detect single entry single exit regions
-----------------------------------------------------		-----------------------------------------------------

The ``RegionInfo`` pass detects single entry single exit regions in a function,		The ``RegionInfo`` pass detects single entry single exit regions in a function,
where a region is defined as any subgraph that is connected to the remaining		where a region is defined as any subgraph that is connected to the remaining
graph at only two spots. Furthermore, a hierarchical region tree is built.		graph at only two spots. Furthermore, a hierarchical region tree is built.

		.. _passes-scalar-evolution:

``-scalar-evolution``: Scalar Evolution Analysis		``-scalar-evolution``: Scalar Evolution Analysis
------------------------------------------------		------------------------------------------------

The ``ScalarEvolution`` analysis can be used to analyze and catagorize scalar		The ``ScalarEvolution`` analysis can be used to analyze and catagorize scalar
expressions in loops. It specializes in recognizing general induction		expressions in loops. It specializes in recognizing general induction
variables, representing them with the abstract and opaque ``SCEV`` class.		variables, representing them with the abstract and opaque ``SCEV`` class.
Given this analysis, trip counts of loops and other important properties can be		Given this analysis, trip counts of loops and other important properties can be
obtained.		obtained.
▲ Show 20 Lines • Show All 364 Lines • ▼ Show 20 Lines	.. code-block:: c++
if () { ...		if () { ...
X = 4;		X = 4;
}		}
if (X < 3) {		if (X < 3) {

In this case, the unconditional branch at the end of the first if can be		In this case, the unconditional branch at the end of the first if can be
revectored to the false side of the second if.		revectored to the false side of the second if.

		.. _passes-lcssa:

``-lcssa``: Loop-Closed SSA Form Pass		``-lcssa``: Loop-Closed SSA Form Pass
-------------------------------------		-------------------------------------

This pass transforms loops by placing phi nodes at the end of the loops for all		This pass transforms loops by placing phi nodes at the end of the loops for all
values that are live across the loop boundary. For example, it turns the left		values that are live across the loop boundary. For example, it turns the left
into the right code:		into the right code:

.. code-block:: c++		.. code-block:: c++

for (...) for (...)		for (...) for (...)
if (c) if (c)		if (c) if (c)
X1 = ... X1 = ...		X1 = ... X1 = ...
else else		else else
X2 = ... X2 = ...		X2 = ... X2 = ...
X3 = phi(X1, X2) X3 = phi(X1, X2)		X3 = phi(X1, X2) X3 = phi(X1, X2)
... = X3 + 4 X4 = phi(X3)		... = X3 + 4 X4 = phi(X3)
... = X4 + 4		... = X4 + 4

This is still valid LLVM; the extra phi nodes are purely redundant, and will be		This is still valid LLVM; the extra phi nodes are purely redundant, and will be
trivially eliminated by ``InstCombine``. The major benefit of this		trivially eliminated by ``InstCombine``. The major benefit of this
transformation is that it makes many other loop optimizations, such as		transformation is that it makes many other loop optimizations, such as
``LoopUnswitch``\ ing, simpler.		``LoopUnswitch``\ ing, simpler. You can read more in the
		:ref:`loop terminology section for the LCSSA form <loop-terminology-lcssa>`.

.. _passes-licm:		.. _passes-licm:

``-licm``: Loop Invariant Code Motion		``-licm``: Loop Invariant Code Motion
-------------------------------------		-------------------------------------

This pass performs loop invariant code motion, attempting to remove as much		This pass performs loop invariant code motion, attempting to remove as much
code from the body of a loop as possible. It does this by either hoisting code		code from the body of a loop as possible. It does this by either hoisting code
▲ Show 20 Lines • Show All 115 Lines • ▼ Show 20 Lines	for j.. for j..
code(i+3, j)		code(i+3, j)
remainder loop		remainder loop

Which can be seen as unrolling the outer loop and "jamming" (fusing) the inner		Which can be seen as unrolling the outer loop and "jamming" (fusing) the inner
loops into one. When variables or loads can be shared in the new inner loop, this		loops into one. When variables or loads can be shared in the new inner loop, this
can lead to significant performance improvements. It uses		can lead to significant performance improvements. It uses
:ref:`Dependence Analysis <passes-da>` for proving the transformations are safe.		:ref:`Dependence Analysis <passes-da>` for proving the transformations are safe.

		.. _passes-loop-unswitch:

``-loop-unswitch``: Unswitch loops		``-loop-unswitch``: Unswitch loops
----------------------------------		----------------------------------

This pass transforms loops that contain branches on loop-invariant conditions		This pass transforms loops that contain branches on loop-invariant conditions
to have multiple loops. For example, it turns the left into the right code:		to have multiple loops. For example, it turns the left into the right code:

.. code-block:: c++		.. code-block:: c++

▲ Show 20 Lines • Show All 366 Lines • Show Last 20 Lines

This is an archive of the discontinued LLVM Phabricator instance.

[LoopTerminology] LCSSA FormClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 253857

llvm/docs/LoopTerminology.rst

llvm/docs/Passes.rst

[LoopTerminology] LCSSA Form
ClosedPublic