diff --git a/llvm/docs/LoopTerminology.rst b/llvm/docs/LoopTerminology.rst --- a/llvm/docs/LoopTerminology.rst +++ b/llvm/docs/LoopTerminology.rst @@ -161,11 +161,164 @@ has a predecessor that is outside the loop. This implies that all exit blocks are dominated by the loop header. +.. _loop-terminology-lcssa: Loop Closed SSA (LCSSA) ======================= -TBD +A program is in Loop Closed SSA Form if it is in SSA form +and all values that are defined in a loop are used only inside +this loop. +Programs written in LLVM IR are always in SSA form but not necessarily +in LCSSA. To achieve the latter, single entry PHI nodes are inserted +at the end of the loops for all values that are live +across the loop boundary [#lcssa-construction]_. +In particular, consider the following loop: + +.. code-block:: C + + c = ...; + for (...) { + if (c) + X1 = ... + else + X2 = ... + X3 = phi(X1, X2); // X3 defined + } + + ... = X3 + 4; // X3 used, i.e. live + // outside the loop + +In the inner loop, the X3 is defined inside the loop, but used +outside of it. In Loop Closed SSA form, this would be represented as follows: + +.. code-block:: C + + c = ...; + for (...) { + if (c) + X1 = ... + else + X2 = ... + X3 = phi(X1, X2); + } + X4 = phi(X3); + + ... = X4 + 4; + +This is still valid LLVM; the extra phi nodes are purely redundant, +but all LoopPass'es are required to preserve them. +This form is ensured by the LCSSA (:ref:`-lcssa `) +pass and is added automatically by the LoopPassManager when +scheduling a LoopPass. +After the loop optimizations are done, these extra phi nodes +will be deleted by :ref:`-instcombine `. + +The major benefit of this transformation is that it makes many other +loop optimizations simpler. + +First of all, a simple observation is that if one needs to see all +the outside users, they can just iterate over all the (loop closing) +PHI nodes in the exit blocks (the alternative would be to +scan the def-use chain [#def-use-chain]_ of all instructions in the loop). + +Then, consider for example +:ref:`-loop-unswitch ` ing the loop above. +Because it is in LCSSA form, we know that any value defined inside of +the loop will be used either only inside the loop or in a loop closing +PHI node. In this case, the only loop closing PHI node is X4. +This means that we can just copy the loop and change the X4 +accordingly, like so: + +.. code-block:: C + + for (...) { + c = ...; + if (c) { + for (...) { + if (true) + X1 = ... + else + X2 = ... + X3 = phi(X1, X2); + } + } else { + for (...) { + if (false) + X1' = ... + else + X2' = ... + X3' = phi(X1', X2'); + } + } + X4 = phi(X3, X3') + +Now, all uses of X4 will get the updated value (in general, +if a loop is in LCSSA form, in any loop transformation, +we only need to update the loop closing PHI nodes for the changes +to take effect). If we did not have Loop Closed SSA form, it means that X3 could +possibly be used outside the loop. So, we would have to introduce the +X4 (which is the new X3) and replace all uses of X3 with that. +However, we should note that because LLVM keeps a def-use chain +[#def-use-chain]_ for each Value, we wouldn't need +to perform data-flow analysis to find and replace all the uses +(there is even a utility function, replaceAllUsesWith(), +that performs this transformation by iterating the def-use chain). + +Another important advantage is that the behavior of all uses +of an induction variable is the same. Without this, you need to +distinguish the case when the variable is used outside of +the loop it is defined in, for example: + +.. code-block:: C + + for (i = 0; i < 100; i++) { + for (j = 0; j < 100; j++) { + k = i + j; + use(k); // use 1 + } + use(k); // use 2 + } + +Looking from the outer loop with the normal SSA form, the first use of k +is not well-behaved, while the second one is an induction variable with +base 100 and step 1. Although, in practice, and in the LLVM context, +such cases can be handled effectively by SCEV. Scalar Evolution +(:ref:`scalar-evolution `) or SCEV, is a +(analysis) pass that analyzes and categorizes the evolution of scalar +expressions in loops. + +In general, it's easier to use SCEV in loops that are in LCSSA form. +The evolution of a scalar (loop-variant) expression that +SCEV can analyze is, by definition, relative to a loop. +An expression is represented in LLVM by an +`llvm::Instruction `. +If the expression is inside two (or more) loops (which can only +happen if the loops are nested, like in the example above) and you want +to get an analysis of its evolution (from SCEV), +you have to also specify relative to what Loop you want it. +Specifically, you have to use +`getSCEVAtScope() `_. + +However, if all loops are in LCSSA form, each expression is actually +represented by two different llvm::Instructions. One inside the loop +and one outside, which is the loop-closing PHI node and represents +the value of the expression after the last iteration (effectively, +we break each loop-variant expression into two expressions and so, every +expression is at most in one loop). You can now just use +`getSCEV() `_. +and which of these two llvm::Instructions you pass to it disambiguates +the context / scope / relative loop. + +.. rubric:: Footnotes + +.. [#lcssa-construction] To insert these loop-closing PHI nodes, one has to + (re-)compute dominance frontiers (if the loop has multiple exits). + +.. [#def-use-chain] A property of SSA is that there exists a def-use chain + for each definition, which is a list of all the uses of this definition. + LLVM implements this property by keeping a list of all the uses of a Value + in an internal data structure. "More Canonical" Loops ====================== diff --git a/llvm/docs/Passes.rst b/llvm/docs/Passes.rst --- a/llvm/docs/Passes.rst +++ b/llvm/docs/Passes.rst @@ -331,6 +331,8 @@ where a region is defined as any subgraph that is connected to the remaining graph at only two spots. Furthermore, a hierarchical region tree is built. +.. _passes-scalar-evolution: + ``-scalar-evolution``: Scalar Evolution Analysis ------------------------------------------------ @@ -711,6 +713,8 @@ In this case, the unconditional branch at the end of the first if can be revectored to the false side of the second if. +.. _passes-lcssa: + ``-lcssa``: Loop-Closed SSA Form Pass ------------------------------------- @@ -732,7 +736,8 @@ This is still valid LLVM; the extra phi nodes are purely redundant, and will be trivially eliminated by ``InstCombine``. The major benefit of this transformation is that it makes many other loop optimizations, such as -``LoopUnswitch``\ ing, simpler. +``LoopUnswitch``\ ing, simpler. You can read more in the +:ref:`loop terminology section for the LCSSA form `. .. _passes-licm: @@ -864,6 +869,8 @@ can lead to significant performance improvements. It uses :ref:`Dependence Analysis ` for proving the transformations are safe. +.. _passes-loop-unswitch: + ``-loop-unswitch``: Unswitch loops ----------------------------------