This is an archive of the discontinued LLVM Phabricator instance.

[LCSSA] Doc for special treatment of PHIs
ClosedPublic

Authored by baziotis on Oct 19 2020, 2:12 PM.

Details

Summary

Add comment and doc that the point of use in PHI nodes is special.

Diff Detail

Event Timeline

baziotis created this revision.Oct 19 2020, 2:12 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 19 2020, 2:12 PM
baziotis requested review of this revision.Oct 19 2020, 2:12 PM
baziotis added inline comments.
llvm/lib/Transforms/Utils/LCSSA.cpp
177–178

Ah, I just saw that. Do you want me to put it in a separate diff ?

The additional test might be useful though.

llvm/lib/Transforms/Utils/LCSSA.cpp
128–129

This is not specific to LCSSA, but a general rule: The point of use of an incoming value is its incoming block. That is, the definition (the I) must dominate UserBB (and not be basic block where the PHI is located). As a consequence, a use of the PHI in the exit block of the loop is inside the loop.

This becomes more obvious by MLIR's basic block arguments: the basic block that branches to the destination specifies the arguments for the "PHI" basic block parameters.

This additional comment makes it more complicated and special than it is.

177–178

What does the comment addition intend to explain? Technically, not the ExitBB must be dominated by I, but its predecessors.

214–220

Same comment as before applies.

baziotis added inline comments.Oct 20 2020, 3:25 AM
llvm/lib/Transforms/Utils/LCSSA.cpp
128–129

Most of what you said arises, IIUC, from the dominance property of SSA and makes sense. In particular:
"If x is used in the ith entry of a PHI in a block B, then the ith predecessor of B should dominate B" (I have seen this being
stated as every predecessor of B..., which complicates the understanding).

But I didn't understand that this implies: "The point of use of an incoming value is its incoming block."
I have never come across this anywhere and now I specifically searched for it and I couldn't find it. Is there any reference ?
If someone doesn't know about that, such code is confusing.
I'd also like to add a small mention in the LCSSA doc.

177–178

True, but it's implied from the fact that I dominates ExitBB (which we know is true at this point because we can check above
and see that ExitBB's are picked based on that).
This comment intends to explain "why can I add from every predecessor?" or "why is I defined in all the predecessors?". That's not
as tricky as the other one, I could just remove it.

llvm/lib/Transforms/Utils/LCSSA.cpp
128–129

But I didn't understand that this implies: "The point of use of an incoming value is its incoming block."

My statement is somewhat incorrect, the "use" of a value by a PHI is its incoming edge:

https://llvm.org/docs/LangRef.html#phi-instruction

For the purposes of the SSA form, the use of each incoming value is deemed to occur on the edge from the corresponding predecessor block to the current block.

For practical purposes (e.g. to check dominance requirements), it is easier to view the incoming block as where the use occurs, as edges do not contain code itself.

This is a common technique, not just for LCSSA, such as in:
https://github.com/llvm/llvm-project/blob/c565f09f4b0d908f51aaf4a841285f39ef93bc8c/llvm/lib/Analysis/LoopInfo.cpp#L434
https://github.com/llvm/llvm-project/blob/c565f09f4b0d908f51aaf4a841285f39ef93bc8c/llvm/lib/CodeGen/CodeGenPrepare.cpp#L1173
https://github.com/llvm/llvm-project/blob/4aa97e3dacf3bdf5636fbf89dd8c64f1e4648065/polly/lib/Support/ScopHelper.cpp#L665

177–178

To be useful, the comment should elaborate more on that.

That ExitBB is dominated by I is checked at the beginning of the loop (line #166). This implies that every incoming block/edge is dominated by I as well, i.e. we can add uses of I to those incoming edges/append to the the incoming blocks without violating the SSA dominance requirement.

I think you want and answer to the "This implies ..." part?

Some resources that might help:

Thanks, I totally missed that it is in the langref. Off-topic: In the cases targeted by that diff / fix to the verifier, I think it is also interesting to consider that PHI's are evaluated in parallel upon entry to a BB.

llvm/lib/Transforms/Utils/LCSSA.cpp
128–129

Thanks for the feedback, so this is not something imposed by SSA, just something that happens to be useful. Good.
I think it's better to shorten the comment and I'll move some of it in the LCSSA doc.

177–178

I'm not sure whether it should be that verbose to be useful (e.g. I had never read LCSSA and that part was quite clear when I realized that I dominates ExitBB). No problem though, I'm adding it.

baziotis updated this revision to Diff 299601.Oct 21 2020, 1:58 AM
baziotis edited the summary of this revision. (Show Details)

Shortened comments and moved a more detailed explanation in loop terminology/LCSSA doc.

Meinersbur added inline comments.Oct 26 2020, 10:41 AM
llvm/docs/LoopTerminology.rst
175 ↗(On Diff #299601)

specifically at its exit blocks

This wording is somewhat imprecise (where is 'at'? an exit block is after the loop), Suggestion: "PHI nodes with just a single incoming value/block are added into each of the loop's exit blocks ..."

223–225 ↗(On Diff #299601)

The edge is clearly outside the loop, since when the control-flow uses it, it will not be looping anymore.

Treating the exiting block as user moves the use into the loop. For the purpose of LCSSA, this is exactly be what we want, since ensures all values that are defined in a loop are also only used in the loop. Could you move that part to the following paragraph to separate its from the strict formalism?

llvm/lib/Transforms/Utils/LCSSA.cpp
181

[grammar] "the the"

baziotis added inline comments.Oct 26 2020, 12:50 PM
llvm/docs/LoopTerminology.rst
175 ↗(On Diff #299601)

Ok, let's remove "at the end of the loop". FWIW, I tried to avoid details here because an exit block can have multiple predecessors and some of them might be outside the loop. And this is still the start of the explanation.

223–225 ↗(On Diff #299601)

Hmm, it seems to me that this will throw people off. It is easy to think "if the edge is considered outside how do you consider the use inside?" Let's assume that we answer that with "it works for LCSSA's purpose". But then "this is used all over LLVM, not just LCSSA". The whole reasoning seems flawed and like a hack. I'll retry anyway.

(Note that this part of the paragraph tries to answer what the start of the paragraph questions; if I move it, I'll have to rewrite all of it, which I'll do).

baziotis updated this revision to Diff 300775.Oct 26 2020, 1:12 PM
baziotis marked an inline comment as done.

Addressed comments.

baziotis added inline comments.Oct 26 2020, 1:16 PM
llvm/docs/LoopTerminology.rst
457 ↗(On Diff #300775)

Hopefully the combination of this footnote with the last changes is a good result. First of all, it's not the place of the LCSSA doc to explain why the rest of LLVM uses this convention, yet I think this footnote adds value. Second, now the text above I think is clear in what happens with the usages in PHIs: both definition is referenced and convention is explained. Please let me know what you think.

461–465 ↗(On Diff #300775)

This comment about liveness helps me very much understand this convention, I hope I have understood it's on point.

Meinersbur added inline comments.Oct 27 2020, 3:08 PM
llvm/docs/LoopTerminology.rst
297 ↗(On Diff #300775)

verb before "single" missing?

463–464 ↗(On Diff #300775)

Mmmh, more precisely, the dominance frontier will not allow uses of the value past it, unless it is passed on by a PHI node. Mem2Reg tries to minimize the number of PHIs, only adding them at dominance frontiers, but is not strictly required to (I am not sure whether the current implementation does). With LCSSA, we intentionally add PHIs even though we are not crossing a dominance frontier.

I.e. it's not the PHI that kills a value and it is structurally valid (although optimizable) to use PHI as well as the incoming value as long as the dominance requirement is fulfilled. That is, both lifetimes may overlap (however if crossing a loop, that would obviously not be LCSSA normal form anymore).

223–225 ↗(On Diff #299601)

Note that treating the use to occur in the incoming block just "overapproximates" the number of uses. The incoming block could branch to a different block. In that case the incoming value hasn't been uses, but there is no side-effect(*).

(*) Theoretically it could extend the lifetime of that value, but we do not compute lifetimes in LCSSA.

baziotis added inline comments.Oct 28 2020, 3:27 AM
llvm/docs/LoopTerminology.rst
297 ↗(On Diff #300775)

Yeah thanks. Actually the best place is probably after "PHI nodes" i.e. "PHI nodes are inserted"

463–464 ↗(On Diff #300775)

Yes, I totally agree! Yes formally one is allowed to re-use the value if it dominates the block, but if e.g. a human was hand-writing the
code, they wouldn't probably use both values: Either they'd use the value because it dominates or they'd need to put a PHI (which
would kill in their head the previous value). Even if they put an artificial PHI for some reason (e.g. LCSSA), they would still probably
not use the previous value, but instead the PHI.

Bottom-line, intuitively, the human would consider the value "live" and / or "used" up to the predecessor block, which is why IMHO this analogy is helpful for intuition.

That said, if you like the analogy, I still have to rewrite that somehow, on the one hand avoiding much formal details (because the goal of this is intuition) OTOH without writing incorrect things since this is now incorrect. How does that sound?

223–225 ↗(On Diff #299601)

Oh that's another very helpful way to think about this, thanks!

Meinersbur added inline comments.Oct 28 2020, 11:04 AM
llvm/docs/LoopTerminology.rst
463–464 ↗(On Diff #300775)

The analogy is ok, when made clear that this is not a strict requirement, but just a goal/intention/idealization.

baziotis updated this revision to Diff 301381.Oct 28 2020, 12:20 PM

That might be too big of a footnote and it might even be more suitable to be added in the phi doc in the LangRef. Nevertheless, it feels complete and valuable now :)

Meinersbur accepted this revision.Oct 29 2020, 12:51 PM

LGTM

llvm/docs/LoopTerminology.rst
461 ↗(On Diff #301381)

[typo] incombing

This revision is now accepted and ready to land.Oct 29 2020, 12:51 PM
baziotis updated this revision to Diff 301728.Oct 29 2020, 1:26 PM

Small fixes

baziotis updated this revision to Diff 301734.Oct 29 2020, 1:47 PM

Remove unnecessary text

This revision was automatically updated to reflect the committed changes.
foad added a subscriber: foad.Nov 2 2020, 2:57 PM