This is an archive of the discontinued LLVM Phabricator instance.

[docs] Revise loop terminology reference.
ClosedPublic

Authored by Meinersbur on Sep 28 2020, 5:25 AM.

Details

Summary

Motivated by D88183, this seeks to clarify the current loop nomenclature with added illustrations, examples for possibly unexpected situations (infinite loops not part of the "parent" loop, logical loops sharing the same header, ...), and clarification on what other sources may consider a loop. The current document also has multiple errors that are fixed here.

Some selected errors:

  • Loops a defined as strongly-connected components. A component a partition of all nodes, i.e. a subloop can never be a component. That is, the document as it currently is only covers top-level loops, even it also uses the term SCC for subloops.
  • "a block can be the header of two separate loops at the same time" (it is considered a single loop by LoopInfo)
  • "execute before some interesting event happens" (some interesting event is not well-defined)

Diff Detail

Event Timeline

Meinersbur created this revision.Sep 28 2020, 5:25 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 28 2020, 5:25 AM
Meinersbur requested review of this revision.Sep 28 2020, 5:25 AM

Thanks for taking the time to write this! I think it will help, especially beginners. I remember when I first read the loop terminology, some things were not obvious that are cleared here (e.g. why do people use backedge taken count).

llvm/docs/LoopTerminology.rst
35

[nit] "a an" -> "is an"

51–91

[nit] differeny -> different

61

This is a bit unexpected here. The reader might not know what the LoopInfo is or especially, its printing representation. That said, the reader can also skip it. We might want to add a reference to LoopInfo doc here.

70

[nit] "or nested inside the other" -> "or one is nested inside the other"
[nit] "Only only" -> "only"

121

[nit] "os" -> "is"

125

Could you please add a reference to the LoopSimplify doc below?

130

Do you think that adding some info on where the term irreducible comes from would be beneficial?

Something like:
Irreducible control-flow is the opposite of reducible control-flow.
The term reducible results from several kinds of transformations that
can be applied to (control-)flow (sub-)graphs that collapse sub-graphs into single nodes and, hence,
"reduce" the CFG successively to simpler graphs, with a CFG considered
to be reducible if applying a sequence of such transformations ultimately reduces
it to a single node.

And maybe a reference if the reader is interested to learn more.

138

[nit] "unrechable" -> "unreachable"

141

[nit] "ot" -> "not"

I think this example is a little bit confusing. First of all, a branch is an instruction and a loop contains basic blocks. You may mean / like this: Any BB that contains a branch that *never* (in bold to show the difference with exiting blocks, maybe even mention it in parens) leads back to the loop (or, all its successors are outside of the loop), is not considered part of the loop.

158

[nit] "builtin_unrechable" -> "builtin_unreachable"

191

"i.e. an infinite loop."

You may like this better: "i.e. a loop can be infinite"

198

"when the exiting condition is false."

You may like this better: "when it can prove that the exiting condition is always false".

Also "Because the edge" -> "Because the exiting edge"

200

[nit] "infinte" -> "infinite"

203

I think that the reference to the loop guard here is out of the blue it raises questions. It may be better to just leave the loop rotate part reference it.

Also, we might want to add "loop trip count (also called iteration count)".

208

[nit] "immediatly" -> "immediately"

Meinersbur marked 13 inline comments as done.
Meinersbur edited the summary of this revision. (Show Details)
  • Address @baziotis comments
  • Lines breaks
  • Further fixes
  • Undefined when not reachable

What do you think of the revision? Does it clarify things? Is it more understandable?

llvm/docs/LoopTerminology.rst
130

I think I condensed you wording while still saying more. Please check whether it is still understandable.

141

Just "never" could also be "dynamically never" and still be part of the loop.

203

The existence of a loop guard is independent of the loop-rotated normal form, the same way a preheader may exist before LoopSimplify.

What do you think of the revision? Does it clarify things? Is it more understandable?

I think it is well-explained yes :) I'm trying to read it with a (more than my current) beginner knowledge and it makes sense. I added some more comments, I hope they help.

I'll also try to render this. Basically, I liked this list of terms in bold in the previous loop terminology. It just made it very easy to locate things quickly. But I think it should be as easy when they're placed
inside the text but still in bold (also as I found out, one needs to read them only a couple of times just to get acquainted, so reading them in a coherent textbook-style explanation may be better).

llvm/docs/LoopTerminology.rst
54

[nit] I suggest "is called" instead of just "is" (the reader might think "what is an exiting / exit block ?" while this makes clear that what they're reading is the definition).

130

I think it is. Because reducibility is more of a sidenote in this doc, as long as the text conveys that "reducibility is not something that you're supposed to somehow know nor is it essential to learn a lot more about it" it's good. And I think the text conveys that.

141

Right, my bad. Your updated wording is better.

203

Yes, but does LoopInfo recognize it if it's not from LoopRotate ? e.g. if I write if (cond) { do { ... } while (cond); } in the source, will it recognize that it has a guard ? I thought that for the compiler, a guard is something only the compiler inserts while with the above wording, a guard can also exist already.

I think I explained badly my confusion when I read this: I think it's not clear what is a loop guard, who inserts it (if anyone) and when (partly this resulted from the use of "should not" and "must" above). Maybe something like that?
A guard is any block that dominates the header (it can be the header) and guards the body of the loop from not being entered if the loop condition is false before the first iteration. The compiler may insert such a guard after some transformation to ensure correctness (e.g. LoopRotate).

The wording could be better but please tell me what you think.

211

[nit] "that does not anything" -> "that does not do anything" (?)

212

This is one of the best explanations for the removal of an infinite loop :)

This paragraph may raise questions though. e.g. what is the formal reasoning ? (forward progress guarantee). I think it's ok to mention here that the optimizer can do things like that without further formalities. But maybe also link to another place that explains the formalities if anyone's interested (and I don't pretend to have a good place to link for LLVM IR. Anything I know, which is not much, I've seen it on llvm-dev here and there).

Meinersbur marked 6 inline comments as done.
  • Address further comments by @baxiotis

I think the document is more useful as a reference than a text-book reading.

llvm/docs/LoopTerminology.rst
203

LoopRotation does not recognize guards. Guards being created is just the typical sideeffect of moving the first execution of the loop header (which in a typical for-loop contains the loop condition) before the loop, thus creating a guard. There could also be multiple "guards" skipping the loop checking for different conditions.

I used introduced the loop guard here to illustrate that a loop trip count of 1 does not mean the the loop body (in the sense of the input language) has been executed, and how that is avoided (possibly also by frontend IR generators), and to show and name a pattern one often finds in the wild.

A loop guard cannot be the header, the point of the guard is to skip the loop. The loop is not skipped if one first has to enter the loop to check the guard condition.
For the point of optimization, the advantage of not entering the loop at all is stronger assumptions for the preheader (e.g. more things can be hoisted into the preheader), but I would prefer to not go into optimization details here; there is a link to the LoopRotation section. I did not even explain what the preheader use useful for.

212

Indeed I'd prefer to not spat out all the details here. A good place to link to instead would be D65718, but is has been abandoned. There is still a discussion about this on the mailing-list and I don't want to introduce these into the fundamentals document.

Not even the LangRef about llvm.sideeffect is more detailed! That's where I'd expect those clarifications.

baziotis accepted this revision.Sep 29 2020, 4:19 AM

I think the document is more useful as a reference than a text-book reading.

Yes, that's why the item list was very useful IMO and I think this still is as useful.

llvm/docs/LoopTerminology.rst
203

Thanks for the explanation, let's leave "why insert a loop guard ?" out. How about making a little bit more explicit what is a loop guard (btw I realized that I too don't make it explicit in LoopRotation) ? With the current wording, the reader may wonder "what is a loop guard?". For example:
"... must skip the entire loop. A loop guard is a block that dominates the header, it is not part of the loop, and tests the loop condition before the first iteration in order to potentially skip the loop entirely".

LoopRotation does not recognize guards.

Yes, I asked if LoopInfo recognizes them. Mainly my question is: Does the compiler realize if a loop guard is already there
or only because it inserts one ? (e.g. as part of LoopRotation) But anyway, feel free to ignore this.

This revision is now accepted and ready to land.Sep 29 2020, 4:19 AM

Btw, do you think it's good to desribe what is a critical edge in this doc ? I think it's mentioned quite often in anything loop-related.

Btw, do you think it's good to describe what is a critical edge in this doc ? I think it's mentioned quite often in anything loop-related.

It would be part of the LoopSimplify normal form, ensured by the preheader and dedicated exit blocks. Otherwise, it's more a concept for CFGs than loops.

llvm/docs/LoopTerminology.rst
203

Yes, I asked if LoopInfo recognizes them.

Sorry, seems I misread the question. A block can be a loop guard even without LoopRotate. For instance, the following C code might be recognized as a loop guard:

if (n > 0) {
  do { // The loop being guarded
    ...
  } while (...)
} else {
  // Loop-skipping edge, no code here
}

(I have no checked whether it actually does, will require some normalization such a -simplifycfg before to remove the empty else-block)

It is also possible that LoopRotate creates a block that looks like a loop guard, but is not recognized by Loop::getLoopGuardBranch(). Its definition is intentionally strict. We don't have a formal definition of a loop guard, so getLoopGuardBranch only returns something for patterns we definitely can agree is a loop guard. As mentioned, one might also allow multiple loop guards to check different conditions. Is it possible to have any code executed when skipping the loop? For instance, that code might need to compute the the return value for the zero iterations case, but we must not allow any side-effect. Maybe we allow arbitrary code in the loop guard skip branch (even other loops?), but then what is the concept useful for? The comment on Loop::getLoopGuardBranch() itself mentions its definition might change in the future.

For this reason I would prefer not to give a normative definition of a loop guard here.

It would be part of the LoopSimplify normal form, ensured by the preheader and dedicated exit blocks. Otherwise, it's more a concept for CFGs than loops.

Ok, good.

llvm/docs/LoopTerminology.rst
203

Oh ok, thanks for the detailed explanation, it clears things up. I looked up Loop::getLoopGuardBranch(), which is what does the recognition (and not LoopInfo).

It's ok from me then as it is.

Whitney accepted this revision.Sep 29 2020, 9:35 AM

Thanks for writing this up.

This revision was automatically updated to reflect the committed changes.