Page MenuHomePhabricator

Define some basic terminology around loops in our documentation
ClosedPublic

Authored by reames on Jul 23 2019, 1:23 PM.

Details

Summary

I've noticed a lot of confusion around this area recently with key terms being misused in a number of threads. To help reign that in, let's go ahead and document the current terminology and meaning thereof.

Note that I'm deliberately only documenting current meaning. If we wish to change or extend at a later point, we can do so then. It is explicitly out of scope for this review.

My hope is to grow this over time into a broader discussion of canonical loop forms - yes, there are more than one ... many more than one - but for the moment, simply having the key terminology is a good stopping place.

Diff Detail

Repository
rL LLVM

Event Timeline

reames created this revision.Jul 23 2019, 1:23 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 23 2019, 1:23 PM
fhahn added a reviewer: fhahn.Jul 23 2019, 1:32 PM
fhahn added a subscriber: fhahn.

Thanks for writing this up!

docs/LoopTerminology.rst
91 ↗(On Diff #211351)

LoopSimplify Form and LCSSA are somewhat documented https://llvm.org/docs/Passes.html#loop-simplify-canonicalize-natural-loops and https://llvm.org/docs/Passes.html#lcssa-loop-closed-ssa-form-pass

Maybe we should move the description here and link it from Passes.rst to avoid those getting out-of-sync?

p.s. I can't appear to build the docs any long in my environment. Can someone with a working docs build check for syntax errors?

reames marked an inline comment as done.Jul 23 2019, 2:28 PM
reames added inline comments.
docs/LoopTerminology.rst
91 ↗(On Diff #211351)

Sounds like a great idea. Once this lands, I'll do that move and link. We definitely need to expand those descriptions quite a bit though.

Some comments, clarifications, and proposed additions.

Thanks for doing this!

docs/LoopTerminology.rst
23 ↗(On Diff #211351)

Could we add something like:

commonly referred to as irreducible control flow, or irreducible loops.

35 ↗(On Diff #211351)
  • The loop header identifies a loop.
  • Two loops are either disjoint or one is properly contained in the other.
  • LoopInfo organizes loops in a tree structure with an artificial top-level loop in each function that contains all loops not contained in other loops.
48 ↗(On Diff #211351)

"The basic block", not "a".

63 ↗(On Diff #211351)

Let's make "the source of a backedge" a sentence:

Thus, q latch block is the source of a backedge.

81 ↗(On Diff #211351)

Why "interesting event happens"?

The backedge*s* are traversed *without leaving the loop*

89 ↗(On Diff #211351)

Again, we should mention "executed before the loop is left".

90 ↗(On Diff #211351)

A basic block can be a exiting block and a latch block at the same time.
A basic block can also be loop predecessor and a exit block at the same time.

Irreducible Control Flow - Also known as irreducible loops, are non-loop cycle in the CFG. They differ from the LLVM definition of loops because they do not have a header block, thus no block in the cycle dominates all blocks in the cycle. Warning: LoopInfo does not identify irreducible control flow which means that even in a non-recursive function without loops blocks can be executed multiple times.

Thanks for taking the initiative.

docs/LoopTerminology.rst
23 ↗(On Diff #211351)

Also, such cycles are called loops in any literature that I know. It's just that LoopInfo does not create a loop object for them since they do not have a dominating header.

25–26 ↗(On Diff #211351)

The concept of a backedge should have been introduced for this. A cycle inside a loop can either be

  • A backedge
  • A backedge of a nested loop
  • An irreducible loop
83–84 ↗(On Diff #211351)

Non-rotated loops often have headers that only check the loop condition. The header executed, but if the condition evaluated to false, arguably nothing interesting happened, in particular the source language's loop body did not execute.

I think this is really good - thanks for taking the time to start this!
Most of my comments are minor/grammatical.

If we do have access to other markups, specifically for different typesettings (bold, italics, etc.) I will suggest some more (cosmetic) changes.

docs/LoopTerminology.rst
17 ↗(On Diff #211351)

which -> that

29 ↗(On Diff #211351)

which -> that

30 ↗(On Diff #211351)

Question: will the asterisks around must cause it to be typeset as bold in the documentation?
If so, are there other typesets available to use (e.g., italics and underline)?

46 ↗(On Diff #211351)

which -> that

50 ↗(On Diff #211351)

which -> that

53 ↗(On Diff #211351)

which -> that

56 ↗(On Diff #211351)

which -> that

57 ↗(On Diff #211351)

which -> that

58 ↗(On Diff #211351)

Would a short example, instead of "That is, ..." be useful here in these definitions? If so, I can try to come up with some.

71 ↗(On Diff #211351)

I find this sentence confusing. Is this the same as: the predecessor block of the loop header that is not contained within the loop?

I don't know that we've defined the concept of "contained within the loop" at this point, so this might not be any clearer in the long run. But, is that the general idea here?

74 ↗(On Diff #211351)

which -> that

79 ↗(On Diff #211351)

I think it's better to write this in the present tense, instead of past tense (but I don't know why I think that).

"The number of times the backedge will execute before some interesting event happens. "

83 ↗(On Diff #211351)

Same comment as above: "The number of times the header will execute before some interesting event happens."

84 ↗(On Diff #211351)

Can you expand w/o to without?

89 ↗(On Diff #211351)

I think the last sentence is very important, and should not be a parenthetical. It would also be good to make Warning bold, if that's possible.

This revision was not accepted when it landed; it landed in state Needs Review.Jul 24 2019, 4:23 PM
This revision was automatically updated to reflect the committed changes.
reames marked 11 inline comments as done.
reames marked 3 inline comments as done.Jul 24 2019, 4:29 PM

Landed with most of the comments applied, but let's keep discussion going. I'm happy to iterate further in separate patches.

docs/LoopTerminology.rst
17 ↗(On Diff #211351)

I ended up ignoring the which -> that suggestions. I made the change and it seemed much less readable to me. (And yes, I know it's technically correct.)

23 ↗(On Diff #211351)

Can anyone provide me a citation for the definition of irreducible control flow? I want to link to it when adding this bit.

As for the "everyone else calls them loops" points, I'm happy to mention that. Have a suggestion for a survey paper or wikipedia which defines it?

p.s. Patch welcome to do this.

25–26 ↗(On Diff #211351)

All of our definitions are inherently circular. I tried to adjust this a bit, what do you think of the wording I landed with?

30 ↗(On Diff #211351)

*bold*, and _italics_ are the two I'm familiar with, but there are others.

35 ↗(On Diff #211351)

The first point is is incorrect. A single loop header can be the header of multiple nested loops.

I incorporated the second.

We should probably add a LoopInfo section; this didn't really feel like it belonged in the definition of the loops.

58 ↗(On Diff #211351)

Feel free to suggest additions/changes. You can do that as it's own patch if you want.

71 ↗(On Diff #211351)

I redrafted this completely. What do you think of the new version?

81 ↗(On Diff #211351)

I was trying to be somewhat generic. Technically, we can describe the BTC of any behavioural transition, not just leaving the loop. This shows up in some of the SCEV code today.

83–84 ↗(On Diff #211351)

@Meinersbur - neither point is relevant for the llvm definition of a loop, but maybe there deserves to be a "mapping C to LLVM" section?

jdoerfert reopened this revision.EditedJul 24 2019, 9:31 PM

The first point is is incorrect. A single loop header can be the header of multiple nested loops.

I fear we disagree on the definition of "a loop".

I was thinking we describe the llvm::Loop here as it is discovered by LoopInfo.
It seems you want to go for something different (or I am too tired to see my mistake).
[For the record, I think it often make sense to use a different definition than the one
I want to see here. For now, I am worried about inconsistencies in our definitions and
wrt. to the implementation.]


Let's try to unravel this (I think this is fun). I'm curious, what loops would you identify in the examples below.
I will discuss two "loop definitions" that seem natural to me but if you have something else in mind I would like to hear it.

ah:  br ..., label %al1, label %al2
al1: br %ah;
al2: br %ah;
bh:  br ..., label %bh, label %bl2
bl2: br %bh;
ch:  br ..., label %ch, label %ch

For me, all three examples have a single loop:
L_A: (header: ah, latches: al1, al2)
L_B: (header: bh, latches: bh, bl2)
L_C: (header: ch, latches: ch)
Basically, a loop is a header and a set of blocks dominated by the header such that you can reach the header from every one of them without leaving the set.

Now I could see why one would like to define it differently, e.g., via cycles in the CFG, so based on (back)edges.
That should result in two loops per example:
L_A1: (header: ah, latch: al1)
L_A2: (header: ah, latch: al2)
L_B1: (header: bh, latch: bh)
L_B2: (header: bh, latch: bl1)
L_C1: (header: ch, latch: ch)
L_C2: (header: ch, latch: ch)

What I dislike about this definition, except that it is not what loop info reports, is:

  • L_A1 and L_A2 are neither disjoint nor is one properly nested in the other.
  • L_C1 and L_C2 are identical.

The first point is is incorrect. A single loop header can be the header of multiple nested loops.

I fear we disagree on the definition of "a loop".

I was thinking we describe the llvm::Loop here as it is discovered by LoopInfo.
It seems you want to go for something different (or I am too tired to see my mistake).
[For the record, I think it often make sense to use a different definition than the one

I want to see here. For now, I am worried about inconsistencies in our definitions and
wrt. to the implementation.]

It turns out you're absolutely right here. I was *trying* to describe llvm::Loop, but apparently my own mental model was wrong here.

The case I had in mind was:
$ cat nested-example.ll

define void @nested() {
entry:

br label %header

header:

br i1 undef, label %header, label %outer_latch

outer_latch:

br i1 undef, label %header, label %exit

exit:

ret void

}

I was thinking of this as two loops, one containing the other.

But apparently I was wrong, the actually output was:
$ ../build/bin/opt nested-example.ll -analyze -loops
Printing analysis 'Natural Loop Information' for function 'nested':
Loop at depth 1 containing: %header<header><latch>,%outer_latch<latch><exiting>

An attempt at fixing the wording around the ambiguity just discussed is here: https://reviews.llvm.org/D65299

reames accepted this revision.Jul 25 2019, 2:16 PM

Reaccept a review so that I can re-close, as the patch was submitted.

This revision is now accepted and ready to land.Jul 25 2019, 2:16 PM
reames closed this revision.Jul 25 2019, 2:16 PM

I am unhappy about this having been committed already. Now we have official documentation at https://llvm.org/docs/LoopTerminology.html which is partially wrong.

docs/LoopTerminology.rst
23 ↗(On Diff #211351)

Here are the relevant excerpts from the dragon book, second edition:



It does not use "loop" to avoid giving a definition, but defines "natural loops" only. As given by the definition and as seen in the example, a natural loop is defined per backedge. However, LLVM follows the book's suggestion and treats them as a single loop.

In another book I looked up (Michael Wolfe: High Performance Compilers For Parallel Computing) uses a graph's strongly connected components to search for cycles, side-stepping the problem of irreducible loops. However, I wouldn't use it since it ignores loop nesting.

25–26 ↗(On Diff #211351)

The definition of a backedge is independent of the definition of a loop. Here is the definition from the dragon book:

A back edge is an edge a → b whose head b dominates its tail a.

35 ↗(On Diff #211351)

The first point is is incorrect. A single loop header can be the header of multiple nested loops.

I think this has already been resolved.

83–84 ↗(On Diff #211351)

"some interesting event" is just not a good way to make things clearer. Please avoid.

llvm/trunk/docs/LoopTerminology.rst
15–16

Mathematically, a cycle is a path in a graph where the start and end nodes are the same. That is, any loop has infinitely many cycles by e.g. repeating the same path. I'd avoid defining loops using cycles or define what a cycle as used here is.

reames marked 5 inline comments as done.Jul 31 2019, 10:10 AM

I am unhappy about this having been committed already. Now we have official documentation at https://llvm.org/docs/LoopTerminology.html which is partially wrong.

In retrospect, I probably landed this a bit quickly. I interpreted the comments as incremental refinement, and missed the one about cycles vs SCCs being correctness related. I believe the sole correctness issue has been resolved. If you see any others, please point them out.

docs/LoopTerminology.rst
23 ↗(On Diff #211351)

Thanks for the citation here. Reading through the excerpts, I want to draw your attention specifically to the second paragraph of the third image. Per that wording, the definitions used in the book allow multiple natural loops to share a single header block (if one is a subloop of the other). As per previous discussion in this thread and in https://reviews.llvm.org/D65299, our loop definition does not.

Do you spot the inconsistency in the definitions? I think it comes down to ours requiring a maximal SCC w/a backedge, and the "natural loop" term in the book not. Do you agree? If so, I'll try to draft something which compares and contrasts the definitions.

25–26 ↗(On Diff #211351)

While in principle I agree, I'm not sure defining it that way actually clarifies the document. In particular, we're almost always talking about the backedge *of a particular loop*, and thus the stand alone definition is somewhat confusing. Feel free to suggest a patch w/a wording change if you want to wordsmith this.

35 ↗(On Diff #211351)

Should be now fixed, let me know if the version after https://reviews.llvm.org/D65299 still needs polishing.

83–84 ↗(On Diff #211351)

Concrete suggestions on how to word this? I'm happy to improve, but it's the least awkward wording I could find.

We need to be able to say things such as:

  1. On iteration N, this branch switches from always taken to always untaken.
  2. On iteration N, we exit through exit block X.
  3. On iteration N, the loop will exit.
  4. After a maximum of N iterations, the loop will exit.
  5. Function F may throw, but only after iteration N.

(All are real examples from previous discussions.)

llvm/trunk/docs/LoopTerminology.rst
15–16

Should be now fixed, let me know if the version after https://reviews.llvm.org/D65299 still needs polishing.

This broke the sphinx bot:

http://lab.llvm.org:8011/builders/llvm-sphinx-docs/builds/33875/steps/docs-llvm-html/logs/stdio

Warning, treated as error:
/home/buildbot/llvm-build-dir/llvm-sphinx-docs/llvm/src/docs/LoopTerminology.rst:: WARNING: document isn't included in any toctree

Can you add a reference to this somewhere?

This broke the sphinx bot:
...
Can you add a reference to this somewhere?

Speculative fix in revision 367487.