This is an archive of the discontinued LLVM Phabricator instance.

[LoopTerminology] Rotated Loops
ClosedPublic

Authored by baziotis on Feb 22 2020, 1:02 PM.

Diff Detail

Event Timeline

baziotis created this revision.Feb 22 2020, 1:02 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 22 2020, 1:02 PM

So, I'm sure I'm not the best to write this one since I have not understood it completely, but first of all, after trying a lot to understand it, I feel it needs doc.
Second, I hope that you can help me write it correctly. :)

The point in which I'm perplexed is the latch. First of all, one of the goals of this transformation seems to be that the latch be also an exiting. That makes sense.
But in the videos I reference above, one says "canonicalize latch to have one predecessor" (Writing Loop Optimizations in LLVM) while the other "... a single successor" (Loop Fusion etc.).
And I don't get what is correct and what is not. First, predecessors of the latch don't seem to be important. Second, if the successors are what we care about, then in the examples
given in the videos, in the left example (before the transformation) the latch does have a single successor. While in the right (the transformed), it has 2.

Meinersbur added a comment.EditedFeb 22 2020, 6:10 PM
This comment has been deleted.
llvm/docs/LoopTerminology.rst
176–178

Converting to do/while style is a description of loop-rotation. One of the purposes is to allow hoisting invariant loads into the preheader. In non-rotated loops, a load in the preaheader would be executed even if the memory was never accessed in the original loop.

212

The LoopRotate pass was already mentioned at the beginning. Maybe you could also link to https://llvm.org/docs/Passes.html#loop-rotate-rotate-loops.

baziotis marked 2 inline comments as done.Feb 23 2020, 4:20 AM
baziotis added inline comments.
llvm/docs/LoopTerminology.rst
176–178

Oh, because you might do no iterations at all right? So, with the do-while, you'll do at least once. Regarding the wording, is this better?
"Loops are rotated by the loop-rotate pass, which converts loops into do/while style loops."
Then follow with examples and then mention the purposes. Also, should I add to the purposes that a latch block is an exiting block (and that this is useful to loop fusion) ? Although, as I stated in the comment, I have not completely understood this part with the latch.

212

Yes, my bad, I forgot I had mentioned in the beginning, I'll put the link there.

Meinersbur added inline comments.Feb 23 2020, 3:26 PM
llvm/docs/LoopTerminology.rst
176–178

Yes, it is better.

The latch being also an exiting block is kind-of the definition of loop rotation: the latch contains the condition on whether to execute another iteration or leave the loop, i.e. the condition if the do part of a while/do loop.

/// Return true if the loop is in rotated form.
///
/// This does not check if the loop was rotated by loop rotation, instead it
/// only checks if the loop is in rotated form (has a valid latch that exists
/// the loop).
bool isRotatedForm() const {
  assert(!isInvalid() && "Loop not in a valid state!");
  BasicBlock *Latch = getLoopLatch();
  return Latch && isLoopExiting(Latch);
}
210–211

[serious] By definition, a latch has a backedge to the header. If the latch had just a single successor, there could not be another edge to outside the loop.

Using C code to describe the effect is higher level than the IR-level it is actually performed on. What a latch is is not obvious in the C code. LoopRotate also copies the header block, which might be interesting to mention.

baziotis marked an inline comment as done.Feb 23 2020, 4:00 PM
baziotis added inline comments.
llvm/docs/LoopTerminology.rst
210–211

[serious] By definition, a latch has a backedge to the header. If the latch had just a single successor, there could not be another edge to outside the loop.

Yes, given that the latch contains the condition (as in a do-while loop), that was not so smart on my part. :P
But honestly, the 2 videos confused me in this part:

  • On the one video it says that the latch has a single successor. I assume was a typo.
  • On the other, it says that it has a single predecessor. Which, in the example given, I don't think is true. Besides that,

I don't see why having a single predecessor is important. What do I miss?

Using C code to describe the effect is higher level than the IR-level it is actually performed on. What a latch is is not obvious in the C code. LoopRotate also copies the header block, which might be interesting to mention.

Yes, I'll change it to IR. How about if I put view-cfg-produced image?

baziotis marked an inline comment as done.Feb 23 2020, 4:01 PM
baziotis added inline comments.
llvm/docs/LoopTerminology.rst
176–178

Ok, thanks! I won't put it then.

Meinersbur added inline comments.Feb 24 2020, 12:14 PM
llvm/docs/LoopTerminology.rst
210–211

Could you point me to where in the videos this statement is made? It is totally possible that we make mistakes as well.

I also would not see why the latch would need to have a single predecessor.

How about if I put view-cfg-produced image?

In image illustration would be nice.

baziotis marked an inline comment as done.Feb 24 2020, 2:17 PM
baziotis added inline comments.
llvm/docs/LoopTerminology.rst
210–211

Could you point me to where in the videos this statement is made? It is totally possible that we make mistakes as well.
I also would not see why the latch would need to have a single predecessor.

Of course, no bad intention. I'm not the best to identify the mistake anyway. :)
So, in the video Writing Loop Optimizations, at around 4:47, there's a slide mentioning the single predecessor.
Also, in the video Loop Fusion, Loop Distribution and their Place in the Loop Optimization Pipeline, at around 5:31, there's a slide mentioning the single successor.

In image illustration would be nice.

Great.

Meinersbur added inline comments.Feb 24 2020, 3:06 PM
llvm/docs/LoopTerminology.rst
210–211

The first presentation talks about a single predecessor and the second about single successor. I think it might refer to this:

// Preserve canonical loop form, which means that 'Exit' should have only
// one predecessor. Note that Exit could be an exit block for multiple
// nested loops, causing both of the edges to now be critical and need to
// be split.

However, AFAIK it is not directly a requirement for either loop simplified or rotated form, at least it is not checked in isLoopSimplfyForm or isRotatedForm. That is, I think it is legal (but somewhat strange) in both canonical forms for have a switch in the latch with three cases, one going to the header and two going to (the same) exit block. Looks like LoopSimplify even reverses this change.

Maybe @kbarton can elaborate what was meant.

baziotis marked an inline comment as done.Feb 24 2020, 4:16 PM
baziotis added inline comments.
llvm/docs/LoopTerminology.rst
210–211

Thanks for the references! The loop rotation excerpt was not very clear to me. I understood the LoopSimplify one, but seeing the bigger picture I don't understand the reasoning. I'll check tomorrow. Hopefully @kbarton can help.
Btw, reading through these, it seems we should add to the docs what is a deoptimizing edge and what is a critical edge.
It then might be clearer to read the commits: https://github.com/llvm/llvm-project/commit/2f6987ba61cc31c16c64f511e5cbc76b52dc67b3
and
https://github.com/llvm/llvm-project/commit/c8ca49659ac479b5be0349528d42179f40fc553b#diff-7430b177373b789cad7153ebb942f708R11

Hi @baziotis. First, thank you very much for working on this - I agree that more description of loop rotation is very important (and badly needed).
Regarding the specific questions you ask, I don't recall off hand, but I will echo @Meinersbur comment that it's possible we made mistakes in those presentations also. I'll try to review/remember what we were thinking in those presentations and reply here tomorrow.

An aside - did you consider writing at least some of the examples using LLVM IR instead of C? I agree with @Meinersbur that it is hard to represent some of these concepts in C, and it might be easier to illustrate using LLVM IR. I personally have found it's good to learn these transformations based on what the IR looks like, not the corresponding C or C++.

Hi @kbarton, thanks for your time to give feedback.

An aside - did you consider writing at least some of the examples using LLVM IR instead of C? I agree with @Meinersbur that it is hard to represent some of these concepts in C, and it might be easier to illustrate using LLVM IR. I personally have found it's good to learn these transformations based on what the IR looks like, not the corresponding C or C++.

Definitely, I agree, it's coming in the next update. I think each has their place. For example, when talking about more high-level concepts, like transforming for to do-while or the guard, C helps. But when it comes to latches, headers etc. it indeed doesn't get the point across. I hope that I will also be able to put some view-cfg style images.

baziotis updated this revision to Diff 247946.Mar 3 2020, 10:03 AM
  • General rewrite.

The 3 images that were used are:
Initial loop: https://imgur.com/a/6hSIPPi
Rotated loop: https://imgur.com/a/Ub3eLnd
Guarded loop: https://imgur.com/a/aOECGDf

I tried to read a bunch of source code but I didn't find any more benefits to the loop rotated form.

I went through the recordings you cited, and as you expected these are mistakes.
First, the comment I made about loop latch having a single predecessor was a typo introduced when converting slides. That should have been successor, to match with the slides I presented at EuroLLVM.
The comment about single successor I got from the original patch for loop rotation (https://reviews.llvm.org/D22630). However, as @Meinersbur pointed out earlier, this comment doesn't make sense because the latch has to have two successors: the loop header and the loop exit block.

I apologize for the confusion. Hopefully things are more clear now.

llvm/docs/LoopTerminology.rst
206

As a way to articulate this warning concretely, in my experience when you use unsigned as the parameter type, LLVM is able to prove the guard is not necessary and thus will generate the example you give above (i.e., it knows that n is always greater than zero and thus can fold the guard away.

On the other hand, if you use int, it generally cannot prove the guard is not necessary (in my experience) and thus will always create the guard.

222

nit: control-flow graphs -> Control Flow Graphs (CFG)
I tend to capitalize the words that are used in acronyms. I don't know if this is grammatically correct or just common practice.

289

A couple comments on this:

  1. My understanding was that loop rotate will always generate the guard, and if it can prove the guard is not needed then it will remove it.
  2. Even if it is not the case, I personally don't like giving an example and then saying in text below it "this isn't actually legal, we need to do this also". Would it be possible to rewrite this to give the complete example below, and then follow-up with something along the lines of: "if the compiler can prove that the guard condition always evaluates to true, it will remove the guard altogether."?
332

What are you referring to by "this form"? Is this rotated loops in general, or rotated loops that have a guard?

I think this could be made more concrete by referring back to the original example. In the original loop, entry: is the preheader for the loop, but trying to hoist an instruction from the loop body is incorrect if the loop body never executes. With a guarded loop, on the other hand, moving an instruction from body: to loop.preheader: is legal because loop.preheader: is only executed if body: is executed at least once (figure GuardedLoop). When the compiler can prove that the guard is not needed, it is still legal to hoist instructions from body: to entry: because it is known that body: will execute at least once (figure RotatedLoop).

baziotis marked 3 inline comments as done.Mar 4 2020, 3:29 AM

I went through the recordings you cited, and as you expected these are mistakes.
First, the comment I made about loop latch having a single predecessor was a typo introduced when converting slides. That should have been successor, to match with the slides I presented at EuroLLVM.
The comment about single successor I got from the original patch for loop rotation (https://reviews.llvm.org/D22630). However, as @Meinersbur pointed out earlier, this comment doesn't make sense because the latch has to have two successors: the loop header and the loop exit block.

I apologize for the confusion. Hopefully things are more clear now.

Yes, things are quite clearer now, thanks for the clarifications!

llvm/docs/LoopTerminology.rst
206

Yes, I'll follow with this clarification.

289

My understanding was that loop rotate will always generate the guard, and if it can prove the guard is not needed then it will remove it.

Thanks, I didn't know that.

Regarding the rest, generally I agree but in this case it may be a big jump from the for version to the loop-rotate result because it will also loo-simplify it, what do you think ?
We could do this: After the for LLVM IR, follow with "this is how we could convert that in do-while style loop", basically implying that this is not what loop-rotate will do. Follow with the notes and then "here's what loop-rotate will do. It inserted a guard because ... and also simplified it ..."

332

"this form" being the result of loop-rotate. This can be without a guard under circumstances right?
Given that loop-rotate has deduced whether we need a guard or not, moving things in the preheader should be valid whether it has a guard or not (i.e. if it doesn't have a guard, it is because loop-rotate has deduced that the loop will execute at least once).

Which is basically what you referred to so I'll try to make clear that moving things in the preheader after loop-rotate is always correct whether it has a guard or not.

The comment about single successor I got from the original patch for loop rotation (https://reviews.llvm.org/D22630).

This is not the original implementation of LoopRotate and wasn't even committed. IIUC, current LoopRotate only clones the loop header (i.e. the loop exit condition must be its terminator), D22630 would also rotate loops where the loop exit condition is somewhere else.

llvm/docs/LoopTerminology.rst
210–211

I understood the LoopSimplify one, but seeing the bigger picture I don't understand the reasoning.

for (int i = 0; i < n; ++i) {
  auto v = *p;
  use(v);
}

The value of p is loop-invariant, but the following transformation is illegal (to avoid repeatedly loading the same value from memory):

auto v = *p;
for (int i = 0; i < n; ++i) {
  use(v);
}

because if n==0, the address in p is dereferenced while it was not in the original loop. p might be null and the transform code would segfault while the original code did not. Solution is to insert a check whether the loop body is executed at least once:

if (0 < n) { // loop guard
  auto v = *p;
  for (int i = 0; i < n; ++i) {
    use(v);
  }
}

Now the loop condition before entering the first iteration is redundant, we already checked it in the guard. We can move it to after the first iteration:

if (0 < n) { // loop guard
  auto v = *p;
  for (int i = 0; ; ++i) {
    use(v);
    if (i+1>=n)
      break;
  }
}

We just invented loop rotation.

The name 'rotation' comes from gradually peeling/unrolling (with an infinite spiral)) statements out of the loop until everything in the loop is executed at least once:

int i = 0;
while (1) {
  foo(i);
  bar(i);
  if (i >= n) return;
  ++i;
  xyzzy(i);
}

🡆

int i = 0;
foo(i);
while (1) {
  bar(i);
  if (i >= n) return;
  ++i;
  xyzzy(i);
  foo(i);
}

🡆

int i = 0;
foo(i);
bar(i);
while (1) {
  if (i >= n) return;
  ++i;
  xyzzy(i);
  foo(i);
  bar(j);
}

🡆

int i = 0;
foo(i);
bar(i);
if (i >= n) return;
while (1) {
  ++i;
  xyzzy(i);
  foo(i);
  bar(i);
  if (i >= n) return;
}
baziotis marked an inline comment as done.Mar 4 2020, 12:32 PM

IIUC, current LoopRotate only clones the loop header (i.e. the loop exit condition must be its terminator)

It might be good to mention that.

llvm/docs/LoopTerminology.rst
210–211

Wow, that was a very neat explanation, thank you! Along with this, I was able to understand most of the code that implements it.
Apart from the implementation details, what I understand is that the point of loop rotation is to hoist invariant instructions (not just loads) to the preheader (and it may fold some instructions along the way).

baziotis updated this revision to Diff 248973.Mar 7 2020, 5:11 PM
  • Ease into what loop-rotate does with better phrasing.
  • Added example that presents the intuition of loop rotation (thanks Michael).
Meinersbur accepted this revision.Mar 9 2020, 5:38 PM

This is somewhat more verbose than I'd expect for a terminology document (more like a tutorial). I also don't see why not: LGTM.

You have commit right now, correct? Before committing, please fix the remaining typos/nits.

llvm/docs/LoopTerminology.rst
221

[typo] reprensentation

277

The backedge "T -> body" looks displaced. Not sure what to do about it.

286

This is something typically done by simplify-cfg, I don't think loop-rotate itself merges BBs.

325

[nit] please make clear that -loop-simplify runs before -loop-rotate.

357–358

[nit] The performance reason is really minor. When optimistically assuming that in most cases, a loop will "loop" lots of times, the occasional single execution of a load whose result is not used is minor. The additional control-flow introduced by the guard might cause more slowdown (not necessarily because of the addition branch instruction, branch predictors are really good these days). I'd just document the semantic reason.

It would be interesting to explore whether the guard still makes sense if p has the attribute dereferenceable (i.e. we know that the load will not cause an error)

382

[grammar] "That" -> "This"

This revision is now accepted and ready to land.Mar 9 2020, 5:38 PM
baziotis marked 4 inline comments as done.Mar 9 2020, 6:42 PM

This is somewhat more verbose than I'd expect for a terminology document (more like a tutorial). I also don't see why not: LGTM.

It seemed to me as well, but it looked useful. I'm glad you like it.

You have commit right now, correct? Before committing, please fix the remaining typos/nits.

Yes and yes.

llvm/docs/LoopTerminology.rst
277

Believe me, I tried.. :p
I thought about creating the image from view-cfg, erase the arrows, then create again the arrows.
But then the image is not reproducible. I don't know how much we care about that.

286

Yes, me too. LoopRotate has no such thing.

357–358

I'd just document the semantic reason.

Ok, yes.

It would be interesting to explore whether the guard still makes sense if p has the attribute dereferenceable (i.e. we know that the load will not cause an error)

I haven't thought about that, thanks. Not relevant to this doc I guess but otherwise an interesting thing I could experiment with.

392

I should make clear here that such movement of code does not happen from -loop-rotate (but e.g. LICM). Although we should be careful on the wording because LoopRotate does move some code. As matter of fact though, I don't understand this: https://godbolt.org/z/hgYhHX
%invcond doesn't write to / read from memory, has invariant operands, is not a terminator and is not debuginfo intrinsic.
So, I don't get why it isn't moved outside, I'll take a look tomorrow.

Meinersbur added inline comments.Mar 9 2020, 7:15 PM
llvm/docs/LoopTerminology.rst
392

I should make clear here that such movement of code does not happen from -loop-rotate

Yes, that would be good.

because LoopRotate does move some code.

good catch!

%invcond doesn't write to / read from memory, has invariant operands, is not a terminator and is not debuginfo intrinsic.

The current implementation is limited to only look loop header. If moving %invconv to the header, we get: https://godbolt.org/z/hmodWG

baziotis marked an inline comment as done.Mar 10 2020, 7:26 AM
baziotis added inline comments.
llvm/docs/LoopTerminology.rst
392

The current implementation is limited to only look loop header. If moving %invconv to the header, we get: https://godbolt.org/z/hmodWG

Oh yes, I forgot this:
BasicBlock::iterator I = OrigHeader->begin(), E = OrigHeader->end();

And actually this is an interesting example because the loop is simplified pre-loop-rotate and so the instructions are moved to the preheader of the original. Then apparently loop-rotate breaks the canonical form, loop-simplify is called again and so a different preheader is created.

baziotis marked an inline comment as done.Mar 10 2020, 8:10 AM
baziotis added inline comments.
llvm/docs/LoopTerminology.rst
325

Actually, related to the other comment I wrote, it seems to me that running LoopSimplify before LoopRotate will have no effect. Probably this effect was done because LoopSimplify was ran after. And since LoopRotate broke the form, the after LoopSimplify is the one that changed it.

Meinersbur added inline comments.Mar 10 2020, 11:36 AM
llvm/docs/LoopTerminology.rst
325

LoopRotation, at least in the legacy pass manager, depends on LoopSimplify (https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/Scalar/LoopRotation.cpp#L86), so it will have run when entering LoopRotation. There might be other executions of LoopSimplify added to the pass manager (which then will have no effect)

Also LoopRotation bails out if not in simplified form: https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/Utils/LoopRotationUtils.cpp#L324

Moreover, LoopRotation ensures simplified form after rotation: https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/Utils/LoopRotationUtils.cpp#L561

baziotis marked an inline comment as done.Mar 10 2020, 4:02 PM
baziotis added inline comments.
llvm/docs/LoopTerminology.rst
325

Yes, I agree with all of that! Actually, I probably stated it incorrectly:
I did not mean that a LoopSimplify won't run before since it is added automatically by the pass manager for LoopPasses.
And because LoopRotate works only on simplified loops.
But in this example, it will have no effect. So, there has to be an after LoopSimplify for this effect. And specifically, I didn't see this, thanks:

Moreover, LoopRotation ensures simplified form after rotation: https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/Utils/LoopRotationUtils.cpp#L561

So, I'll update it to make that clear.

baziotis updated this revision to Diff 249537.Mar 10 2020, 6:57 PM
  • Changed loop-rotate to LoopRotate as the latter is more linked with the pass while the former seems like a cmd arguments.
  • Clarification on the hoisting of instructions.
  • Removed the reference to LoopSimplify. I think it is better. I tried to write it a couple of times it introduces duplicate doc. i.e. LoopRotate is a LoopPass and so LoopSimplify is ran before it, but this is mentioned in the doc about Loop Simplify Form.
baziotis marked an inline comment as done.Mar 11 2020, 1:23 PM
baziotis added inline comments.
llvm/docs/LoopTerminology.rst
357–358

I tried to implement that today but it seems it's already done. First of all we can see it by adding the attribute. But we can also see that by following: https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/Scalar/LICM.cpp#L806
Here, simplistically, the first 2 parts of the condition check for "invariance" (actually probably having the first is too conservative but anyway, off-topic) and the third for "semantics preservation". Without dereferenceable the first is satisfied while the second not. If we follow it we end up in here:
https://github.com/llvm/llvm-project/blob/master/llvm/lib/Analysis/ValueTracking.cpp#L4229
With dereferenceable is still satisfied because we don't suppress speculation and is anyway aligned.

Meinersbur accepted this revision.Mar 16 2020, 2:45 PM

My LGTM still stands.

llvm/docs/LoopTerminology.rst
290

[typo] transforms

357–358

Yes, but it's part of LICM, not LoopRotation. It would still be hoisted even if not rotated. isSafeToExecuteUnconditionally also checks isGuaranteedToExecute.

My comment was about the loop guard (and loop rotation) potentially being unnecessary if all memory accesses are derefenceable.

393

"Warning" is seems to be over-the-top. Why would someone think LoopRotate would do hoisting?

baziotis marked 2 inline comments as done.EditedMar 16 2020, 3:44 PM

My LGTM still stands.

Ok thanks, I'll fix the nits and I'll commit.

llvm/docs/LoopTerminology.rst
357–358

Hmm, ok my bad. I'll try it then. With a quick look, it seems that your assumption holds and we can do it. But I'll try to put up a patch where we can discuss it more.

393

Yes, ok

baziotis updated this revision to Diff 250645.Mar 16 2020, 3:55 PM

Fixed nits. Ready for commit.

Fixed nits. Ready for commit.

Sorry, I forgot again to add the differential revision in the end. At least I'll add here and to the commit in Phabricator (when the latter is ready).