Details
- Reviewers
Meinersbur kbarton fhahn etiotto
Diff Detail
Event Timeline
So, I'm sure I'm not the best to write this one since I have not understood it completely, but first of all, after trying a lot to understand it, I feel it needs doc.
Second, I hope that you can help me write it correctly. :)
The point in which I'm perplexed is the latch. First of all, one of the goals of this transformation seems to be that the latch be also an exiting. That makes sense.
But in the videos I reference above, one says "canonicalize latch to have one predecessor" (Writing Loop Optimizations in LLVM) while the other "... a single successor" (Loop Fusion etc.).
And I don't get what is correct and what is not. First, predecessors of the latch don't seem to be important. Second, if the successors are what we care about, then in the examples
given in the videos, in the left example (before the transformation) the latch does have a single successor. While in the right (the transformed), it has 2.
llvm/docs/LoopTerminology.rst | ||
---|---|---|
176–178 | Converting to do/while style is a description of loop-rotation. One of the purposes is to allow hoisting invariant loads into the preheader. In non-rotated loops, a load in the preaheader would be executed even if the memory was never accessed in the original loop. | |
212 | The LoopRotate pass was already mentioned at the beginning. Maybe you could also link to https://llvm.org/docs/Passes.html#loop-rotate-rotate-loops. |
llvm/docs/LoopTerminology.rst | ||
---|---|---|
176–178 | Oh, because you might do no iterations at all right? So, with the do-while, you'll do at least once. Regarding the wording, is this better? | |
212 | Yes, my bad, I forgot I had mentioned in the beginning, I'll put the link there. |
llvm/docs/LoopTerminology.rst | ||
---|---|---|
176–178 | Yes, it is better. The latch being also an exiting block is kind-of the definition of loop rotation: the latch contains the condition on whether to execute another iteration or leave the loop, i.e. the condition if the do part of a while/do loop. /// Return true if the loop is in rotated form. /// /// This does not check if the loop was rotated by loop rotation, instead it /// only checks if the loop is in rotated form (has a valid latch that exists /// the loop). bool isRotatedForm() const { assert(!isInvalid() && "Loop not in a valid state!"); BasicBlock *Latch = getLoopLatch(); return Latch && isLoopExiting(Latch); } | |
210–211 | [serious] By definition, a latch has a backedge to the header. If the latch had just a single successor, there could not be another edge to outside the loop. Using C code to describe the effect is higher level than the IR-level it is actually performed on. What a latch is is not obvious in the C code. LoopRotate also copies the header block, which might be interesting to mention. |
llvm/docs/LoopTerminology.rst | ||
---|---|---|
210–211 |
Yes, given that the latch contains the condition (as in a do-while loop), that was not so smart on my part. :P
I don't see why having a single predecessor is important. What do I miss?
Yes, I'll change it to IR. How about if I put view-cfg-produced image? |
llvm/docs/LoopTerminology.rst | ||
---|---|---|
176–178 | Ok, thanks! I won't put it then. |
llvm/docs/LoopTerminology.rst | ||
---|---|---|
210–211 | Could you point me to where in the videos this statement is made? It is totally possible that we make mistakes as well. I also would not see why the latch would need to have a single predecessor.
In image illustration would be nice. |
llvm/docs/LoopTerminology.rst | ||
---|---|---|
210–211 |
Of course, no bad intention. I'm not the best to identify the mistake anyway. :)
Great. |
llvm/docs/LoopTerminology.rst | ||
---|---|---|
210–211 | The first presentation talks about a single predecessor and the second about single successor. I think it might refer to this: // Preserve canonical loop form, which means that 'Exit' should have only // one predecessor. Note that Exit could be an exit block for multiple // nested loops, causing both of the edges to now be critical and need to // be split. However, AFAIK it is not directly a requirement for either loop simplified or rotated form, at least it is not checked in isLoopSimplfyForm or isRotatedForm. That is, I think it is legal (but somewhat strange) in both canonical forms for have a switch in the latch with three cases, one going to the header and two going to (the same) exit block. Looks like LoopSimplify even reverses this change. Maybe @kbarton can elaborate what was meant. |
llvm/docs/LoopTerminology.rst | ||
---|---|---|
210–211 | Thanks for the references! The loop rotation excerpt was not very clear to me. I understood the LoopSimplify one, but seeing the bigger picture I don't understand the reasoning. I'll check tomorrow. Hopefully @kbarton can help. |
Hi @baziotis. First, thank you very much for working on this - I agree that more description of loop rotation is very important (and badly needed).
Regarding the specific questions you ask, I don't recall off hand, but I will echo @Meinersbur comment that it's possible we made mistakes in those presentations also. I'll try to review/remember what we were thinking in those presentations and reply here tomorrow.
An aside - did you consider writing at least some of the examples using LLVM IR instead of C? I agree with @Meinersbur that it is hard to represent some of these concepts in C, and it might be easier to illustrate using LLVM IR. I personally have found it's good to learn these transformations based on what the IR looks like, not the corresponding C or C++.
Hi @kbarton, thanks for your time to give feedback.
An aside - did you consider writing at least some of the examples using LLVM IR instead of C? I agree with @Meinersbur that it is hard to represent some of these concepts in C, and it might be easier to illustrate using LLVM IR. I personally have found it's good to learn these transformations based on what the IR looks like, not the corresponding C or C++.
Definitely, I agree, it's coming in the next update. I think each has their place. For example, when talking about more high-level concepts, like transforming for to do-while or the guard, C helps. But when it comes to latches, headers etc. it indeed doesn't get the point across. I hope that I will also be able to put some view-cfg style images.
- General rewrite.
The 3 images that were used are:
Initial loop: https://imgur.com/a/6hSIPPi
Rotated loop: https://imgur.com/a/Ub3eLnd
Guarded loop: https://imgur.com/a/aOECGDf
I tried to read a bunch of source code but I didn't find any more benefits to the loop rotated form.
I went through the recordings you cited, and as you expected these are mistakes.
First, the comment I made about loop latch having a single predecessor was a typo introduced when converting slides. That should have been successor, to match with the slides I presented at EuroLLVM.
The comment about single successor I got from the original patch for loop rotation (https://reviews.llvm.org/D22630). However, as @Meinersbur pointed out earlier, this comment doesn't make sense because the latch has to have two successors: the loop header and the loop exit block.
I apologize for the confusion. Hopefully things are more clear now.
llvm/docs/LoopTerminology.rst | ||
---|---|---|
206 | As a way to articulate this warning concretely, in my experience when you use unsigned as the parameter type, LLVM is able to prove the guard is not necessary and thus will generate the example you give above (i.e., it knows that n is always greater than zero and thus can fold the guard away. On the other hand, if you use int, it generally cannot prove the guard is not necessary (in my experience) and thus will always create the guard. | |
222 | nit: control-flow graphs -> Control Flow Graphs (CFG) | |
289 | A couple comments on this:
| |
332 | What are you referring to by "this form"? Is this rotated loops in general, or rotated loops that have a guard? I think this could be made more concrete by referring back to the original example. In the original loop, entry: is the preheader for the loop, but trying to hoist an instruction from the loop body is incorrect if the loop body never executes. With a guarded loop, on the other hand, moving an instruction from body: to loop.preheader: is legal because loop.preheader: is only executed if body: is executed at least once (figure GuardedLoop). When the compiler can prove that the guard is not needed, it is still legal to hoist instructions from body: to entry: because it is known that body: will execute at least once (figure RotatedLoop). |
Yes, things are quite clearer now, thanks for the clarifications!
llvm/docs/LoopTerminology.rst | ||
---|---|---|
206 | Yes, I'll follow with this clarification. | |
289 |
Thanks, I didn't know that. Regarding the rest, generally I agree but in this case it may be a big jump from the for version to the loop-rotate result because it will also loo-simplify it, what do you think ? | |
332 | "this form" being the result of loop-rotate. This can be without a guard under circumstances right? Which is basically what you referred to so I'll try to make clear that moving things in the preheader after loop-rotate is always correct whether it has a guard or not. |
This is not the original implementation of LoopRotate and wasn't even committed. IIUC, current LoopRotate only clones the loop header (i.e. the loop exit condition must be its terminator), D22630 would also rotate loops where the loop exit condition is somewhere else.
llvm/docs/LoopTerminology.rst | ||
---|---|---|
210–211 |
for (int i = 0; i < n; ++i) { auto v = *p; use(v); } The value of p is loop-invariant, but the following transformation is illegal (to avoid repeatedly loading the same value from memory): auto v = *p; for (int i = 0; i < n; ++i) { use(v); } because if n==0, the address in p is dereferenced while it was not in the original loop. p might be null and the transform code would segfault while the original code did not. Solution is to insert a check whether the loop body is executed at least once: if (0 < n) { // loop guard auto v = *p; for (int i = 0; i < n; ++i) { use(v); } } Now the loop condition before entering the first iteration is redundant, we already checked it in the guard. We can move it to after the first iteration: if (0 < n) { // loop guard auto v = *p; for (int i = 0; ; ++i) { use(v); if (i+1>=n) break; } } We just invented loop rotation. The name 'rotation' comes from gradually peeling/unrolling (with an infinite spiral)) statements out of the loop until everything in the loop is executed at least once: int i = 0; while (1) { foo(i); bar(i); if (i >= n) return; ++i; xyzzy(i); } 🡆 int i = 0; foo(i); while (1) { bar(i); if (i >= n) return; ++i; xyzzy(i); foo(i); } 🡆 int i = 0; foo(i); bar(i); while (1) { if (i >= n) return; ++i; xyzzy(i); foo(i); bar(j); } 🡆 int i = 0; foo(i); bar(i); if (i >= n) return; while (1) { ++i; xyzzy(i); foo(i); bar(i); if (i >= n) return; } |
IIUC, current LoopRotate only clones the loop header (i.e. the loop exit condition must be its terminator)
It might be good to mention that.
llvm/docs/LoopTerminology.rst | ||
---|---|---|
210–211 | Wow, that was a very neat explanation, thank you! Along with this, I was able to understand most of the code that implements it. |
- Ease into what loop-rotate does with better phrasing.
- Added example that presents the intuition of loop rotation (thanks Michael).
This is somewhat more verbose than I'd expect for a terminology document (more like a tutorial). I also don't see why not: LGTM.
You have commit right now, correct? Before committing, please fix the remaining typos/nits.
llvm/docs/LoopTerminology.rst | ||
---|---|---|
221 | [typo] reprensentation | |
277 | The backedge "T -> body" looks displaced. Not sure what to do about it. | |
286 | This is something typically done by simplify-cfg, I don't think loop-rotate itself merges BBs. | |
325 | [nit] please make clear that -loop-simplify runs before -loop-rotate. | |
357–358 | [nit] The performance reason is really minor. When optimistically assuming that in most cases, a loop will "loop" lots of times, the occasional single execution of a load whose result is not used is minor. The additional control-flow introduced by the guard might cause more slowdown (not necessarily because of the addition branch instruction, branch predictors are really good these days). I'd just document the semantic reason. It would be interesting to explore whether the guard still makes sense if p has the attribute dereferenceable (i.e. we know that the load will not cause an error) | |
382 | [grammar] "That" -> "This" |
It seemed to me as well, but it looked useful. I'm glad you like it.
You have commit right now, correct? Before committing, please fix the remaining typos/nits.
Yes and yes.
llvm/docs/LoopTerminology.rst | ||
---|---|---|
277 | Believe me, I tried.. :p | |
286 | Yes, me too. LoopRotate has no such thing. | |
357–358 |
Ok, yes.
I haven't thought about that, thanks. Not relevant to this doc I guess but otherwise an interesting thing I could experiment with. | |
392 | I should make clear here that such movement of code does not happen from -loop-rotate (but e.g. LICM). Although we should be careful on the wording because LoopRotate does move some code. As matter of fact though, I don't understand this: https://godbolt.org/z/hgYhHX |
llvm/docs/LoopTerminology.rst | ||
---|---|---|
392 |
Yes, that would be good.
good catch!
The current implementation is limited to only look loop header. If moving %invconv to the header, we get: https://godbolt.org/z/hmodWG |
llvm/docs/LoopTerminology.rst | ||
---|---|---|
392 |
Oh yes, I forgot this: And actually this is an interesting example because the loop is simplified pre-loop-rotate and so the instructions are moved to the preheader of the original. Then apparently loop-rotate breaks the canonical form, loop-simplify is called again and so a different preheader is created. |
llvm/docs/LoopTerminology.rst | ||
---|---|---|
325 | Actually, related to the other comment I wrote, it seems to me that running LoopSimplify before LoopRotate will have no effect. Probably this effect was done because LoopSimplify was ran after. And since LoopRotate broke the form, the after LoopSimplify is the one that changed it. |
llvm/docs/LoopTerminology.rst | ||
---|---|---|
325 | LoopRotation, at least in the legacy pass manager, depends on LoopSimplify (https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/Scalar/LoopRotation.cpp#L86), so it will have run when entering LoopRotation. There might be other executions of LoopSimplify added to the pass manager (which then will have no effect) Also LoopRotation bails out if not in simplified form: https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/Utils/LoopRotationUtils.cpp#L324 Moreover, LoopRotation ensures simplified form after rotation: https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/Utils/LoopRotationUtils.cpp#L561 |
llvm/docs/LoopTerminology.rst | ||
---|---|---|
325 | Yes, I agree with all of that! Actually, I probably stated it incorrectly:
So, I'll update it to make that clear. |
- Changed loop-rotate to LoopRotate as the latter is more linked with the pass while the former seems like a cmd arguments.
- Clarification on the hoisting of instructions.
- Removed the reference to LoopSimplify. I think it is better. I tried to write it a couple of times it introduces duplicate doc. i.e. LoopRotate is a LoopPass and so LoopSimplify is ran before it, but this is mentioned in the doc about Loop Simplify Form.
llvm/docs/LoopTerminology.rst | ||
---|---|---|
357–358 | I tried to implement that today but it seems it's already done. First of all we can see it by adding the attribute. But we can also see that by following: https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/Scalar/LICM.cpp#L806 |
My LGTM still stands.
llvm/docs/LoopTerminology.rst | ||
---|---|---|
290 | [typo] transforms | |
357–358 | Yes, but it's part of LICM, not LoopRotation. It would still be hoisted even if not rotated. isSafeToExecuteUnconditionally also checks isGuaranteedToExecute. My comment was about the loop guard (and loop rotation) potentially being unnecessary if all memory accesses are derefenceable. | |
393 | "Warning" is seems to be over-the-top. Why would someone think LoopRotate would do hoisting? |
Sorry, I forgot again to add the differential revision in the end. At least I'll add here and to the commit in Phabricator (when the latter is ready).
Converting to do/while style is a description of loop-rotation. One of the purposes is to allow hoisting invariant loads into the preheader. In non-rotated loops, a load in the preaheader would be executed even if the memory was never accessed in the original loop.