Page MenuHomePhabricator

[LOOPINFO] Extend Loop object to add utilities to get the loop bounds, step, and loop induction variable.
ClosedPublic

Authored by Whitney on Apr 11 2019, 7:10 AM.

Details

Summary

This PR extends the loop object with more utilities to get loop bounds, step, and loop induction variable. There already exists passes which try to obtain the loop induction variable in their own pass, e.g. loop interchange. It would be useful to have a common area to get these information.

/// Example:
/// for (int i = lb; i < ub; i+=step)
///   <loop body>
/// --- pseudo LLVMIR ---
/// beforeloop:
///   guardcmp = (lb < ub)
///   if (guardcmp) goto preheader; else goto afterloop
/// preheader:
/// loop:
///   i1 = phi[{lb, preheader}, {i2, latch}]
///   <loop body>
///   i2 = i1 + step
/// latch:
///   cmp = (i2 < ub)
///   if (cmp) goto loop
/// exit:
/// afterloop:
///
/// getBounds
///   getInitialIVValue      --> lb
///   getStepInst            --> i2 = i1 + step
///   getStepValue           --> step
///   getFinalIVValue        --> ub
///   getCanonicalPredicate  --> '<'
///   getDirection           --> Increasing
/// getInductionVariable          --> i1
/// getAuxiliaryInductionVariable --> {i1}
/// isCanonical                   --> false

Diff Detail

Repository
rL LLVM

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Whitney marked 37 inline comments as done.May 1 2019, 7:34 AM

Addressed all the easier review comments first.

Whitney added inline comments.May 1 2019, 7:35 AM
llvm/include/llvm/Analysis/LoopInfo.h
620 ↗(On Diff #196634)

No, because isInductionPHI() takes a non const IndVar.

llvm/lib/Analysis/LoopInfo.cpp
290 ↗(On Diff #196634)

exit block -> ....-> the other successor
exit block dominates the other successor...
why should it change to successor of loop exit block (trivially) post dominates the other successor?

Whitney updated this revision to Diff 197592.May 1 2019, 10:45 AM
Whitney marked 6 inline comments as done.
Whitney updated this revision to Diff 197644.May 1 2019, 3:52 PM
Whitney marked 3 inline comments as done.
Whitney updated this revision to Diff 197770.May 2 2019, 6:50 AM
Whitney marked an inline comment as done.
Whitney updated this revision to Diff 197841.May 2 2019, 12:14 PM
Whitney updated this revision to Diff 198903.May 9 2019, 2:13 PM
Whitney updated this revision to Diff 199171.May 12 2019, 11:39 AM
kbarton requested changes to this revision.May 13 2019, 12:55 PM
kbarton added inline comments.
llvm/include/llvm/Analysis/LoopInfo.h
416 ↗(On Diff #198903)

Minor comment, but I think it's better to make the names as specific as possible. So, this could be getGuardBranch, instead of just getGuard, to distinguish between getGuardBlock for example.

422 ↗(On Diff #198903)

How is this method different from getCanonicalInductionVariable?
If these are fundamentally different, then the comments need to be clear what the differences are and when one should be used over the other.

575 ↗(On Diff #198903)

This is minor, but have you tried running this through doxygen to see what the output looks like?
It may come out looking much different in the doxygen pages then it does here.

682 ↗(On Diff #198903)

Minor comment -the parenthetical should be before the period, not after.

606 ↗(On Diff #196634)

I think this comment needs to be clarified further as I find it confusing the way it is written.
In particular, the mixing of itemization (before the example) and enumeration (after the example) . Can you rewrite it to make it more clear?

664 ↗(On Diff #196634)

Yes, I saw that now. Perhaps I'm just being too pedantic on the terminology that is being used.
I would have expected a method that returns a "Variable" to be a Value, not a PHINode (an instruction). However, you can get to the Value from the PHINode, so it's probably OK.

If others have strong feelings about this on way or another, please comment.

llvm/lib/Analysis/LoopInfo.cpp
249 ↗(On Diff #198903)

Should this be dyn_cast_or_null instead, to account for the possibility that getTerminator could returns nullptr?

226 ↗(On Diff #196634)

Can you make it explicit in the comments for getCanonicalPredicate that it will always return the signed predicate, even if the latch uses the unsigned version?

240 ↗(On Diff #199171)

If neither of these conditions are true, LoopBounds will be initialized with a null StepValue. I'm assuming this is intended behaviour.

Can you please add a comment to the getStepValue method that says it will return nullptr if the StepValue cannot be determined?

348 ↗(On Diff #199171)

I don't understand the comments below, specifically what the --> is meant to indicate.
I think it should be sufficient to say the method returns a branch instruction that branches around the specified loop.
Also, the GuardBB, SkipIndex, SkipToBlock, etc., are all local variables that don't escape this method, so they are typically not documented in the function comments.

355 ↗(On Diff #199171)

Can you rename this method to indicate you are returning a branch instruction (e.g., getPotentialGuardBranch).

418 ↗(On Diff #199171)

In the comments you describe the guard in terms of dominance, but you don't use dominance anywhere in the function.

I would suggest trying to refactor this as follows:

Change this method to getPotentialGuardBlock, that returns the basic block that (potentially) guards the loop body.
I think you can determine the guard block using dominators and post-dominators. Specifically, the guard block should be the closest predecessor that dominates the loop preheader. And the "other" successor of the potential guard block should post-dominate all of the blocks inside the loop. I believe that should be sufficient to guarantee the block is a potential guard.

Then you can get the branch from the guard block in the getGuard method, and do the necessary analysis on the branch itself to verify it conforms with the requirements for a guard branch. Does this make sense?

423 ↗(On Diff #199171)

Again, try to make it clear in the name what you are specifically getting - in this case the guard branch.

488 ↗(On Diff #199171)

It's probably reasonable here to assert that Header and Latch are not nullptr.

llvm/unittests/Analysis/LoopInfoTest.cpp
752 ↗(On Diff #199171)

These tests look good. I don't see any tests for the following - could you add some:

  1. Loop Bounds and associated methods you have added
  2. Unguarded loops (i.e., loops with constant upper bounds should not get a guard).
  3. Loop nests
  4. Functions with control flow that do not "guard" the loop.
This revision now requires changes to proceed.May 13 2019, 12:55 PM
jdoerfert added inline comments.May 13 2019, 6:23 PM
llvm/lib/Analysis/LoopInfo.cpp
418 ↗(On Diff #199171)

And the "other" successor of the potential guard block should post-dominate all of the blocks inside the loop.

Not all blocks but just the header I think. That should allow multi-exit loops (if it matters).

I believe that should be sufficient to guarantee the block is a potential guard.

Agreed.

Whitney marked 3 inline comments as done.May 13 2019, 6:39 PM
Whitney added inline comments.
llvm/include/llvm/Analysis/LoopInfo.h
575 ↗(On Diff #198903)

Tried this with doxygen, looks as expected.

llvm/lib/Analysis/LoopInfo.cpp
249 ↗(On Diff #198903)

Can a basic block not have a terminator? I thought it must have one.

488 ↗(On Diff #199171)

This will definitely be true, because it just checked that the loop is in isLoopSimplifyForm()..I can add asserts.

Whitney updated this revision to Diff 199539.May 14 2019, 5:39 PM
Whitney marked 24 inline comments as done.
Whitney added inline comments.
llvm/include/llvm/Analysis/LoopInfo.h
416 ↗(On Diff #198903)

changed.

422 ↗(On Diff #198903)

Added comment to clarify that.

682 ↗(On Diff #198903)

moved.

606 ↗(On Diff #196634)

I updated the comment again, hopefully is more clear this time.

664 ↗(On Diff #196634)

an instruction is a value as well.

llvm/lib/Analysis/LoopInfo.cpp
226 ↗(On Diff #196634)

That's not totally true, it returns the signed predicate for ne or eq which doesn't have a sign. Updated the comment to reflect that.

240 ↗(On Diff #199171)

added.

348 ↗(On Diff #199171)

Updated the comment.

355 ↗(On Diff #199171)

done.

418 ↗(On Diff #199171)

I changed the code to use the post dominator, but there is no need to use the dominator, as by finding the closest conditional branch, we know it dominates the loop.

423 ↗(On Diff #199171)

done

Whitney updated this revision to Diff 199653.May 15 2019, 11:57 AM
Whitney marked 2 inline comments as done.
Whitney added inline comments.
llvm/unittests/Analysis/LoopInfoTest.cpp
752 ↗(On Diff #199171)

Added more test cases

Whitney updated this revision to Diff 199908.May 16 2019, 2:35 PM
kbarton marked an inline comment as done.May 17 2019, 11:41 AM

I discussed this with @Whitney and I think at this point it makes sense to remove the getGuard and getPotentialGuard methods as there still seems to be some discussions about the right approach for that.
Aside from that, I think everything else looks pretty good, aside from some specific style-related comments.

llvm/include/llvm/Analysis/LoopInfo.h
574 ↗(On Diff #199901)

a -> an

673 ↗(On Diff #199901)

I think this condition needs to be reworded to:
The other successor post-dominates all nodes in the loop.

llvm/lib/Analysis/LoopInfo.cpp
249 ↗(On Diff #198903)

From: http://llvm.org/doxygen/classllvm_1_1BasicBlock.html

A well formed basic block is formed of a list of non-terminating instructions followed by a single terminator instruction. Terminator instructions may not occur in the middle of basic blocks, and must terminate the blocks. The BasicBlock class allows malformed basic blocks to occur because it may be useful in the intermediate stage of constructing or modifying a program. However, the verifier will ensure that basic blocks are "well formed".

So, while I agree it is unlikely to happen, it is still possible, depending on when this is called wrt other optimizations.

294 ↗(On Diff #199901)

From what I have seen, a more common way to write this method in LLVM-style would be:

if (SCEVAddRecExpr *StepAddRecExpr = dyn_cast<SCEVAddRecExpr>(SE.getSCEV(&getStepInst()))) 
  if (const SCEV *StepRecur = StepAddRecExpr->getStepRecurrence(SE)) {
    if (SE.isKnownPositive(StepRecur))
      return Direction::Increasing;
    if (SE.isKnownNegative(StepRecur))
      return Direction::Decreasing;
  }
return Direction::Unknown;

That said, I don't know if this is explicitly documented anywhere.

311 ↗(On Diff #199901)

Similarly, this is typically written as:

if (PHINode *IndVar = getInductionVariable(SE))
  return LoopBounds::getBounds(*this, *IndVar, SE);
return None;
528 ↗(On Diff #199901)

Same as above. This can be rewritten by initializing IndVar in the if condition and then calling isInductionPHI.

Whitney marked 17 inline comments as done.May 17 2019, 12:01 PM
Whitney updated this revision to Diff 200090.May 17 2019, 1:34 PM
Whitney marked 6 inline comments as done.
Whitney added inline comments.
llvm/lib/Analysis/LoopInfo.cpp
294 ↗(On Diff #199901)
kbarton accepted this revision.May 22 2019, 1:25 PM

Aside from a few minor comments, this LGTM.

llvm/include/llvm/Analysis/LoopInfo.h
535 ↗(On Diff #200090)

Need to remove guard branch from the comments, as that is not included in this patch anymore.

llvm/lib/Analysis/LoopInfo.cpp
197 ↗(On Diff #200090)

This should also be dyn_cast_or_null, since Latch->getTerminator() can return nullptr in some cases.

294 ↗(On Diff #199901)

Yes, that's true. But that is dealing more with large bodies of code and/or complicated conditionals where many things need to be satisfied. I think the purpose of that is it is very hard to keep all of the logic straight in your head while reading the code, so separating out common conditions or early exits makes the code easier to read. These functions, on the other hand, are very small and do not have a lot of conditions or context that needs to be remembered.

I agree this is somewhat subjective, and I don't have a strong preference either way. However, after looking at a lot of code in LLVM, I find the more compact style easier to read for small functions.

This revision is now accepted and ready to land.May 22 2019, 1:25 PM
fhahn added a comment.May 22 2019, 2:01 PM

I still think there is some duplication with InductionDescriptor that could be unified, which would also make it easier to use this new utility in places that currently use InductionDescriptor.

Some comments inline, also InductionDescriptor already stores & provides accessors for InitialIVValue, StepValue (although just the SCEV/ConstantInt - is the generic value actually what you need?) and StepInst. With my earlier comment, I was going more in the direction of using IVDescriptor, instead of duplicating the fields here and using the existing functions to detect induction PHIs.

llvm/include/llvm/Analysis/LoopInfo.h
61 ↗(On Diff #200090)

Unused?

636 ↗(On Diff #200090)

This is basically the same as InductionDescriptor::getConsecutiveDirection

657 ↗(On Diff #200090)

I am not sure what this means exactly. Do you have checks to exclude loops with multiple exit blocks?

Also, I am not sure what you intend on using this value for. Wouldn't it be less fragile to use SCEV to reason about the number of iterations? You can also evaluate the AddRec expression of the induction PHI at the final loop iteration.

llvm/lib/Analysis/LoopInfo.cpp
319 ↗(On Diff #200090)

Isn't the main part of this function the same as InductionDescriptor::isInductionPHI() ? http://llvm.org/doxygen/classllvm_1_1InductionDescriptor.html#a1974b8e8a8bae482a772ebfab1e9c794

fhahn added inline comments.May 22 2019, 2:04 PM
llvm/lib/Analysis/LoopInfo.cpp
319 ↗(On Diff #200090)

at leas conceptually

Whitney updated this revision to Diff 200821.May 22 2019, 2:32 PM
Whitney marked 4 inline comments as done.
Whitney marked 4 inline comments as done.Tue, May 28, 1:49 PM
Whitney added inline comments.
llvm/include/llvm/Analysis/LoopInfo.h
61 ↗(On Diff #200090)

right, will remove.

636 ↗(On Diff #200090)

I believe getConsecutiveDirection only work for constant step.

657 ↗(On Diff #200090)

This is the focus on the loop latch as the exiting block.

llvm/lib/Analysis/LoopInfo.cpp
319 ↗(On Diff #200090)

getInductionVariable() should return an induction phi, but an induction variable may not be a loop induction variable. A loop induction variable is used direct/indirectly in loop latch to determine if the loop continues.
I will look into if I can further simplify this function.

Whitney updated this revision to Diff 201890.Wed, May 29, 6:19 AM
Whitney marked an inline comment as done.Wed, May 29, 6:23 AM

@fhahn Thanks for your review. Do you think this is better now? Is there some areas you would like me to look into more?

Whitney retitled this revision from [LOOPINFO] Extend Loop object to add utilities to get the loop bounds, step, induction variable, and guard branch. to [LOOPINFO] Extend Loop object to add utilities to get the loop bounds, step, and loop induction variable..Wed, May 29, 11:25 AM
Whitney edited the summary of this revision. (Show Details)

If there are no further comments or concerns, I will commit this patch on this Wed.

Whitney updated this revision to Diff 202979.Tue, Jun 4, 10:27 AM
fhahn added a comment.Tue, Jun 4, 10:30 AM

If there are no further comments or concerns, I will commit this patch on this Wed.

Sorry for holding this up, I'll try to take another look today.

fhahn added a comment.Tue, Jun 4, 11:14 PM

LGTM, thanks! Let's get this in and iterate on it as we get more places to use it.

This revision was automatically updated to reflect the committed changes.