This is an archive of the discontinued LLVM Phabricator instance.

[LOOPINFO] Introduce the loop guard API.
ClosedPublic

Authored by Whitney on Jun 27 2019, 9:29 AM.

Details

Summary

This is the first patch for the loop guard. We introduced getLoopGuardBranch() and isGuarded().

For rotated loops, the guard branch determines if the loop executes at least one iteration.
That is, a rotated loop with a guard branch will execute 0 or more times, depending on the evaluation of the guard.
On the other hand, a rotated loop with no guard branch will execute 1 or more times.
This currently only works on simplified loop, as it requires a preheader and a latch to identify the guard.
It will work on loops of the form:

/// GuardBB:
///   br cond1, Preheader, ExitSucc <== GuardBranch
/// Preheader:
///   br Header
/// Header:
///  ...
///   br Latch
/// Latch:
///   br cond2, Header, ExitBlock
/// ExitBlock:
///   br ExitSucc
/// ExitSucc:

Prior discussions leading upto the decision to introduce the loop guard API:
http://lists.llvm.org/pipermail/llvm-dev/2019-May/132607.html

Diff Detail

Repository
rL LLVM

Event Timeline

Whitney created this revision.Jun 27 2019, 9:29 AM
Whitney updated this revision to Diff 206881.Jun 27 2019, 9:32 AM

clang-format

I'm not sure an API this general is the most relevant. I would have expected something more tightly tied to the loop itself instead of the region of code which contains the loop. Thinking about how transforms might want to use this, I suspect a much tighter definition would be useful.

Consider a transform which figures out a condition C under which loop L is a noop. Modifying the guard (as defined here) would not be valid, but a guard which controls the preheader directly w/no side effects possible inside the guarded region, but outside the loop would be exactly what was needed. Not sure what other transforms need off the top of my head, but I think that needs to be explored.

llvm/include/llvm/Analysis/LoopInfo.h
733

PostDom is not widely available in the optimizer. I strongly suspect that a narrower, but more widely usable, API would be better in practice.

738

Your definition would seem to disallow:
if (C) return;
for (...) {}

That is C wouldn't be considered a guard condition. Intentional?

llvm/lib/Analysis/LoopInfo.cpp
372

Your skipping over arbitrary throws here. Is that intended?
if (C) {

foo(); //may throw
for (...) {}

}

392

A quick version of this which doesn't need PDT would be to check if the second successor was one of the loops exit blocks.

Whitney marked 4 inline comments as done.Jun 29 2019, 4:54 AM

@reames Thanks for the review! I agree that there are still much more to consider for how we want to define a loop guard, to make the most use of it.

What's others opinion on disallow instructions with side effects in between the loop guard block and the loop header?

llvm/include/llvm/Analysis/LoopInfo.h
733

I heard that there is an intension to make PostDom more widely available, and I think that's the right thing to do. What do you think?

738

I think this is fine, in most cases, LLVM tries to common the return blocks, so if (C) will jump to the common exit block, and this algorithm will work. Although we can intensionally write LLVMIR with multiple return blocks, I don't think it will be the common case.

llvm/lib/Analysis/LoopInfo.cpp
372

This is a good point to discuss, what basic blocks are allowed between loop guard block and the loop header...If we decide to disallow basic blocks that may throw, we can use isGuaranteedToTransferExecutionToSuccessor() to check.

392

This will almost never work if the loop is in simplified form, as it needs to have dedicated exits. We could check if the second successor one of the loops exit blocks's successor, but this will only work for the most simple cases, e.g. doesn't allow any basic block between the exit block to the 'second successor'. I think instead of not using PDT, we should consider making PDT more widely available. What do you think?

This comment was removed by etiotto.
etiotto added a comment.EditedJun 29 2019, 5:54 AM

@reames Thanks for the review! I agree that there are still much more to consider for how we want to define a loop guard, to make the most use of it.

What's others opinion on disallow instructions with side effects in between the loop guard block and the loop header?

My opinion is that we should extend the proposed API to identify 'proper' loop guards in addition to weak loop guards. A proper loop guard would protect the execution of the loop and nothing else. So it would not allow any instruction with side effects between the guard and the loop header, or other weak guards.

Whitney updated this revision to Diff 207211.Jun 29 2019, 1:36 PM

Strengthen the definition of loop guard by disallowing non safe instructions guarded by the branch but not in the loop.

My opinion is that we should extend the proposed API to identify 'proper' loop guards in addition to weak loop guards.

I like your distinction between weak and proper guards. We probably need better terminology, but both are useful. For the moment, I'd suggest we start with the stronger form and add the weaker one later if needed.

llvm/include/llvm/Analysis/LoopInfo.h
733

It has "been the intention" for years. Unless someone has committed to doing the work in the next six months, I'd advice not relying on it.

738

Put two calls on the return paths to different targets and we don't merge them.

Whitney marked 2 inline comments as done.Jul 3 2019, 6:20 AM

I like your distinction between weak and proper guards. We probably need better terminology, but both are useful. For the moment, I'd suggest we start with the stronger form and add the weaker one later if needed.

For the moment, I strengthened the definition of loop guard by disallowing non safe instructions guarded by the branch but not in the loop, so there is only the stronger form. I agree that we should add the weaker form later only when needed.

llvm/include/llvm/Analysis/LoopInfo.h
733

@Meinersbur @jdoerfert @kbarton Do you know if there is anyone working / plan to work on it?

738

Do you mind giving me an example? I tried the code below, and that doesn't work.

void bar1();
void bar2();
int foo(bool cond, int *A) {
  if (cond) return 1;
  for (long i = 0; i < 100; ++i) {
    A[i] = 0;
  }
  bar1();
  bar2();
  return 0;
}

Strengthen the definition of loop guard by disallowing non safe instructions guarded by the branch but not in the loop.

OK, that works for me.

reames added inline comments.Jul 3 2019, 9:31 AM
llvm/include/llvm/Analysis/LoopInfo.h
738

void bar1();
void bar2();
int foo(bool cond, int *A) {

if (cond) { bar1(); return 1; }
for (long i = 0; i < 100; ++i) {
  A[i] = 0;
}
bar2();
return 0;

}

Whitney marked an inline comment as done.Jul 3 2019, 11:00 AM
Whitney added inline comments.
llvm/include/llvm/Analysis/LoopInfo.h
738

With the code above, I ran it through clang (noopt), and got:

  br i1 %tobool, label %if.then, label %if.end

if.then:                                          ; preds = %entry
  call void @_Z4bar1v()
  store i32 1, i32* %retval, align 4
  br label %return

if.end:                                           ; preds = %entry
  store i64 0, i64* %i, align 8
  br label %for.cond

for.cond:                                         ; preds = %for.inc, %if.end
  %1 = load i64, i64* %i, align 8
  %cmp = icmp slt i64 %1, 100
  br i1 %cmp, label %for.body, label %for.end

for.body:                                         ; preds = %for.cond
  %2 = load i32*, i32** %A.addr, align 8
  %3 = load i64, i64* %i, align 8
  %arrayidx = getelementptr inbounds i32, i32* %2, i64 %3
  store i32 0, i32* %arrayidx, align 4
  br label %for.inc

for.inc:                                          ; preds = %for.body
  %4 = load i64, i64* %i, align 8
  %inc = add nsw i64 %4, 1
  store i64 %inc, i64* %i, align 8
  br label %for.cond

for.end:                                          ; preds = %for.cond
  call void @_Z4bar2v()
  store i32 0, i32* %retval, align 4
  br label %return

return:                                           ; preds = %for.end, %if.then
  %5 = load i32, i32* %retval, align 4
  ret i32 %5

Looks like the two returns are common into one return block.

Meinersbur added inline comments.Jul 8 2019, 6:14 PM
llvm/include/llvm/Analysis/LoopInfo.h
733

I don't know anyone currently working on it.

However, if you intend to go into the direction of a loop-guard normal form, why not assuming that the loop guard can only be the block branching to the loop header. Not (Post-)dominator tree needed.

736–738

Format each enumeration item on its own line?

Apart from the conditions tested here, what is your informal definition/purpose of a loop guard?
With these conditions, anything that skips the loop without executing any additional code could be a loop guard. Imagine two blocks dominating the header; this definition would make the 'closest' the loop guard. Let's say the compiler is smart and optimizes it away. Now the other is the loop guard? Also, if was the loop guard of another loop, now it is the loops guard of both?

What is an 'unsafe' instruction and why does it stop a block being a loop guard?

Do you intend to refine the conditions of a loop guards over time?

llvm/lib/Analysis/LoopInfo.cpp
400–401

What if BeginBB is (not) the header? If not, it's never added to the worklist. If it is, we are exploring the blocks inside the loop.

405–406

Early exit? Couldn't it be possible that there are unsafe instructions in the blocks still in the worklist?

415

If you want to disallow throwing instructions, how did you conclude that isGuaranteedToTransferExecutionToSuccessor is what you want and not ->mayThrow?

There might be dynamically endless loops as well that isGuaranteedToTransferExecutionToSuccessor does not check (that is, only if the endless loop was inside a call).

Whitney marked 5 inline comments as done.Jul 8 2019, 7:22 PM
Whitney added inline comments.
llvm/include/llvm/Analysis/LoopInfo.h
733

I asked in the last loop group meeting. People mentioned that with the DomTreeUpdater, making PostDom more widely available should be easier than before.

Loop-guard normal form may not be the direction we want to go, we are still trying to explore what makes the most sense.

736–738

Format each enumeration item on its own line?

Will do.

Now the other is the loop guard?

Yes, getLoopGuardBranch() returns the 'closest' loop guard. But both branches are consider loop guards to begin with, isLoopGuardBranch() returns true for both branches if they both satisfy the requirements.

Also, if was the loop guard of another loop, now it is the loops guard of both?

hmm...only if there is no unsafe instructions guarded by that branch and outside of the loop in consideration.

What is an 'unsafe' instruction and why does it stop a block being a loop guard?

For example, instructions that may throw. In general, I would like to make sure if entered the guard region, then the loop is definitely going to be executed.

Do you intend to refine the conditions of a loop guards over time?

Yes, depending on the use cases. We will like to refine the conditions to benefit more users.

llvm/lib/Analysis/LoopInfo.cpp
400–401

I should assert that BeginBB dominate Header, and EndBB post dominate Header. And right, I was only thinking of the case when there is a preheader, I should also consider when there is no preheader, which means BeginBB can be Header.

405–406

Good catch! Will fix it.

415

I chose isGuaranteedToTransferExecutionToSuccessor instead of ->mayThrow is because it consider more cases, for example it checks if memory operation trap. isGuaranteedToTransferExecutionToSuccessor is a superset of ->mayThrow.

What is dynamically endless loops?

Whitney updated this revision to Diff 208672.Jul 9 2019, 7:01 AM
Whitney marked 2 inline comments as done.

Addressed code review comments from @Meinersbur

Whitney marked an inline comment as done.Jul 9 2019, 7:03 AM
Meinersbur added inline comments.Jul 9 2019, 11:37 AM
llvm/lib/Analysis/LoopInfo.cpp
415

What is dynamically endless loops?

header:
  %cmp = icmp ule i32 0, %a ; anything that always returns true but LLVM cannot figure out
  br i1 %cmp, label %header, label %exit

exit:
Whitney marked 3 inline comments as done.Jul 11 2019, 7:46 AM
Whitney added inline comments.
llvm/include/llvm/Analysis/LoopInfo.h
738

Put two calls on the return paths to different targets and we don't merge them.

@reames Do you have another example which I should pay attention on?

llvm/lib/Analysis/LoopInfo.cpp
415

I would like to make sure I understand the concern correctly. Is the concern that the guard guards an dynamically endless loops before the loop of interested?

Meinersbur added inline comments.Jul 11 2019, 8:48 AM
llvm/lib/Analysis/LoopInfo.cpp
415

Previously, you mentioned:

In general, I would like to make sure if entered the guard region, then the loop is definitely going to be executed.

isGuaranteedToTransferExecutionToSuccessor is not sufficient for this goal:

  1. There might be such an endless loop between the guard's branch target and the loop of interest.
  1. There can be other branches between, including, but not limited to, other loop guards.
Whitney marked an inline comment as done.Jul 11 2019, 10:10 AM
Whitney added inline comments.
llvm/lib/Analysis/LoopInfo.cpp
415

Thanks for the explanation. That's true. Thinking to disallow conditional branches in the guarded region but not in the loop. Not sure if that's the best way of solving this problem. What do you think?

Meinersbur added inline comments.Jul 11 2019, 1:04 PM
llvm/lib/Analysis/LoopInfo.cpp
415

Following the idea to first provide a narrow but meaningful definition, I'd disallow these things at the start. When we have users of loop guards, we can discuss how much we could relax the definition.

Whitney updated this revision to Diff 209390.Jul 11 2019, 6:06 PM

Disallow blocks with non unique successors in the guarded region but outside of the loop of interest.

Whitney marked 3 inline comments as done.Jul 11 2019, 6:06 PM

@Meinersbur As discussed, I strengthened the definition of the loop guard which disallow conditional terminator instructions inside the guarded region outside of the loop of interest.
@Meinersbur @reames @etiotto Do you think this patch is now good enough to land as the first patch? We can loosen or strengthen the definition of the loop guard in later patches when applicable.

Meinersbur added inline comments.Jul 15 2019, 11:50 AM
llvm/lib/Analysis/LoopInfo.cpp
418–419

With this added, we don't need a DominatorTree anymore. Any guard will directly branch into the header (or a series of unconditional branches such as the preheader).

Whitney marked an inline comment as done.Jul 15 2019, 12:07 PM
Whitney added inline comments.
llvm/lib/Analysis/LoopInfo.cpp
418–419

Without DominatorTree, how can we know a given branch will directly/indirectly branch into the header? It could be a branch after the loop. Did I miss something?

Meinersbur added inline comments.Jul 15 2019, 12:15 PM
llvm/lib/Analysis/LoopInfo.cpp
418–419

By following the loop header's predecessors until there is a non-unconditional branch. With the restriction of not allowing conditional branches between guard and header, it's the only candidate for a loop guard.

Whitney updated this revision to Diff 209991.Jul 15 2019, 4:18 PM

Removed the use of Dominator Tree, and removed isLoopGuardBranch() as the loop guard definition is now the same as getLoopGuardBranch().

Whitney marked 2 inline comments as done.Jul 15 2019, 4:18 PM

I'm not sure an API this general is the most relevant. I would have expected something more tightly tied to the loop itself instead of the region of code which contains the loop. Thinking about how transforms might want to use this, I suspect a much tighter definition would be useful.

Consider a transform which figures out a condition C under which loop L is a noop. Modifying the guard (as defined here) would not be valid, but a guard which controls the preheader directly w/no side effects possible inside the guarded region, but outside the loop would be exactly what was needed. Not sure what other transforms need off the top of my head, but I think that needs to be explored.

@reames Could you expand a bit more on what you mean by "a guard which controls the preheader directly"? At first I took that to mean the guard should be the immediate dominator of the preheader, but now I'm not sure if that is what you meant.
There have been a lot of changes since this comment to check for side effects, but nothing that addresses the "controls the preheader directly".

I'm also not convinced that checking for side effects at the same time as trying to identify the guard is a good idea. To me they are very separate concepts, and combining them ends up confusing things.

I'm not sure an API this general is the most relevant. I would have expected something more tightly tied to the loop itself instead of the region of code which contains the loop. Thinking about how transforms might want to use this, I suspect a much tighter definition would be useful.

Consider a transform which figures out a condition C under which loop L is a noop. Modifying the guard (as defined here) would not be valid, but a guard which controls the preheader directly w/no side effects possible inside the guarded region, but outside the loop would be exactly what was needed. Not sure what other transforms need off the top of my head, but I think that needs to be explored.

@reames Could you expand a bit more on what you mean by "a guard which controls the preheader directly"? At first I took that to mean the guard should be the immediate dominator of the preheader, but now I'm not sure if that is what you meant.

I meant that we should focus on finding a guarding branch which directly controls entry into the loop preheader, not some arbitrary earlier branch which happens to avoid the whole section of the CFG. I believe from the discussion, that the patch has evolved in this direction.

I'm also not convinced that checking for side effects at the same time as trying to identify the guard is a good idea. To me they are very separate concepts, and combining them ends up confusing things.

I find the opposite very confusing, so we're even.

I'm about to do a run through the patch and review "fresh". I have not been following all of the discussion, so it's possible I'll miss something which has already been discussed, but that way it's approachable without having to catch up on a bunch of back and forth.

reames requested changes to this revision.Jul 17 2019, 9:59 AM

Took a fresh look at the patch. The currently posted revision still uses PostDominatorTree. As previously described, I feel doing so is a very bad idea. As I will not LGTM the patch with this present, I stopped reading there and did not bother to review further.

To be clear here, I am not *objecting* to *someone else* reviewing the patch. I'm simply stating that *I* will not sign off on this patch in the current form as it doesn't achieve an objective I find useful. (i.e. It doesn't result in something I can use or build upon in existing transforms passes.)

This revision now requires changes to proceed.Jul 17 2019, 9:59 AM
Whitney updated this revision to Diff 210383.Jul 17 2019, 11:47 AM
Whitney edited the summary of this revision. (Show Details)

@reames @Meinersbur As discussed in the loop group meeting, I changed PDT to optionally require. The functions will work for the more general form if PDT is given.

So far, we have the following guarantees of a loop guard:

  1. If branching into the direction of the loop, the loop will be executed
  2. If branching to the other direction, the loop will not be executed.
  3. If branching to the other direction, no code will be executed that would not be executed if going to the direction of the loop.
  4. Loop guard and the post-dominating 'other' block form a Single-Entry-Single-Exit region. [edit: not true; there can be edges from the outside to a loop exit block]

What purpose does checking for unsafe instructions after the loop serve? Do we also need to exclude possible infinite loops after the loop exit(s)? [branches out of the region including unreachable, ret are already ruled out be the postdominator construction].

For 2., if that's the goal, could there be an alternative formulation, such as the loop exit block dominates the "other direction block"?

Please, please, please, reduce the complexity of this patch. Start with the simplest possible implementation which handles one trivial case. Get that in with the interface, then incrementally build on top of that. You are stuck in a review cycle where there's enough different aspects of this patch that we will keep finding problems for many more iterations. You best (and possibly only) way out of that is to reduce scope until there's nothing left that anyone can possibly object to, then build incrementally off the agreed baseline!

llvm/lib/Analysis/LoopInfo.cpp
409

This code is overly general and confusingly so. Walking arbitrary successors is exploring an overly large portion of the CFG, even for the PDT case!

It's possible there's something about the way it's called which makes this not so, but a quick skim of the code doesn't make that obvious, so there's a readability problem here at least.

417

This appears to be trying to handle the case where a loop has multiple predecessors outside of a loop. DONT. Use getPredecessor and let this all simplify.

432

Loops frequently have more than one exit block, at least an early return is needed here.

Whitney marked 9 inline comments as done.Jul 17 2019, 3:11 PM
Whitney added inline comments.
llvm/lib/Analysis/LoopInfo.cpp
409

I will think about how to simplified this further at least for the first patch.

417

I am trying to handle the case where there doesn't exists a preheader. I can modified the function to only allow loop with preheader.

432

I thought nullptr is allowed in comparison. I still want to continue when ExitBlock is nullptr, as it maybe able to use PDT (if given) for multi exit blocks loop.

Whitney updated this revision to Diff 210960.Jul 20 2019, 6:08 AM
Whitney edited the summary of this revision. (Show Details)

Changed this patch to only work on the simplest form.

reames accepted this revision.Jul 22 2019, 9:41 AM

LGTM w/required changes before submit.

p.s. Your next step after this lands needs to be finding a consumer. I'm willing to approve this speculatively, but if it sits in tree unused for long, it will be deleted.

llvm/include/llvm/Analysis/LoopInfo.h
734

Before submit, remove the text about number of times the loop executes. This is confusing as you don't define which portion of the loop you mean, and the terminology needs word smithed. This can be added in a follow on patch.

llvm/lib/Analysis/LoopInfo.cpp
364

In a follow on patch, these restrictions should be converted to early returns. (i.e. there's no need to have this be a strict precondition)

387

Branches only have 2 successors, so remove that check.
And replace the dyn_cast_or_null with a dyn_cast. getTerminator can never return null on properly constructed IR.

This revision is now accepted and ready to land.Jul 22 2019, 9:41 AM
Whitney updated this revision to Diff 211183.Jul 22 2019, 1:53 PM
Whitney marked 5 inline comments as done.

Next step after this lands is modifying Loop Fusion to use the loop guard. @kbarton

llvm/lib/Analysis/LoopInfo.cpp
387

A branch instruction can have 1 successor if it is not conditional. This function is a utility function, so it can be called anywhere, which include basic block in the middle of construction, so it is possible that getTerminator return nullptr.

Whitney marked 3 inline comments as done.Jul 22 2019, 1:54 PM
reames added inline comments.Jul 22 2019, 4:21 PM
llvm/lib/Analysis/LoopInfo.cpp
387

It's idiomatic to use isConditional for the former, and no to the later. Utilities assume well constructed IR unless there is a dang good motivating example not to. Please fix.

Whitney updated this revision to Diff 211272.Jul 23 2019, 3:51 AM
Whitney marked an inline comment as done.Jul 23 2019, 3:54 AM

LGTM w/required changes before submit.

p.s. Your next step after this lands needs to be finding a consumer. I'm willing to approve this speculatively, but if it sits in tree unused for long, it will be deleted.

As @Whitney mentioned below, I'm working on a patch for LoopFusion to use these interfaces. I will hopefully have it posted by the end of this week.

This revision was automatically updated to reflect the committed changes.