This is an archive of the discontinued LLVM Phabricator instance.

[LICM] Introduce a finer granularity option to compute early exits.
Needs RevisionPublic

Authored by trentxintong on May 1 2017, 4:03 PM.

Details

Summary

Instead of keeping a variable indicating whether there are early exits
in the loop. We keep all the early exits. This improves LICM's ability to
move instructions out of the loop based on is-guaranteed-to-execute.

The reason i do not want to compute early exits in LIR and LoopUnswitch is that its
probably not as beneficial as in LICM and its a pain to update the SafetyInfo
in case of LoopUnswitch, i.e. instructions can get deleted.

Event Timeline

trentxintong created this revision.May 1 2017, 4:03 PM

Fix comments

dberlin added inline comments.May 1 2017, 4:23 PM
include/llvm/Transforms/Utils/LoopUtils.h
60

Would it trouble you too much to move this OBBMAP into Transforms/Utils as a class called "OrderedInstructions" or something.

(IE provides a dominates call, handles calling dt->dominates if they are in different bbs, or checking ordering if they are in the same bb?)

We should encapsulate this.

I'm happy to clean up the existing users if you are willing to encapsulate it :)

trentxintong added inline comments.May 1 2017, 4:26 PM
include/llvm/Transforms/Utils/LoopUtils.h
60

I am thinking about that as well. I will do it

Add a dominanance check interface that uses caching for instructions within same basic block.

I do not particularly like the name. I will see whether i can come up with a better one later.

Address comments

efriedma added inline comments.Jun 13 2017, 2:01 PM
include/llvm/Transforms/Utils/OrderedInstructions.h
40

Not sure I like this constructor... easy to misuse.

lib/Transforms/Utils/LoopUtils.cpp
1085

See the FIXME below... but I guess there's another way to do this check: if Inst dominates the backedge, then it's guaranteed to execute even if the loop is infinite. A bit more general than checking that the parent is exactly the header.

efriedma added inline comments.Jun 13 2017, 2:48 PM
lib/Transforms/Utils/LoopUtils.cpp
1085

Err, wait, no, sorry, the check would be that Inst dominates the backedge, and there isn't any infinite loop between the start of the loop and the backedge. I guess just checking whether Inst is in the header is good enough for now.

Minor comment and code modifications. NFC.

Edit comments

Remove the default constructor for OrderedInstructions.

Update more comments

efriedma added inline comments.Jun 19 2017, 2:01 PM
include/llvm/Transforms/Utils/LoopUtils.h
58

Use AssertingVH here?

test/Transforms/LICM/loop-early-exits.ll
83

Could you add a testcase with a udiv after a nothrow call, to make it clear that an early exit doesn't necessarily involve unwinding?

Address comments

Fix a bug in how early exits are updated. Apparently, LICM could simplify or delete early exits,
e.g. some calls that are simplifiable, but we treat as early exit.

Build the deleteExit interface for the early exit to be updated. EarlyExits are now kept in a DenseSet
which means iterating over is not deterministic. But this is ok as we need to make sure the interested
instruction dominates ALL.

The test case which the bug is uncovered is Transforms/LICM/pr32129.ll.

efriedma edited edge metadata.Jun 20 2017, 2:06 PM

AssertingVH catching a bug. Cool. :)

Some places you're adding calls to invalidateBlock; other places you're calling deleteExit. Why do we need to call one, but not the other, in LICM?

trentxintong updated this revision to Diff 103608.EditedJun 22 2017, 11:35 AM

Address comments by reworking when SafetyInfo->deleteExit and
SafetyInfo->OrderInstruction.invalidateBlock are called.

The general rule to update OrderedInstruction and EarlyExits in SafetyInfo is as
follow: Before we delete an instruction in the loop, we invalidate the OrderedInstruction
for the block it is in, as OrderedInstruction may still hold pointer to the instruction.
In addition, we check whether it is an early exit. If it is, we remove it from the
early exit list.

We can delete instructions in the loop for various reasons in LICM including : instruction
is simplified and removed, instruction is sunk out of the loop and instruction is
hoisted out of the loop.

Minor cosmetic change

Minor assert fix

Minor change in OrderedInstruction invalidation.

Hmm ... last version did not compile. This should be good now.

To reiterate:

The general rule to update OrderedInstruction and EarlyExits in SafetyInfo is as
follow: Before we delete an instruction in the loop, we invalidate the OrderedInstruction
for the block it is in, as OrderedInstruction may still hold pointer to the instruction.

In case we move an instruction, i.e. hoisting, we invalidate the block it is moved to as well.

Additionally, we check whether it is an early exit. If it is, we remove it from the
early exit list.

We can delete instructions in the loop for various reasons in LICM including : instruction
is simplified and removed, instruction is sunk out of the loop and instruction is
hoisted out of the loop.

@efriedma can you please finish the review when you have time ? Thanks !

efriedma accepted this revision.Jun 28 2017, 12:16 PM

LGTM.

As a followup to this, could you look into make OrderedInstructions and OrderedBasicBlock use AssertingVH, to try and catch mistakes?

This revision is now accepted and ready to land.Jun 28 2017, 12:16 PM
efriedma requested changes to this revision.Jun 28 2017, 12:21 PM
efriedma added inline comments.
lib/Transforms/Utils/LoopUtils.cpp
1052

Actually, one more issue we probably need to address.

This is potentially linear in the number of calls in the loop, since we don't try to prune EarlyExits at all. This makes LICM potentially O(n^2) overall. Can we store this information in some more efficient way?

This revision now requires changes to proceed.Jun 28 2017, 12:21 PM
trentxintong added inline comments.Jun 28 2017, 1:42 PM
lib/Transforms/Utils/LoopUtils.cpp
1052

We could certainly prune the early exits by discovering them walking the dominator tree. If an instruction dominates an early exit (Inst A), it certainly dominates all the other early exits dominated by Inst A. So we do not need to include those early exits in the list.

The drawback with this is that if an early exit is deleted, which is possible but unlikely. We need to invalidate the early exit list (we could try to recompute everything, but probably not worth it).

By doing this, we potentially save a lot of OI.dominates checks. Worst case is still O(n^2) in case of a loop with many basic blocks that do not dominate each other and each one of them have early exits ... but this happens infrequently in practice IMO.

Thinking about it a bit more, instead of storing a list of instructions which can exit the loop early, you could store a list of basic blocks which are guaranteed to execute, and a mapping from those basic blocks to the first instruction in each basic block which could exit early. That would allow "constant-time" lookups, at the expense of a bit more computation upfront.

I agree it doesn't make sense to try to recompute if an early exit is erased; that should be very rare. (Granted, it might be a little bit more common than it should be: it looks like isGuaranteedToTransferExecutionToSuccessor is missing a few important cases which isInstructionTriviallyDead catches.)

Thinking about it a bit more, instead of storing a list of instructions which can exit the loop early, you could store a list of basic blocks which are guaranteed to execute, and a mapping from those basic blocks to the first instruction in each basic block which could exit early. That would allow "constant-time" lookups, at the expense of a bit more computation upfront.

I agree it doesn't make sense to try to recompute if an early exit is erased; that should be very rare. (Granted, it might be a little bit more common than it should be: it looks like isGuaranteedToTransferExecutionToSuccessor is missing a few important cases which isInstructionTriviallyDead catches.)

I think thats a good idea. We need to walk the DT and keep pushing the blocks that are guaranteed to execute into a list. When we hit an early exit instruction, we do not need to process its children on the DT, and we keep a pointer to the instruction.

This way, we only need to lookup this DenseSet of basic blocks and possibly do some fiddling with basic blocks that are partially guaranteed to execute.

dberlin added inline comments.Jun 29 2017, 7:29 PM
lib/Transforms/Utils/LoopUtils.cpp
1052

Unless i'm misunderstanding, and maybe i am (!), I believe this can be made optimal:

Here's the normal way:

The ordering is completely described by the DFS in/out numbers of the dom tree + local numbers.

It should completely suffice to, instead of just ordering the in/out of the DT nodes, model in/out of the entire block.

IE

Count = 0
number(root of DT)

number(DTNode):
   In = Count
   for each DT child:
     number(child)
    for each instruction in this block
      InstNum[I] = ++Count
   Out = Count

(or something similar)

The early exits you dominate is, in the same block, any early exit with a number greater than yours, and for other blocks, any block with an in > your in and out < your out.

This is trivially stored in a sorted smallvector.

You dominate all the early exits only if using the above comparator, lower bound says you would appear first in the smallvector.

I believe you can get away with a lot less, but not positive ATM:

By definition, anything that dominates all the early exits must be in a block that has a DT level number <= than all of them.

If all the exits are terminators, than you don't need the local numbering or DFS numbering at all, i think (I rely on Eli to point out any flaw in my reasoning, since i'm dashing this off pretty quickly).

You just need the entries with the lowest DT level (in case they are siblings in the dominator tree)

Then it would just be:

if DT level number < lowest DT level number of an exit, you dominate all exits
if DT level number == lowest DT level number then:
   if vector.size() == 1 && exit block == instruction block 
    you dominate all exits
   else
    you do not (because there must be a sibling you don't dominate)

I think one or the other of the above should work.

The DT level numbers are now available in the DomTree :)

dberlin edited edge metadata.Jun 29 2017, 7:34 PM

Actually, that won't entirely work without the DFS numbers.
You can shortcut the false cases using the level numbers (IE it's not possible for Inst to have level number 5, an early exit have level number 3, and you dominate it), but you still need to know what part of the dom tree you are in.

So you still have to store and stabbing query the dom tree node DFS intervals.

Thanks for the suggestions and review so far. I am in middle of a project. I will put this on hold for a couple of weeks and will come back to it.

sanjoy resigned from this revision.Jan 29 2022, 5:44 PM

FYI, all the tests in this patch are already handled with the current licm. Do we still need this patch?

$ opt -S -licm loop-early-exits.ll

; ModuleID = 'licm.ll'
source_filename = "licm.ll"

declare void @use(i64)

declare void @use_nothing()

; Function Attrs: nounwind
declare void @call_nothrow() #0

define void @throw_header1(i64 %x, i64 %y, i1* %cond) {
entry:
  %div = udiv i64 %x, %y
  br label %loop

loop:                                             ; preds = %loop, %entry
  call void @use(i64 %div)
  br label %loop
}

define void @throw_header2(i64 %x, i64 %y, i1* %cond) {
entry:
  br label %loop

loop:                                             ; preds = %loop, %entry
  call void @use_nothing()
  %div = udiv i64 %x, %y
  call void @use(i64 %div)
  br label %loop
}

define void @throw_body1(i64 %x, i64 %y, i1* %cond) {
entry:
  %div = udiv i64 %x, %y
  br label %loop

loop:                                             ; preds = %body, %entry
  br label %body

body:                                             ; preds = %loop
  call void @use(i64 %div)
  br i1 false, label %loop, label %exit

exit:                                             ; preds = %body
  ret void
}

define void @throw_body2(i64 %x, i64 %y, i1* %cond) {
entry:
  br label %loop

loop:                                             ; preds = %body, %entry
  br label %body

body:                                             ; preds = %loop
  call void @use_nothing()
  %div = udiv i64 %x, %y
  call void @use(i64 %div)
  br i1 false, label %loop, label %exit

exit:                                             ; preds = %body
  ret void
}

define void @throw_body3(i64 %x, i64 %y, i1* %cond) {
entry:
  br label %loop

loop:                                             ; preds = %body, %entry
  br label %body

body:                                             ; preds = %loop
  call void @call_nothrow()
  %div = udiv i64 %x, %y
  call void @use(i64 %div)
  br i1 false, label %loop, label %exit

exit:                                             ; preds = %body
  ret void
}

attributes #0 = { nounwind }
Herald added a project: Restricted Project. · View Herald TranscriptJul 28 2022, 8:11 PM