This is an archive of the discontinued LLVM Phabricator instance.

[LV] Try to sink users recursively for first-order recurrences.
ClosedPublic

Authored by fhahn on Jul 30 2020, 9:20 AM.

Details

Summary

Update isFirstOrderRecurrence to explore all uses of a recurrence phi
and check if we can sink them. If there are multiple users to sink, they
are all mapped to the previous instruction.

Fixes PR44286 (and another PR or two).

Diff Detail

Event Timeline

fhahn created this revision.Jul 30 2020, 9:20 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 30 2020, 9:20 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
fhahn requested review of this revision.Jul 30 2020, 9:20 AM

https://godbolt.org/z/nXCxW8

This should help to vectorize this code as well, or?

fhahn added a comment.Aug 15 2020, 8:14 AM

https://godbolt.org/z/nXCxW8

This should help to vectorize this code as well, or?

I think the unidentified PHI node in the example above is a pointer induction, not a first order recurrence? Looks like the actual problem is that LV only sees the inner loop fully unrolled and the SLP vectorizer does not vectorize the body of the unrolled loop. If built with -fno-unroll-loops, LV will vectorize the inner loop, but it would probably be better if the SLP vectorizer can just vectorize the unrolled body.

bmahjour added inline comments.Aug 24 2020, 7:06 AM
llvm/lib/Analysis/IVDescriptors.cpp
745–746

update comment

Does this patch improve some public benchmarks?

fhahn updated this revision to Diff 293552.Sep 22 2020, 12:33 PM
fhahn marked an inline comment as done.

Rebased, updated comment, added additional tests.

Does this patch improve some public benchmarks?

It does not really impact the number of vectorized loops in SPEC2000/SPEC2006/MultiSource. But I think it should have a strictly positive impact (module exposing some existing cost model limitations due to vectorizing more complex loops).

Any futher comments, @bmahjour?

Any futher comments, @bmahjour?

Nope, I'm fine with the changes.

Any futher comments, @bmahjour?

Nope, I'm fine with the changes.

So probably ok to land this now..

Any futher comments, @bmahjour?

Nope, I'm fine with the changes.

Can you accept it formally?

Sorry, I meant to do a more thorough review before approving. I had a closer look today and found some nits and a couple of questions. Please see inline comments.

llvm/lib/Analysis/IVDescriptors.cpp
745–746

nit: this comment is not describing the lambda that follows it.

808

We need to make sure we don't visit the same user (from the UserWorkList) multiple times. For example if we have a hypothetical scenario where:

%0 = phi [..., %3], [...]
%1 = phi [...], [Previous] ; The "Phi" that seeds the algorithm.
%2 = add %1, %0
%3 = add %2, 1
<Previous>
...

... but I guess that's handled because worklist is actually a set-like container so if we try to add a user that already exists in the worklist, the insertion won't take place and the size of the worklist is unaffected. Could you please put a comment about the choice of the container and how the algorithm guarantees termination?

810–815

users of Previous -> users of Phi

llvm/test/Transforms/LoopVectorize/first-order-recurrence-complex.ll
414

The comment implies that all phi nodes prevent sinking, but I think here the reason %first_time.1 cannot be sunk past %for.next is because (one of) the user(s) of %first_time.1 is the Previous itself. If that's correct, can we say "some phi nodes"?

421

remove or update the comment after ; (here and on lines 300, 303 and 306)

Reping. Rebase?

fhahn updated this revision to Diff 344567.May 11 2021, 2:47 PM
fhahn marked 2 inline comments as done.

Rebased and addressed comments, thanks!

A blast from the past :)

Ayal added a comment.May 12 2021, 5:34 AM

How about making SinkAfter a MapVector instead of DenseMap (to have its iteration order match insertion order) and insert interdependent sink candidates in the desired order?
How about traversing all insns from phi to previous, in order, checking if any is using the phi or a prior insn that was sunk? E.g., by maintaining the set of their users.
This might be clearer, and avoid the sorting.

fhahn updated this revision to Diff 345544.May 14 2021, 1:31 PM

Rebased, update to also avoid sinking instructions that may read from memory, as we cannot move them over ones with side-effects, move sort to isFirstOrderRecurrence.

How about making SinkAfter a MapVector instead of DenseMap (to have its iteration order match insertion order) and insert interdependent sink candidates in the desired order?
How about traversing all insns from phi to previous, in order, checking if any is using the phi or a prior insn that was sunk? E.g., by maintaining the set of their users.
This might be clearer, and avoid the sorting.

I'm not sure if it is possible to maintain the order by dominance in the map as we go along, because we an instruction could be used by multiple others and we may visit them in reverse dominance order (relative to each other), so we would miss the fact we need to 'fix' the order of the first instruction. I think it is also difficult & expensive to change the order of multiple entries in the map vector.

I added a few more interesting scenarios as tests in 68d52f0dbe2e and c62f984814c4.

I *think* however it should be enough to sort only sort the instructions to sink per FOR phi, as we currently allow sinking for each instruction only once. That should certainly be better, as we have less to sort overall and everything is contained in IVDescriptor.cpp. The new code already maintains a vector of instructions to sink, so the sort fits in quite nicely. WDYT?

fhahn edited the summary of this revision. (Show Details)May 14 2021, 3:29 PM
Ayal added a comment.May 23 2021, 12:49 PM

Rebased, update to also avoid sinking instructions that may read from memory, as we cannot move them over ones with side-effects, move sort to isFirstOrderRecurrence.

How about making SinkAfter a MapVector instead of DenseMap (to have its iteration order match insertion order) and insert interdependent sink candidates in the desired order?
How about traversing all insns from phi to previous, in order, checking if any is using the phi or a prior insn that was sunk? E.g., by maintaining the set of their users.
This might be clearer, and avoid the sorting.

I'm not sure if it is possible to maintain the order by dominance in the map as we go along, because we an instruction could be used by multiple others and we may visit them in reverse dominance order (relative to each other), so we would miss the fact we need to 'fix' the order of the first instruction. I think it is also difficult & expensive to change the order of multiple entries in the map vector.

I added a few more interesting scenarios as tests in 68d52f0dbe2e and c62f984814c4.

I *think* however it should be enough to sort only sort the instructions to sink per FOR phi, as we currently allow sinking for each instruction only once. That should certainly be better, as we have less to sort overall and everything is contained in IVDescriptor.cpp. The new code already maintains a vector of instructions to sink, so the sort fits in quite nicely. WDYT?

The instructions to sink reside inside the header block in the desired order. It should be possible to traverse them, once, from the FOR phi to the terminator or Previous, w/o sorting nor changing order, by maintaining the set of phi's (direct and indirect) users. When encountering such a user during the traversal, check if it is sunkable, append it to be moved after the last sunken user (or after Previous if this is the first user being sunk), and add its users to the set - those that belong to the header block, others need to be dominated by (and distinct from) Previous.
Perhaps this would also help support sinking memory instructions in the future.
Sounds reasonable?

fhahn added a comment.May 24 2021, 9:43 AM

The instructions to sink reside inside the header block in the desired order. It should be possible to traverse them, once, from the FOR phi to the terminator or Previous, w/o sorting nor changing order, by maintaining the set of phi's (direct and indirect) users. When encountering such a user during the traversal, check if it is sunkable, append it to be moved after the last sunken user (or after Previous if this is the first user being sunk), and add its users to the set - those that belong to the header block, others need to be dominated by (and distinct from) Previous.
Perhaps this would also help support sinking memory instructions in the future.
Sounds reasonable?

IIUC your suggestion would require to iterate over the instructions in the header block compared to just traversing the def-use chains from the PHI we are analyzing as we do at the moment? That would work, as we would visit the candidates in the correct order. But requiring to potentially iterate over all instructions in the header block might be worse in practice than sorting the (probably few) instructions that require sinking?

It will be indeed required once we want to sink memory operations, but until then it seems to me that just traversing the def-use chains + sorting would probably be less work in general for the cases we currently support (also given the interface that requires us to analyze each FOR separately). Please let me know if I understood correctly. Happy to update the patch as suggested if you think it's beneficial on its own.

Ayal added a comment.May 25 2021, 11:46 PM

The instructions to sink reside inside the header block in the desired order. It should be possible to traverse them, once, from the FOR phi to the terminator or Previous, w/o sorting nor changing order, by maintaining the set of phi's (direct and indirect) users. When encountering such a user during the traversal, check if it is sunkable, append it to be moved after the last sunken user (or after Previous if this is the first user being sunk), and add its users to the set - those that belong to the header block, others need to be dominated by (and distinct from) Previous.
Perhaps this would also help support sinking memory instructions in the future.
Sounds reasonable?

IIUC your suggestion would require to iterate over the instructions in the header block compared to just traversing the def-use chains from the PHI we are analyzing as we do at the moment? That would work, as we would visit the candidates in the correct order. But requiring to potentially iterate over all instructions in the header block might be worse in practice than sorting the (probably few) instructions that require sinking?

It will be indeed required once we want to sink memory operations, but until then it seems to me that just traversing the def-use chains + sorting would probably be less work in general for the cases we currently support (also given the interface that requires us to analyze each FOR separately). Please let me know if I understood correctly. Happy to update the patch as suggested if you think it's beneficial on its own.

Happy to go with your sorting InstrsToSink approach, added some comments to it below.

Right, this order-preserving alternative requires iterating over the instructions in the header block from the FOR phi until its last header-block-user is encountered. Agree that this would probably take longer than sorting InstrsToSink, e.g., if there's a user close to the end of the header. Mainly pointed out as an alternative, towards supporting sinking memory instructions (supporting multiple FOR phi's would be possible in a single scan, but seems an overkill), and as a response to

I'm not sure if it is possible to maintain the order by dominance in the map as we go along...

;-)

llvm/lib/Analysis/IVDescriptors.cpp
747

The candidates need to be maintained in a set in order to avoid considering them repeatedly. May as well have this set ordered by dominance relation, instead of sorting it at the end.
E.g., let UserWorkList be a SmallVector that is pushed and popped until empty, where candidates are checked to be sunkable when being pushed into it (instead of when being popped/traversed), after successfully inserting them into the sorted InstrsToSink set. WDYT?

783

This sounds right, given that we only consider I's that post-dominate the FOR phi.
Seems somewhat odd though, calls for an assert and a testcase.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9161

Sort multiple users in reverse instead of reversing all here?
Or rather - have them sink after each other instead of all sinking after Previous, thereby expressing their desired order and dependencies directly?

llvm/test/Transforms/LoopVectorize/first-order-recurrence-complex.ll
414

The (first) reason %first_time.1 cannot be sunk is because it appears outside the header and is not dominated by Previous. The fact that it feeds Previous is a second sinking-preventing reason.

Ayal added inline comments.May 26 2021, 8:06 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9161

(BTW, note that by having multiple users sink after each other, it is also possible to reorder them lazily instead of sorting: when moving Sink after Target, check if Target should also be sunk - and if so sink it first, recursively, while taking care to sink each SinkAfter instruction once. Multiple users is the only exception to sinking an instruction after a Target that should itself sink.)

fhahn updated this revision to Diff 348063.May 26 2021, 1:07 PM

Address latest comments: Use ordered set for instructions to sink, entries in SinkAfter are chained together now as well, instead of all sinking after the same 'previous' instruction. I hope I did not miss anything.

fhahn updated this revision to Diff 348068.May 26 2021, 1:25 PM

Adjust worklist logic.

fhahn added inline comments.May 26 2021, 1:28 PM
llvm/lib/Analysis/IVDescriptors.cpp
747

I updated the code to use a ordered set for InstrToSink as suggested. I think the only thing is that we might check the same instruction multiple times, if it is already dominated by Previous. In that case, it won't get inserted into InstrToSink. But I think that should be fine.

783

I think there's already a test case where this can happen: @sink_into_replication_region. I added an assert that the instruction must be in the same BB as the starting phi.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9161

I went with Or rather - have them sink after each other instead of all sinking after Previous, thereby expressing their desired order and dependencies directly

The dependencies are chained together in SinkAfter now. But keeping them sorted to start with seems simpler overall and ensures that map is accurate. WDYT?

llvm/test/Transforms/LoopVectorize/first-order-recurrence-complex.ll
414

Good point, I adjusted the comment.

Ayal added inline comments.May 26 2021, 2:38 PM
llvm/lib/Analysis/IVDescriptors.cpp
808

Perhaps the logic could be simplified somewhat, compressed to see it all together here:

WorkList = { Phi }
while WorkList not empty {
  Insn = pop from WorkList
  for (User *User : Insn->users()) {
    // Cyclic dependence
    if (Previous == User)
      return false;
    // No need to sink User
    if (DT->dominates(Previous, User) ||
        isa<PHINode>(User))
      continue;
    // Cannot sink User
    if ((User->getParent() != PhiBB) ||
        User->mayHaveSideEffects() ||
        User->mayReadFromMemory() ||
        (User->getParent()->getTerminator() == User) ||
        (SinkAfter.find(User) != SinkAfter.end()))
      return false;
    // Sink User tentatively and check its users
    if InstsToSink.insert(User)
      append User to Worklist
  }
}
fhahn updated this revision to Diff 348100.May 26 2021, 2:47 PM

Simplify/compact code as suggested.

fhahn added inline comments.May 26 2021, 2:48 PM
llvm/lib/Analysis/IVDescriptors.cpp
808

Thanks! I adjusted/compacted the code as suggested, but with keeping the check if the instruction has been sunk already and the assertion is UserI is a PHINode.

Ayal added inline comments.May 27 2021, 3:07 AM
llvm/lib/Analysis/IVDescriptors.cpp
747

Agreed. We could cache/memoize a set of confirmed users in addition to InstrToSink, but their re-confirmation should be quick, and reaching users multiple times is probably infrequent.

756

Assert they have a common parent (as done by comesBefore())?
If sorting should work (somehow) across basic blocks, better enumerate them somehow, otherwise both A < B and A > B may both hold.

766

Indeed, better check this first, than check last if insertion fails!

781

The isa<PHInode> case belongs here, with the "we're fine with UserI's current position" case.
(The property we're really after here is conditional dominance: Previous dominates UserI conditional on having reached FOR Phi)

783

Agreed. (Nice test, where an add accumulating the FOR phi is sunk after a previous udiv, and the fact that the add feeds its reduction header phi is fine.)

787

// We cannot [sink] user.

790

While we're here, simply check !UserI->isTerminator()?
Worth retaining the uninformative comment?

791

Would be good to retain the TODO comment about supporting sinking a user after multiple (deepest) previouses.

795

Relevant if isa<PHINode> is moved above.
If it stays here, it's redundant, as we bailed-out earlier if UserI is not in PhiBB.

815

Suffice to provide each User of Current to TryToPushSinkableUser[s](), i.e., how about doing

for (User *User : Current->users())
  if (!TryToPushSinkableUser(cast<Instruction>(User)))
    return false;

to simplify the lambda?

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9161

Agreed. The map itself is accurate whether sorted or not in this case, but lazy reordering may potentially enter an infinite loop (if buggy), and requires marking sunk instructions. Given that a set is needed to ensure uniqueness, it may as well provide the desired order, as done above.

fhahn updated this revision to Diff 348226.May 27 2021, 5:16 AM
fhahn marked 11 inline comments as done.

Address latest comments, thanks!

llvm/lib/Analysis/IVDescriptors.cpp
756

This is needed so we conveniently check if arbitrary instructions are in the set, but those instructions should never be inserted. But I can change it to only query the set for instructions in the same BB. (comesBefore already asserts that they are in the same BB)

781

At this point we don't know if the PHI is in the same block or not. (checked only below). We may incorrectly skip a PHI node in a different BB, which is not dominated by Previous (tested in cannot_sink_phi in first-order-recurrence-complex.ll). I left the check at it's original place, so it only takes place after we ensure that the parents match.

787

removed the whole comment.

790

Updated & comment removed.

791

Sounds good. I split off the SinkAfter check again, so it's clear which single check the comment refers to.

795

As mentioned above, I think the check needs to happen after we checked that they are in the same BB. I added the assert to catch inconsistencies if the code is moved, but it's not too helpful. I removed it.

Ayal accepted this revision.May 27 2021, 9:23 AM

This looks good to me, thanks!
Title might emphasize "[LV] Try to sink multiple users ..."
Noting another couple of TODO thoughts.

llvm/lib/Analysis/IVDescriptors.cpp
756

Ah, ok.
Another potential TODO is to sink users in non-header basic blocks that post-dominate the header, which may extend CompareByComesBefore to distinct blocks using their domination.

781

Ah, right; cannot_sink_phi may be one case where Previous needs to be hoisted before a user, instead of vice versa, assuming no other reorder-preventing reasons exist...

This revision is now accepted and ready to land.May 27 2021, 9:23 AM
This revision was landed with ongoing or failed builds.May 31 2021, 11:56 AM
This revision was automatically updated to reflect the committed changes.