This is an archive of the discontinued LLVM Phabricator instance.

[LV][LoopInfo] Add irreducible CFG detection for outer loops
ClosedPublic

Authored by dcaballe on Dec 5 2017, 4:40 PM.

Details

Summary

Context

We are planning to introduce a series of patches that aim to extend LV for *explicit* outer loop vectorization support, using the VPlan infrastructure ---- starting from trivial outer loops and then build up functionality to deal with more complex outer loops. This can be further extended to perform outer loop auto-vectorization by additional work on legality and cost modeling, but for the sake of simplicity, that aspect is outside of the scope of this patch series. Further context in the RFC: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html

This patch is the first installment of Patch Series #1. Patch Series #1 focuses on bringing to LV the initial functionality to detect, construct VPlans and generate code out of them for a well limited set of trivial outer loops that have been annotated to be explicitly vectorized. In particular, Patch Series #1 will add support for explicit vectorization of trivial outer loops where:

  1. inner loops are uniform, i.e., they are trivially lock-step among all vector elements and,
  2. there are no branches inside the outer loop body other than a single loop latch and a zero-trip-test per inner loop.

The following loop is an example of a supported outer loop:

#pragma clang loop vectorize(enable) vectorize_width(4)
for (i=0;i<N;i++){
  // no branches here
  for (j=0;j<loop_invariant_value; j++){
    // no branches here
  }
  // no branches here
}

In order to achieve these goals, Patch Series #1 will introduce the following functionality:

  1. Restrictive detection of supported loop nests, including inner loop uniformity check analysis.
  2. Initial VPlan construction for outer loops.
  3. VPlan-based vector code generation for outer loops.

Patch Series #1. Sub-patch #1

This patch adds support for detecting outer loops with irreducible control flow in LV. Current detection uses SCCs and only works for innermost loops. This patch adds a utility function that works on any CFG, given its RPO traversal and its LoopInfoBase. This function is a generalization of ‘isIrreducibleCFG’ from lib/CodeGen/ShrinkWrap.cpp. The code in lib/CodeGen/ShrinkWrap.cpp is also updated to use the new generic utility function.

I would appreciate your feedback.

Thanks,
Diego

Diff Detail

Repository
rL LLVM

Event Timeline

dcaballe created this revision.Dec 5 2017, 4:40 PM
dcaballe updated this revision to Diff 125657.Dec 5 2017, 5:21 PM
dcaballe edited the summary of this revision. (Show Details)

Adding full context to the patch (sorry).

fhahn added a subscriber: fhahn.Dec 6 2017, 2:08 AM
egarcia added a subscriber: egarcia.Dec 6 2017, 9:56 AM

Ping :)

If you are OK with the proposed algorithm, I would appreciate feedback on where to refactor the 'isIrreducibleCFG' utility function for Loop and MachineLoop.
As I mentioned, LoopBase could be a good candidate. This utility function would be similar to isRecursivelyLCSSAForm implemented in Loop class.
Does it sound reasonable?

Thanks!
Diego

fhahn added a reviewer: fhahn.Dec 22 2017, 2:49 AM

If you are OK with the proposed algorithm, I would appreciate feedback on where to refactor the 'isIrreducibleCFG' utility function for Loop and MachineLoop.
As I mentioned, LoopBase could be a good candidate. This utility function would be similar to isRecursivelyLCSSAForm implemented in Loop class.
Does it sound reasonable?

I think moving this to LoopBase makes sense, if there are multiple users for this function, rather than duplicating it. For the naming, I would call it something like containsIrreducibleCFG, to make it clear that it checks if the loop body contains irreducible CFG (also make that clear in the comment), rather than the loop itself is irreducible.

Thanks for the feedback, Florian, and sorry for the delay. I'm about to post a new version.

Diego.

dcaballe updated this revision to Diff 129171.Jan 9 2018, 3:23 PM

The refactoring of a generic utility function to detect irreducible CFG required a little more work because the other user, ShrinkWrap (D41886), needed this functionality at Function level and not at Loop level as we need it in LV. This new patch adds such a utility function which works on any CFG given its RPO traversal and its LoopInfoBase. This function is a generalization of ‘isIrreducibleCFG’ in lib/CodeGen/ShrinkWrap.cpp. It should be generic enough to deal with any CFG in LLVM that is supported by LoopInfoBase. RPO traversal is a function parameter which would allow to reuse RPO traversals that were previously computed for other purposes.

If you are OK with this approach, there are two questions that should be discussed as part of the review:

  1. Location of this utility function: I placed this function in include/llvm/Analysis/CFG.h. I would be happy to move it somewhere else if you think it’s more appropriate.
  2. Removing or preserving existing SCC-based ‘hasCyclesInLoopBody’ function for inner loop cases: Since the new generic utility function works for inner and outer loops, I think that ‘hasCyclesInLoopBody’ is no longer necessary. I think that computing and traversing the SCCs of an inner loop body without back-edges (1 SCC per BasicBlock) is, at least, as expensive in practice as computing and traversing the inner loop body in RPO. I measured the compile time of both approaches in a couple of benchmarks suites and I couldn’t observe any performance difference.

D41886 updates ShrinkWrap to use this generic utility function.

Please, let me know what you think.
Thanks,
Diego.

dcaballe updated this revision to Diff 129972.Jan 16 2018, 9:10 AM

Merging D41886 with this patch, as requested in D41886.

Ping :)
Patch #2 was also posted: D42447

fhahn accepted this revision.Feb 17 2018, 5:31 PM

Thanks Diego, LGTM. Sorry for the long delay! Please wait with committing a few days, to give other people a little bit more time to voice additional concerns.

lib/Transforms/Vectorize/LoopVectorize.cpp
59 ↗(On Diff #129972)

Can this include be removed now?

This revision is now accepted and ready to land.Feb 17 2018, 5:31 PM
fhahn added a comment.Feb 17 2018, 5:35 PM

Please update the commit message with the updates from your comment :)

dcaballe updated this revision to Diff 135094.Feb 20 2018, 10:27 AM
dcaballe edited the summary of this revision. (Show Details)

Thanks for reviewing this patch, Florian.

  • Rebasing patch to ToT and addressing the SCCIterator.h comment (good catch!)
  • Updating the message in "Sub-patch #1" section so that it can be used as commit message.

Please, Florian, feel free to commit this patch whenever you think it's appropriate. I still don't have commit access.

Thanks,
Diego

Is this NFC?
If not, would it be possible to have some test coverage for these changes?

Yes, this is NFC. Since this is kind of a refactoring, there are already tests for the 'containsIrreducibleCFG' utility function in ShrinkWarp and LV passes.
I'm introducing more tests targeting outer loops in the subsequent patch (D42447), which starts to enable the outer loop functionality in LV.
At this point, LV skips all the outer loops so I cannot do any testing in that regard.

Hey, can you help me understand the algorithm?
I don't have a good feel for the invariants you expect going in.
It just says "it works on anything given loop info".
IE What info you expect the loopinfo to provide vs the topo ordering.

There are lots of ways to test for reducibility, and it's unclear which you are using here (and why that should be used if the loop info isn't around, for example).

To make it simple, let's take the canonical irreducible graph

A->B
A->C
B->C
C->B
C->D

     A
 /        \
B<- -> C
             |
             D

What do you expect to happen here? What do you expect the loopinfo to provide vs the RPO.

The algorithms i'm familiar with to detect irreducibility don't need the loop info:

  1. T1/T2
  2. Tarjan's algorithm, which is basically (this is borrowed from eric stolz)
test_dom(a,b) returns TRUE if a dominates b
push( v ) pushes v onto a reverse topologically-sorted stack


Invoke: top_sort( entry node )


top_sort( node v ) {
      mark_visited( v );
      Visit all successors s of v {
              if( mark_visited( s ) && !pushed( s ) && !test_dom( s, v ) ) {
                          Irreducible_Graph = TRUE;
                          Exit -- no need to continue now!
              }
              if( !mark_visited( s ) )
                          top_sort( s );
      }
      push( v );
}
  1. (another variant of tarjan's) Build a DFST, remove the back edges, see if the result is acylic.

It's true you can use loop info, but given the dom test is O(1) i'm not sure why it's worth computing to test irreducibility, especially if you are doing an RPO computation anyway.
Can you help me understand the algorithm being used here and why?

Hi Daniel,

thanks for the detailed response!

The main goal of this patch is to extend the current irreducibility detection in LV, based on SCCs explicitly built for that purpose, to work on outer loops. We found the algorithm in ShrinkWrap and decided to generalize it to promote and reuse existing code in LLVM instead of re-implementing a LV-specific version.

can you help me understand the algorithm?
IE What info you expect the loopinfo to provide vs the topo ordering.

LoopInfo analysis is “indirectly” computing CFG irreducibility by detecting irreducible loops and leaving them out of the loop forest. In the proposed algorithm, LoopInfo provides information about (already computed) reducible loop backedges. When a backedge is found during the CFG traversal, the algorithm checks that the backedge is one of the reducible backedges in LoopInfo. Otherwise, the CFG is irreducible. This logic is in function 'isProperBackedge'.

What do you expect to happen here? What do you expect the loopinfo to provide vs the RPO.

For this example, LoopInfo wouln't contain any loop, so the algorithm will return that the CFG is irreducible when checking the B <- -> C backedge.

It's true you can use loop info, but given the dom test is O(1) i'm not sure why it's worth computing to test irreducibility, especially if you are doing an RPO computation anyway.

Please, note that this patch is not intended to provide the best irreducibility algorithm for any arbitrary client. In both ShrinkWrap and LV, LoopInfo is an already required analysis. The rationale behind the proposed algorithm is that we don't want to compute irreducibility detection again when this information is already available in LoopInfo. This keeps the algorithm simple. I agree that the current documentation in 'containsIrreducibleCFG' might be confusing. Computing LoopInfo from scratch only to determine the irreducibility of a CFG might not be the best approach but if it's already computed, which is the case in ShrinkWrap and LV, I don't see why this approach is worse than using DT. I can improve the documentation in that regard.

Please, also note that the proposed algorithm also allows to reuse an existing RPO traversal, which is also important in terms of performance.

If a more generic algorithm is necessary for more general cases, I would suggest addressing that in a different patch. However, I think there is no silver bullet here. For a DT-based implementation one could argue that LoopInfo could be available and DT could have been invalidated and would need to be recomputed.

Pinging @qcolombet, the original author of this algorithm in ShrinkWrap, in case he wants to provide further information.

Does it sound reasonable?

Thanks,
Diego

Thanks, that is very helpful. It's much more likely that Dominator info is up to date than loop info, but it's not worth getting into here. The only thing I request is that you update the comments to suggest this only probably makes sense to use if loopinfo is computed already, and please just add your description of how it works with the example to the comments.

fhahn added a comment.Feb 22 2018, 7:09 AM

Please, Florian, feel free to commit this patch whenever you think it's appropriate. I still don't have commit access.

Can do, but probably not before Monday.

hsaito added a subscriber: hsaito.Feb 22 2018, 11:29 AM

Please, Florian, feel free to commit this patch whenever you think it's appropriate. I still don't have commit access.

Can do, but probably not before Monday.

Thanks, Florian, Daniel, and all. Diego will update the comments in the latter half of next week when he comes back from vacation. Sorry for the delay.
In the meantime, please look at D42447, the next patch in the series, if you haven't.

Thanks,
Hideki

dcaballe updated this revision to Diff 136437.Feb 28 2018, 5:08 PM

Improving documentation of utility function 'containsIrreducibleCFG'.

Thanks, Daniel/Florian/Hideki.
Daniel, please, let me know if the new documentation addresses your concerns.

Thanks,
Diego

dberlin accepted this revision.Feb 28 2018, 9:29 PM

Looks good to me, thanks so much for the updates!

Thank you, Daniel!
OK, Florian, feel free to commit it whenever you have time. Thanks.

Diego

This revision was automatically updated to reflect the committed changes.

Thanks, Florian!