Page MenuHomePhabricator

[DA] conservatively mark the join of every divergent branch
ClosedPublic

Authored by sameerds on Jun 14 2020, 8:30 AM.

Details

Summary

For a loop, a join block is a block that is reachable along multiple
disjoint paths from the exiting block of a loop. If the exit condition
of the loop is divergent, then such join blocks must also be marked
divergent. This currently fails in some cases because not all join
blocks are identified correctly.

The workaround is to conservatively mark every join block of any
branch (not necessarily the exiting block of a loop) as divergent.

https://bugs.llvm.org/show_bug.cgi?id=46372

Diff Detail

Event Timeline

sameerds created this revision.Jun 14 2020, 8:30 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 14 2020, 8:30 AM

I am a bit surprised this is necessary. Actually, join divergence in bb3 should come about as a result of propagateLoopDivergence for the [bb1, bb2] loop and not immediately because it is a divergent exit.
Also, a divergent loop exit does not automatically become join divergent, eg:

bb1: 
   %m.uni = <uni>
   %n.uni = <uni>
   br <uni>, %bb2, %bb3

bb2: [..]
   br <div>, %bb2, %bb3

bb3:
   %x.uni = phi [%m.uni, %bb2], [%n.uni %bb1]  <-- divergent loop exit that is not join divergent

I am a bit surprised this is necessary. Actually, join divergence in bb3 should come about as a result of propagateLoopDivergence for the [bb1, bb2] loop and not immediately because it is a divergent exit.

I am not sure what you mean here. propagateLoopDivergence is calling propagateJoinDivergence ... so doesn't that count as a result of the former?

Also, a divergent loop exit does not automatically become join divergent, eg:

bb1: 
   %m.uni = <uni>
   %n.uni = <uni>
   br <uni>, %bb2, %bb3

bb2: [..]
   br <div>, %bb2, %bb3

bb3:
   %x.uni = phi [%m.uni, %bb2], [%n.uni %bb1]  <-- divergent loop exit that is not join divergent

But here, bb3 is not a join block for the bb2 loop, since there is only one path from the loop exit reaching bb3.

I am a bit surprised this is necessary. Actually, join divergence in bb3 should come about as a result of propagateLoopDivergence for the [bb1, bb2] loop and not immediately because it is a divergent exit.

I am not sure what you mean here. propagateLoopDivergence is calling propagateJoinDivergence ... so doesn't that count as a result of the former?

Also, a divergent loop exit does not automatically become join divergent, eg:

bb1: 
   %m.uni = <uni>
   %n.uni = <uni>
   br <uni>, %bb2, %bb3

bb2: [..]
   br <div>, %bb2, %bb3

bb3:
   %x.uni = phi [%m.uni, %bb2], [%n.uni %bb1]  <-- divergent loop exit that is not join divergent

But here, bb3 is not a join block for the bb2 loop, since there is only one path from the loop exit reaching bb3.

Exactly. With this patch %x.uni is marked as join divergent even though it is uniform: In its present form the patch solves one issue at the cost of precision.

The real culprit here is the SDA::join_blocks interface that does not distinguish divergent join blocks and loop exits. The SDA "knows" which is which internally, loses that information as it passes divergent blocks through join_blocks and the DA has to reverse-engineer that information. I have an unfinished patch here that fixes that and should resolve the bug addressed by this patch as well. The unfinished patch also factors the back-and-forth between the DA and SDA to propagate along nested loops completely into the SDA (propagateLoopDiv does not really belong in the DA).

Long story short: This is an ok fix (but please add a FIXME/TODO that hints to the issues with this approach) but we should rework the join_blocks interface in any case.

llvm/lib/Analysis/DivergenceAnalysis.cpp
299

Add a FIXME here that clearly states that this is a quickfix. You could also add the following test (in comments since it will fail) to your test case to document the imprecision caused by this:

; CHECK: bb2:
; CHECK-NOT: DIVERGENT:       %Guard.bb2 = phi i1 [ true, %bb1 ], [ false, %bb0 ]

define protected amdgpu_kernel void @test2(i1 %uni) {
bb0:
  %tid.x = call i32 @llvm.amdgcn.workitem.id.x()
  %i5 = icmp eq i32 %tid.x, -1
  br i1 %uni, label %bb1, label %bb2

bb1:                                              ; preds = %bb2, %bb0
  %lsr.iv = phi i32 [ 7, %bb0 ], [ %lsr.iv.next, %bb1 ]
  %lsr.iv.next = add nsw i32 %lsr.iv, -1
  br i1 %i5, label %bb2, label %bb1

bb2:                                              ; preds = %bb2, %bb1
  %Guard.bb2 = phi i1 [ true, %bb1 ], [ false, %bb0 ]
  ret void
}

attributes #0 = { nounwind readnone speculatable }

I am a bit surprised this is necessary. Actually, join divergence in bb3 should come about as a result of propagateLoopDivergence for the [bb1, bb2] loop and not immediately because it is a divergent exit.

I am not sure what you mean here. propagateLoopDivergence is calling propagateJoinDivergence ... so doesn't that count as a result of the former?

Also, a divergent loop exit does not automatically become join divergent, eg:

bb1: 
   %m.uni = <uni>
   %n.uni = <uni>
   br <uni>, %bb2, %bb3

bb2: [..]
   br <div>, %bb2, %bb3

bb3:
   %x.uni = phi [%m.uni, %bb2], [%n.uni %bb1]  <-- divergent loop exit that is not join divergent

But here, bb3 is not a join block for the bb2 loop, since there is only one path from the loop exit reaching bb3.

Exactly. With this patch %x.uni is marked as join divergent even though it is uniform: In its present form the patch solves one issue at the cost of precision.

The real culprit here is the SDA::join_blocks interface that does not distinguish divergent join blocks and loop exits. The SDA "knows" which is which internally, loses that information as it passes divergent blocks through join_blocks and the DA has to reverse-engineer that information. I have an unfinished patch here that fixes that and should resolve the bug addressed by this patch as well. The unfinished patch also factors the back-and-forth between the DA and SDA to propagate along nested loops completely into the SDA (propagateLoopDiv does not really belong in the DA).

Long story short: This is an ok fix (but please add a FIXME/TODO that hints to the issues with this approach) but we should rework the join_blocks interface in any case.

I am new to this party. Why is %x marked as uni? Its value will differ amongst threads as branch in bb2 is divergent. Is my understanding incorrect?

bb1: 
   %m.uni = <uni>
   %n.uni = <uni>
   br <uni>, %bb2, %bb3

bb2: [..]
   br <div>, %bb2, %bb3

bb3:
   %x.uni = phi [%m.uni, %bb2], [%n.uni %bb1]  <-- divergent loop exit that is not join divergent

I am new to this party. Why is %x marked as uni? Its value will differ amongst threads as branch in bb2 is divergent. Is my understanding incorrect?

Here, threads exit bb2 on different iterations because of the divergent branch, but when they converge at bb3, they all assign the same value %m.uni to %x.uni. And since bb1 has a uniform exit, either all threads reach here directly from bb1, or they all reach here from bb2. So the value of %x.uni is uniformly %n.uni or %m.uni respectively.

sameerds marked an inline comment as done.Jun 16 2020, 6:57 AM
sameerds added inline comments.
llvm/lib/Analysis/DivergenceAnalysis.cpp
299

Not intending to bikeshed, but having a CHECK-NOT here will make it appear as if the test is endorsing this behaviour and things are exactly as they should be. Wouldn't an XFAIL be better? In addition, would filing a bug be a more reliable way to track the FIXME?

@simoll, I've been staring at the code for some time now, and I am no longer sure that I have a proper grasp on the problem. It seems my fix works, but not for the reasons that I thought it should. Running the unmodified analysis on "join-at-loop-exit.ll" shows the following debug output:

propBranchDiv bb1
        propJoinDiv bb3
propLoopDiv bb1

bb3 is a joindivergent exit for the [bb1, bb2] loop, but it seems that propLoopDiv does not visit it at all. That would imply that bb3 is not present in the call to join_blocks() on the loop. Is that expected behaviour in the current implementation? When does join_blocks() return the correct set of blocks for a loop?

My fix seems like a really big hammer that works because propBranchDiv gets called on bb1. But that was not my intention when I created the fix.

Do you still think that we should go ahead with this blunt fix? What is the ETA for the proper fix?

Do you still think that we should go ahead with this blunt fix? What is the ETA for the proper fix?

Just make it clear with a comment that it is a quickfix and create a bug report with the test case i ported earlier. You can reference the bug ID from the comment.
I can't make any promises for the proper fix but i hope to have the patch (SDA::join_blocks interface cleanup) ready for review by end of July.

sameerds updated this revision to Diff 271563.Jun 17 2020, 8:30 PM

filed a bug; improved the description; added a failing testcase

sameerds retitled this revision from [DivergenceAnalysis] mark join of divergent loop exits to [DA] conservatively mark the join of every divergent branch.Jun 17 2020, 8:34 PM
sameerds edited the summary of this revision. (Show Details)
simoll accepted this revision.Jun 18 2020, 12:16 AM
This revision is now accepted and ready to land.Jun 18 2020, 12:16 AM
This revision was automatically updated to reflect the committed changes.