This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/
-
lib/Analysis/
-
Analysis/
1/2
DivergenceAnalysis.cpp
-
test/Analysis/DivergenceAnalysis/AMDGPU/
-
Analysis/
-
DivergenceAnalysis/
-
AMDGPU/
-
join-at-loop-exit.ll
-
trivial-join-at-loop-exit.ll

Differential D81806

[DA] conservatively mark the join of every divergent branch
ClosedPublic

Authored by sameerds on Jun 14 2020, 8:30 AM.

Download Raw Diff

Details

Reviewers

simoll
nhaehnle
kerbowa
foad

Commits

rG7aad220795b5: [DA] conservatively mark the join of every divergent branch

Summary

For a loop, a join block is a block that is reachable along multiple
disjoint paths from the exiting block of a loop. If the exit condition
of the loop is divergent, then such join blocks must also be marked
divergent. This currently fails in some cases because not all join
blocks are identified correctly.

The workaround is to conservatively mark every join block of any
branch (not necessarily the exiting block of a loop) as divergent.

https://bugs.llvm.org/show_bug.cgi?id=46372

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

sameerds created this revision.Jun 14 2020, 8:30 AM

Herald added a project: Restricted Project. · View Herald TranscriptJun 14 2020, 8:30 AM

Herald added subscribers: llvm-commits, kerbowa, hiraditya and 2 others. · View Herald Transcript

sameerds added reviewers: simoll, nhaehnle, kerbowa, foad.Jun 14 2020, 8:33 AM

Harbormaster completed remote builds in B60239: Diff 270625.Jun 14 2020, 9:04 AM

I am a bit surprised this is necessary. Actually, join divergence in bb3 should come about as a result of propagateLoopDivergence for the [bb1, bb2] loop and not immediately because it is a divergent exit.
Also, a divergent loop exit does not automatically become join divergent, eg:

bb1: 
   %m.uni = <uni>
   %n.uni = <uni>
   br <uni>, %bb2, %bb3

bb2: [..]
   br <div>, %bb2, %bb3

bb3:
   %x.uni = phi [%m.uni, %bb2], [%n.uni %bb1]  <-- divergent loop exit that is not join divergent

In D81806#2094869, @simoll wrote:

I am a bit surprised this is necessary. Actually, join divergence in bb3 should come about as a result of propagateLoopDivergence for the [bb1, bb2] loop and not immediately because it is a divergent exit.

I am not sure what you mean here. propagateLoopDivergence is calling propagateJoinDivergence ... so doesn't that count as a result of the former?

Also, a divergent loop exit does not automatically become join divergent, eg:

bb1: 
   %m.uni = <uni>
   %n.uni = <uni>
   br <uni>, %bb2, %bb3

bb2: [..]
   br <div>, %bb2, %bb3

bb3:
   %x.uni = phi [%m.uni, %bb2], [%n.uni %bb1]  <-- divergent loop exit that is not join divergent

But here, bb3 is not a join block for the bb2 loop, since there is only one path from the loop exit reaching bb3.

In D81806#2094960, @sameerds wrote:
In D81806#2094869, @simoll wrote:

I am a bit surprised this is necessary. Actually, join divergence in bb3 should come about as a result of propagateLoopDivergence for the [bb1, bb2] loop and not immediately because it is a divergent exit.

I am not sure what you mean here. propagateLoopDivergence is calling propagateJoinDivergence ... so doesn't that count as a result of the former?
Also, a divergent loop exit does not automatically become join divergent, eg:
bb1: 
   %m.uni = <uni>
   %n.uni = <uni>
   br <uni>, %bb2, %bb3

bb2: [..]
   br <div>, %bb2, %bb3

bb3:
   %x.uni = phi [%m.uni, %bb2], [%n.uni %bb1]  <-- divergent loop exit that is not join divergent
But here, bb3 is not a join block for the bb2 loop, since there is only one path from the loop exit reaching bb3.

Exactly. With this patch %x.uni is marked as join divergent even though it is uniform: In its present form the patch solves one issue at the cost of precision.

The real culprit here is the SDA::join_blocks interface that does not distinguish divergent join blocks and loop exits. The SDA "knows" which is which internally, loses that information as it passes divergent blocks through join_blocks and the DA has to reverse-engineer that information. I have an unfinished patch here that fixes that and should resolve the bug addressed by this patch as well. The unfinished patch also factors the back-and-forth between the DA and SDA to propagate along nested loops completely into the SDA (propagateLoopDiv does not really belong in the DA).

Long story short: This is an ok fix (but please add a FIXME/TODO that hints to the issues with this approach) but we should rework the join_blocks interface in any case.

llvm/lib/Analysis/DivergenceAnalysis.cpp

299

Add a FIXME here that clearly states that this is a quickfix. You could also add the following test (in comments since it will fail) to your test case to document the imprecision caused by this:

; CHECK: bb2:
; CHECK-NOT: DIVERGENT:       %Guard.bb2 = phi i1 [ true, %bb1 ], [ false, %bb0 ]

define protected amdgpu_kernel void @test2(i1 %uni) {
bb0:
  %tid.x = call i32 @llvm.amdgcn.workitem.id.x()
  %i5 = icmp eq i32 %tid.x, -1
  br i1 %uni, label %bb1, label %bb2

bb1:                                              ; preds = %bb2, %bb0
  %lsr.iv = phi i32 [ 7, %bb0 ], [ %lsr.iv.next, %bb1 ]
  %lsr.iv.next = add nsw i32 %lsr.iv, -1
  br i1 %i5, label %bb2, label %bb1

bb2:                                              ; preds = %bb2, %bb1
  %Guard.bb2 = phi i1 [ true, %bb1 ], [ false, %bb0 ]
  ret void
}

attributes #0 = { nounwind readnone speculatable }

In D81806#2095487, @simoll wrote:
In D81806#2094960, @sameerds wrote:
In D81806#2094869, @simoll wrote:

I am a bit surprised this is necessary. Actually, join divergence in bb3 should come about as a result of propagateLoopDivergence for the [bb1, bb2] loop and not immediately because it is a divergent exit.

I am not sure what you mean here. propagateLoopDivergence is calling propagateJoinDivergence ... so doesn't that count as a result of the former?
Also, a divergent loop exit does not automatically become join divergent, eg:
bb1: 
   %m.uni = <uni>
   %n.uni = <uni>
   br <uni>, %bb2, %bb3

bb2: [..]
   br <div>, %bb2, %bb3

bb3:
   %x.uni = phi [%m.uni, %bb2], [%n.uni %bb1]  <-- divergent loop exit that is not join divergent
But here, bb3 is not a join block for the bb2 loop, since there is only one path from the loop exit reaching bb3.
Exactly. With this patch %x.uni is marked as join divergent even though it is uniform: In its present form the patch solves one issue at the cost of precision.

The real culprit here is the SDA::join_blocks interface that does not distinguish divergent join blocks and loop exits. The SDA "knows" which is which internally, loses that information as it passes divergent blocks through join_blocks and the DA has to reverse-engineer that information. I have an unfinished patch here that fixes that and should resolve the bug addressed by this patch as well. The unfinished patch also factors the back-and-forth between the DA and SDA to propagate along nested loops completely into the SDA (propagateLoopDiv does not really belong in the DA).

Long story short: This is an ok fix (but please add a FIXME/TODO that hints to the issues with this approach) but we should rework the join_blocks interface in any case.

I am new to this party. Why is %x marked as uni? Its value will differ amongst threads as branch in bb2 is divergent. Is my understanding incorrect?

In D81806#2095605, @madhur13490 wrote:
bb1: 
   %m.uni = <uni>
   %n.uni = <uni>
   br <uni>, %bb2, %bb3

bb2: [..]
   br <div>, %bb2, %bb3

bb3:
   %x.uni = phi [%m.uni, %bb2], [%n.uni %bb1]  <-- divergent loop exit that is not join divergent
I am new to this party. Why is %x marked as uni? Its value will differ amongst threads as branch in bb2 is divergent. Is my understanding incorrect?

Here, threads exit bb2 on different iterations because of the divergent branch, but when they converge at bb3, they all assign the same value %m.uni to %x.uni. And since bb1 has a uniform exit, either all threads reach here directly from bb1, or they all reach here from bb2. So the value of %x.uni is uniformly %n.uni or %m.uni respectively.

sameerds marked an inline comment as done.Jun 16 2020, 6:57 AM

sameerds added inline comments.

llvm/lib/Analysis/DivergenceAnalysis.cpp
299	Not intending to bikeshed, but having a CHECK-NOT here will make it appear as if the test is endorsing this behaviour and things are exactly as they should be. Wouldn't an XFAIL be better? In addition, would filing a bug be a more reliable way to track the FIXME?

@simoll, I've been staring at the code for some time now, and I am no longer sure that I have a proper grasp on the problem. It seems my fix works, but not for the reasons that I thought it should. Running the unmodified analysis on "join-at-loop-exit.ll" shows the following debug output:

propBranchDiv bb1
        propJoinDiv bb3
propLoopDiv bb1

bb3 is a joindivergent exit for the [bb1, bb2] loop, but it seems that propLoopDiv does not visit it at all. That would imply that bb3 is not present in the call to join_blocks() on the loop. Is that expected behaviour in the current implementation? When does join_blocks() return the correct set of blocks for a loop?

My fix seems like a really big hammer that works because propBranchDiv gets called on bb1. But that was not my intention when I created the fix.

Do you still think that we should go ahead with this blunt fix? What is the ETA for the proper fix?

In D81806#2097379, @sameerds wrote:

Do you still think that we should go ahead with this blunt fix? What is the ETA for the proper fix?

Just make it clear with a comment that it is a quickfix and create a bug report with the test case i ported earlier. You can reference the bug ID from the comment.
I can't make any promises for the proper fix but i hope to have the patch (SDA::join_blocks interface cleanup) ready for review by end of July.

filed a bug; improved the description; added a failing testcase

sameerds retitled this revision from [DivergenceAnalysis] mark join of divergent loop exits to [DA] conservatively mark the join of every divergent branch.Jun 17 2020, 8:34 PM

sameerds edited the summary of this revision. (Show Details)

Harbormaster failed remote builds in B60754: Diff 271563!Jun 17 2020, 9:35 PM

Harbormaster completed remote builds in B60754: Diff 271563.Jun 17 2020, 11:58 PM

simoll accepted this revision.Jun 18 2020, 12:16 AM

This revision is now accepted and ready to land.Jun 18 2020, 12:16 AM

Closed by commit rG7aad220795b5: [DA] conservatively mark the join of every divergent branch (authored by sameerds). · Explain WhyJun 18 2020, 5:24 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path

Size

llvm/

lib/

Analysis/

DivergenceAnalysis.cpp

9 lines

test/

Analysis/

DivergenceAnalysis/

AMDGPU/

join-at-loop-exit.ll

36 lines

trivial-join-at-loop-exit.ll

28 lines

Diff 271673

llvm/lib/Analysis/DivergenceAnalysis.cpp

Show First 20 Lines • Show All 289 Lines • ▼ Show 20 Lines	bool DivergenceAnalysis::propagateJoinDivergence(const BasicBlock &JoinBlock,
// ignore divergence outside the region		// ignore divergence outside the region
if (!inRegion(JoinBlock)) {		if (!inRegion(JoinBlock)) {
return false;		return false;
}		}

// push non-divergent phi nodes in JoinBlock to the worklist		// push non-divergent phi nodes in JoinBlock to the worklist
pushPHINodes(JoinBlock);		pushPHINodes(JoinBlock);

// JoinBlock is a divergent loop exit
if (BranchLoop && !BranchLoop->contains(&JoinBlock)) {
return true;
}

// disjoint-paths divergent at JoinBlock		// disjoint-paths divergent at JoinBlock
markBlockJoinDivergent(JoinBlock);		markBlockJoinDivergent(JoinBlock);
		simollUnsubmitted Not Done Reply Inline Actions Add a FIXME here that clearly states that this is a quickfix. You could also add the following test (in comments since it will fail) to your test case to document the imprecision caused by this: ; CHECK: bb2: ; CHECK-NOT: DIVERGENT: %Guard.bb2 = phi i1 [ true, %bb1 ], [ false, %bb0 ] define protected amdgpu_kernel void @test2(i1 %uni) { bb0: %tid.x = call i32 @llvm.amdgcn.workitem.id.x() %i5 = icmp eq i32 %tid.x, -1 br i1 %uni, label %bb1, label %bb2 bb1: ; preds = %bb2, %bb0 %lsr.iv = phi i32 [ 7, %bb0 ], [ %lsr.iv.next, %bb1 ] %lsr.iv.next = add nsw i32 %lsr.iv, -1 br i1 %i5, label %bb2, label %bb1 bb2: ; preds = %bb2, %bb1 %Guard.bb2 = phi i1 [ true, %bb1 ], [ false, %bb0 ] ret void } attributes #0 = { nounwind readnone speculatable } simoll: Add a FIXME here that clearly states that this is a quickfix. You could also add the following…
		sameerdsAuthorUnsubmitted Done Reply Inline Actions Not intending to bikeshed, but having a CHECK-NOT here will make it appear as if the test is endorsing this behaviour and things are exactly as they should be. Wouldn't an XFAIL be better? In addition, would filing a bug be a more reliable way to track the FIXME? sameerds: Not intending to bikeshed, but having a CHECK-NOT here will make it appear as if the test is…
return false;
		// JoinBlock is a divergent loop exit
		return BranchLoop && !BranchLoop->contains(&JoinBlock);
}		}

void DivergenceAnalysis::propagateBranchDivergence(const Instruction &Term) {		void DivergenceAnalysis::propagateBranchDivergence(const Instruction &Term) {
LLVM_DEBUG(dbgs() << "propBranchDiv " << Term.getParent()->getName() << "\n");		LLVM_DEBUG(dbgs() << "propBranchDiv " << Term.getParent()->getName() << "\n");

markDivergent(Term);		markDivergent(Term);

// Don't propagate divergence from unreachable blocks.		// Don't propagate divergence from unreachable blocks.
▲ Show 20 Lines • Show All 166 Lines • Show Last 20 Lines

llvm/test/Analysis/DivergenceAnalysis/AMDGPU/join-at-loop-exit.ll

This file was added.

				; RUN: opt -mtriple amdgcn-unknown-amdhsa -analyze -divergence -use-gpu-divergence-analysis %s \| FileCheck %s

				; CHECK: bb3:
				; CHECK: DIVERGENT: %Guard.bb4 = phi i1 [ true, %bb1 ], [ false, %bb2 ]
				; CHECK: DIVERGENT: br i1 %Guard.bb4, label %bb4, label %bb5

				; Function Attrs: nounwind readnone speculatable
				declare i32 @llvm.amdgcn.workitem.id.x() #0

				define protected amdgpu_kernel void @test() {
				bb0:
				%tid.x = call i32 @llvm.amdgcn.workitem.id.x()
				%i5 = icmp eq i32 %tid.x, -1
				br label %bb1

				bb1: ; preds = %bb2, %bb0
				%lsr.iv = phi i32 [ 7, %bb0 ], [ %lsr.iv.next, %bb2 ]
				br i1 %i5, label %bb2, label %bb3

				bb2: ; preds = %bb1
				%lsr.iv.next = add nsw i32 %lsr.iv, -1
				%i14 = icmp eq i32 %lsr.iv.next, 0
				br i1 %i14, label %bb3, label %bb1

				bb3: ; preds = %bb2, %bb1
				%Guard.bb4 = phi i1 [ true, %bb1 ], [ false, %bb2 ]
				br i1 %Guard.bb4, label %bb4, label %bb5

				bb4: ; preds = %bb3
				br label %bb5

				bb5: ; preds = %bb3, %bb4
				ret void
				}

				attributes #0 = { nounwind readnone speculatable }

llvm/test/Analysis/DivergenceAnalysis/AMDGPU/trivial-join-at-loop-exit.ll

This file was added.

				; RUN: opt -mtriple amdgcn-unknown-amdhsa -analyze -divergence -use-gpu-divergence-analysis %s \| FileCheck %s
				; XFAIL: *

				; https://bugs.llvm.org/show_bug.cgi?id=46372

				; CHECK: bb2:
				; CHECK-NOT: DIVERGENT: %Guard.bb2 = phi i1 [ true, %bb1 ], [ false, %bb0 ]

				; Function Attrs: nounwind readnone speculatable
				declare i32 @llvm.amdgcn.workitem.id.x() #0

				define protected amdgpu_kernel void @test2(i1 %uni) {
				bb0:
				%tid.x = call i32 @llvm.amdgcn.workitem.id.x()
				%i5 = icmp eq i32 %tid.x, -1
				br i1 %uni, label %bb1, label %bb2

				bb1: ; preds = %bb2, %bb0
				%lsr.iv = phi i32 [ 7, %bb0 ], [ %lsr.iv.next, %bb1 ]
				%lsr.iv.next = add nsw i32 %lsr.iv, -1
				br i1 %i5, label %bb2, label %bb1

				bb2: ; preds = %bb2, %bb1
				%Guard.bb2 = phi i1 [ true, %bb1 ], [ false, %bb0 ]
				ret void
				}

				attributes #0 = { nounwind readnone speculatable }

This is an archive of the discontinued LLVM Phabricator instance.

[DA] conservatively mark the join of every divergent branchClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 271673

llvm/lib/Analysis/DivergenceAnalysis.cpp

llvm/test/Analysis/DivergenceAnalysis/AMDGPU/join-at-loop-exit.ll

llvm/test/Analysis/DivergenceAnalysis/AMDGPU/trivial-join-at-loop-exit.ll

[DA] conservatively mark the join of every divergent branch
ClosedPublic