This is an archive of the discontinued LLVM Phabricator instance.

StructurizeCFG: Test for branch divergence correctly
ClosedPublic

Authored by nhaehnle on Nov 28 2017, 4:23 AM.

Details

Summary

This fixes cases like the new test @nonuniform. In that test, %cc itself
is a uniform value; however, when reading it after the end of the loop in
basic block %if, its value is effectively non-uniform.

This problem was encountered in
https://bugs.freedesktop.org/show_bug.cgi?id=103743; however, this change
in itself is not sufficient to fix that bug, as there is another issue
in the AMDGPU backend.

Change-Id: I32bbffece4a32f686fab54964dae1a5dd72949d4

Event Timeline

nhaehnle created this revision.Nov 28 2017, 4:23 AM
arsenm added inline comments.Nov 28 2017, 12:00 PM
lib/Transforms/Scalar/StructurizeCFG.cpp
253

I think the flag should override the pass argument if explicitly specified. I think other passes do something like if .getNumOccurrences() > 0 use the flag ...

test/Transforms/StructurizeCFG/uniform-regions.ll
1

The -mtriple should be dropped. If it's needed, you should create an AMDGPU specific test subdirectory in StructurizeCFG

alex-t edited edge metadata.Nov 29 2017, 3:01 AM

In general looks correct to me. You definitely should check the branch itself - not the condition.
In your case (divergent loop-exit) branch itself is divergent because of the control dependency.

include/llvm/Analysis/DivergenceAnalysis.h
40

Comment is not necessary. By definition:

instruction produces uniform result if and only if all it's operands are uniform and all the branches it control depend of are uniform.

This comment assume that the person who uses Divergence Analysis has no idea of what it is.

46

Same as above

test/Transforms/StructurizeCFG/uniform-regions.ll
34

This is misleading. I can guess that using %v in the comparison means that branch condition is divergent.
Nevertheless, I think that it's more clear if you explicitly use the divergent definition for %v.
like:

%v = call i32 @llvm.amdgcn.workitem.id.x()

nhaehnle updated this revision to Diff 131600.Jan 26 2018, 8:57 AM
nhaehnle marked 3 inline comments as done.

Address review comments:

  • command line flag can also force-disable
  • move uniform-region.ll test into AMDGPU subdirectory
  • use llvm.amdgcn.workitem.id.x
arsenm added inline comments.Jan 26 2018, 9:09 AM
test/Transforms/StructurizeCFG/AMDGPU/uniform-regions.ll
1 ↗(On Diff #131600)

I think we should start using update_test_checks for any structurer tests

nhaehnle updated this revision to Diff 131798.Jan 29 2018, 6:59 AM

Use update_test_checks

nhaehnle marked an inline comment as done.Jan 29 2018, 6:59 AM
nhaehnle added inline comments.
include/llvm/Analysis/DivergenceAnalysis.h
40

Strongly disagree. This is the header/interface file, and the definition you cite appears nowhere in the code, so the comment is valuable. To elaborate:

This comment assume that the person who uses Divergence Analysis has no idea of what it is.

Kind of, though I would rather say: the comment assumes that the person who uses DivergenceAnalysis has no experience with it, and that is entirely fair. Actually, I'd say the mere existence of this patch is evidence that the comment makes sense.

You see, this patch fixes a bug in StructurizeCFG, which only uses divergence analysis optionally, and may be used by targets where divergence is not an issue. So it is entirely reasonable that a person who has no experience at all with DivergenceAnalysis would work on StructurizeCFG in a way that happens to touch how it interacts with DivergenceAnalysis. This comment serves as a warning to someone like that.

lib/Transforms/Scalar/StructurizeCFG.cpp
253

Done.

test/Transforms/StructurizeCFG/AMDGPU/uniform-regions.ll
1 ↗(On Diff #131600)

Done

test/Transforms/StructurizeCFG/uniform-regions.ll
1

Done.

34

Done.

alex-t added inline comments.Jan 30 2018, 7:00 AM
include/llvm/Analysis/DivergenceAnalysis.h
40

This problem may occur in case the value that is defined uniformly in a loop is used outside the loop in another block that is control dependent of the divergent loop exit branch. The value itself it is still uniform.
But the value that is produced by the user is divergent because of the divergent control dependency. So DivergentAnalysis is correct. The problem is in misunderstanding of what is control dependencies in the context of divergence.

I'd give same example as in https://reviews.llvm.org/D40547

%tid = call i32 @llvm.amdgcn.workitem.id.x()

for.body:

%val = add i32 %val, 1 <== definition of %val is uniform
%cmp = icmp gt i64 %tid, %arg1
br i1 %cmp, label %for.end, label %for.body <== loop exit condition is divergent
for.end:

%result = add i32 %val, %x <== each thread will have different %val here. So the %result is divergent but %val is still uniform

Anyway it is not a serious issue. I just want to explain what I meant.

nhaehnle marked an inline comment as done.Feb 21 2018, 6:53 AM

Finally got time to get back to this. Can I get a review on this please?

alex-t accepted this revision.Feb 21 2018, 7:45 AM
This revision is now accepted and ready to land.Feb 21 2018, 7:45 AM
This revision was automatically updated to reflect the committed changes.
spatel added a subscriber: spatel.Feb 23 2018, 12:22 PM

I'm not sure if it was this change or something else, but the optnone test is not stable on my system (testing with llc built from r325923):

$ ./llvm-lit ../../llvm/test/CodeGen/AMDGPU/control-flow-optnone.ll 
-- Testing: 1 tests, 1 threads --
PASS: LLVM :: CodeGen/AMDGPU/control-flow-optnone.ll (1 of 1)
Testing Time: 0.11s
  Expected Passes    : 1
$ ./llvm-lit ../../llvm/test/CodeGen/AMDGPU/control-flow-optnone.ll 
-- Testing: 1 tests, 1 threads --
FAIL: LLVM :: CodeGen/AMDGPU/control-flow-optnone.ll (1 of 1)
Testing Time: 0.11s
********************
Failing Tests (1):
    LLVM :: CodeGen/AMDGPU/control-flow-optnone.ll

I also got blame mail from a bot about this test failing for what would seem to be an unrelated change:
http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/7201

Yes, it's this change. It's subtle, DivergenceAnalysis has always ended up with dangling pointers, but prior to this change it happened to not matter because we never queried newly created values. This change ended up querying newly created values, and the results differed depending on whether the new allocations happened to reuse the dangling pointers. I uploaded a new version of this change as D43743.