This is an archive of the discontinued LLVM Phabricator instance.

StructurizeCFG: Test for branch divergence correctly
Closed, Public

Authored by nhaehnle on Nov 28 2017, 4:23 AM.


Details

Reviewers

arsenm
rampitec
jlebar
alex-t

Commits

rG43c1115cd461: StructurizeCFG: Test for branch divergence correctly
rL325881: StructurizeCFG: Test for branch divergence correctly

Summary

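This fixes cases like the new test @nonuniform. In that test, %cc itself
is a uniform value. However, when it is read after the end of the loop (in
basic block %for.end, to decide whether to branch to %if), its value is
effectively non-uniform, because threads may leave the loop in different
iterations.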
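To make the summary concrete, below is a small C++ model of one thread (lane) executing @nonuniform. It is not LLVM code. The model assumes that each lane's %cc2 is supplied as input and may differ between lanes, as it does once %v comes from llvm.amdgcn.workitem.id.x. The names runLane and ccUniformInForEnd are hypothetical.

```cpp
#include <cassert>
#include <vector>

// What one lane (thread) observes when it reaches %for.end.
struct LaneResult {
  int exitIteration; // value of %i when the lane left the loop
  bool ccInForEnd;   // value of %cc read by the branch in %for.end
};

// Simulates one lane of @nonuniform. cc2PerIter[k] is this lane's %cc2 in
// iteration k; it is an input because it may differ between lanes.
// Must hold at least 4 entries.
LaneResult runLane(const std::vector<bool> &cc2PerIter) {
  int i = 0;
  bool cc = false;
  while (true) {
    cc = (i < 4);               // for.body: %cc = icmp ult i32 %i, 4
    if (!cc)
      break;                    // for.body exits to %for.end
    if (!cc2PerIter.at(i))
      break;                    // mid.loop exits to %for.end
    ++i;                        // end.loop: %i.inc = add i32 %i, 1
  }
  return {i, cc};
}

// Returns true if every lane reads the same %cc in %for.end.
bool ccUniformInForEnd(const std::vector<std::vector<bool>> &lanes) {
  bool first = runLane(lanes.at(0)).ccInForEnd;
  for (const auto &lane : lanes)
    if (runLane(lane).ccInForEnd != first)
      return false;
  return true;
}
```

In any single iteration, every lane still inside the loop computes the same %cc, because all of those lanes share the same %i. The lanes disagree only in %for.end, and only because they left the loop in different iterations.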

This problem was encountered in
https://bugs.freedesktop.org/show_bug.cgi?id=103743; however, this change
in itself is not sufficient to fix that bug, as there is another issue
in the AMDGPU backend.

Change-Id: I32bbffece4a32f686fab54964dae1a5dd72949d4

Diff Detail

Build Status

Buildable 12545
Build 12545: arc lint + arc unit

Event Timeline

nhaehnle created this revision. Nov 28 2017, 4:23 AM

Herald added subscribers: tpr, wdng. Nov 28 2017, 4:23 AM

rampitec added a reviewer: alex-t. Nov 28 2017, 11:53 AM

arsenm added inline comments. Nov 28 2017, 12:00 PM

lib/Transforms/Scalar/StructurizeCFG.cpp
253	I think the flag should override the pass argument if explicitly specified. I think other passes do something like if .getNumOccurrences() > 0 use the flag ...
test/Transforms/StructurizeCFG/uniform-regions.ll
1	The -mtriple should be dropped. If it's needed, you should create an AMDGPU-specific test subdirectory in StructurizeCFG.

In general, this looks correct to me. You should definitely check the branch itself, not the condition.
In your case (a divergent loop exit), the branch itself is divergent because of the control dependency.
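The following is a minimal sketch of the difference between the two checks. It uses a hypothetical toy stand-in, not the real DivergenceAnalysis API. The sketch assumes that the analysis has marked the branch in %for.end as divergent (because that branch uses %cc outside the loop) while %cc itself stays uniform, which is the situation described above.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for DivergenceAnalysis results: the names of the
// values and instructions (including branches) found to be divergent.
struct ToyDivergenceInfo {
  std::set<std::string> divergent;
  bool isUniform(const std::string &name) const {
    return divergent.count(name) == 0;
  }
};

// A conditional branch, identified by its own name and its condition's name.
struct ToyBranch {
  std::string name;
  std::string condition;
};

// Mirrors DA.isUniform(Br->getCondition()): asks only about the condition.
bool uniformByCondition(const ToyDivergenceInfo &DA, const ToyBranch &Br) {
  return DA.isUniform(Br.condition);
}

// Mirrors DA.isUniform(Br): asks about the branch instruction itself.
bool uniformByBranch(const ToyDivergenceInfo &DA, const ToyBranch &Br) {
  return DA.isUniform(Br.name);
}
```

When only the %for.end branch is marked divergent, the condition-based check wrongly reports that branch as uniform. The branch-based check correctly reports it as divergent.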

include/llvm/Analysis/DivergenceAnalysis.h
40	Comment is not necessary. By definition, an instruction produces a uniform result if and only if all its operands are uniform and all the branches it is control-dependent on are uniform. This comment assumes that the person who uses Divergence Analysis has no idea of what it is.
46	Same as above
test/Transforms/StructurizeCFG/uniform-regions.ll
34	This is misleading. I can guess that using %v in the comparison is meant to make the branch condition divergent. Nevertheless, I think it would be clearer to use an explicitly divergent definition for %v, like: %v = call i32 @llvm.amdgcn.workitem.id.x()

Address review comments:

command line flag can also force-disable
move uniform-regions.ll test into AMDGPU subdirectory
use llvm.amdgcn.workitem.id.x
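The override semantics arsenm suggested, together with the "force-disable" item above, can be sketched as follows. ToyFlag is a hypothetical stand-in for cl::opt<bool>. In LLVM, the occurrence count would come from the option's getNumOccurrences(). This sketch illustrates the idea and is not the committed code.

```cpp
#include <cassert>

// Hypothetical stand-in for a cl::opt<bool>.
struct ToyFlag {
  bool value;
  unsigned numOccurrences; // how often the flag appeared on the command line
};

// Behavior of "SkipUniformRegions || ForceSkipUniformRegions":
// the flag can only turn skipping on, never off.
bool skipUniformOrCombined(bool passArgument, const ToyFlag &flag) {
  return passArgument || flag.value;
}

// Override behavior: an explicitly given flag wins in either direction,
// so it can also force-disable; otherwise the pass argument is used.
bool skipUniformOverride(bool passArgument, const ToyFlag &flag) {
  if (flag.numOccurrences > 0)
    return flag.value;
  return passArgument;
}
```

With the OR-based version, a target that constructs the pass with SkipUniformRegions = true cannot have skipping turned off from the command line. The occurrence check makes that possible.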

Harbormaster completed remote builds in B14282: Diff 131600. Jan 26 2018, 8:58 AM

arsenm added inline comments. Jan 26 2018, 9:09 AM

test/Transforms/StructurizeCFG/AMDGPU/uniform-regions.ll
1 ↗	(On Diff #131600)	I think we should start using update_test_checks for any structurer tests

Use update_test_checks

nhaehnle marked an inline comment as done. Jan 29 2018, 6:59 AM

nhaehnle added inline comments.

include/llvm/Analysis/DivergenceAnalysis.h
40	Strongly disagree. This is the header/interface file, and the definition you cite appears nowhere in the code, so the comment is valuable. To elaborate on "This comment assumes that the person who uses Divergence Analysis has no idea of what it is.": kind of, though I would rather say that the comment assumes the person who uses DivergenceAnalysis has no experience with it, and that is entirely fair. In fact, I'd say the mere existence of this patch is evidence that the comment makes sense. This patch fixes a bug in StructurizeCFG, which uses divergence analysis only optionally and may be used by targets where divergence is not an issue. So it is entirely reasonable that a person with no experience at all with DivergenceAnalysis would work on StructurizeCFG in a way that happens to touch how it interacts with DivergenceAnalysis. This comment serves as a warning to someone like that.
lib/Transforms/Scalar/StructurizeCFG.cpp
253	Done.
test/Transforms/StructurizeCFG/AMDGPU/uniform-regions.ll
1 ↗	(On Diff #131600)	Done
test/Transforms/StructurizeCFG/uniform-regions.ll
1	Done.
34	Done.

alex-t added inline comments. Jan 30 2018, 7:00 AM

include/llvm/Analysis/DivergenceAnalysis.h
40	This problem can occur when a value that is defined uniformly in a loop is used outside the loop, in another block that is control-dependent on the divergent loop exit branch. The value itself is still uniform, but the value produced by the user is divergent because of the divergent control dependency. So DivergenceAnalysis is correct; the problem is a misunderstanding of what control dependence means in the context of divergence. I'd give the same example as in https://reviews.llvm.org/D40547:

    %tid = call i32 @llvm.amdgcn.workitem.id.x()
  for.body:
    %val = add i32 %val, 1                          <== definition of %val is uniform
    %cmp = icmp gt i64 %tid, %arg1
    br i1 %cmp, label %for.end, label %for.body     <== loop exit condition is divergent
  for.end:
    %result = add i32 %val, %x                      <== each thread will have a different %val here

So %result is divergent, but %val is still uniform. Anyway, it is not a serious issue; I just want to explain what I meant.
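alex-t's example can be replayed with a toy fixed-point propagation in C++. This is a simplified, hypothetical model and not the algorithm in LLVM's DivergenceAnalysis. It handles a single loop only. Whenever the loop's exit condition is divergent, it treats any use outside the loop of a value defined inside the loop as divergent.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy instruction for a function with exactly one loop.
struct ToyInst {
  std::vector<std::string> operands;
  bool definedInLoop; // true if the instruction sits inside the loop
  bool isSource;      // true for sources of divergence, e.g. workitem.id.x
};

// Fixed-point propagation. exitCond names the loop's exit condition. If it
// is divergent, lanes leave the loop in different iterations, so a use
// outside the loop of a value defined inside it is treated as divergent,
// even though that value is uniform at its definition.
std::set<std::string> propagate(const std::map<std::string, ToyInst> &insts,
                                const std::string &exitCond) {
  std::set<std::string> divergent;
  bool changed = true;
  while (changed) {
    changed = false;
    for (const auto &entry : insts) {
      const std::string &name = entry.first;
      const ToyInst &inst = entry.second;
      if (divergent.count(name))
        continue;
      bool d = inst.isSource;
      for (const std::string &op : inst.operands) {
        if (divergent.count(op))
          d = true; // data dependence on a divergent operand
        auto it = insts.find(op);
        bool opInLoop = it != insts.end() && it->second.definedInLoop;
        if (!inst.definedInLoop && opInLoop && divergent.count(exitCond))
          d = true; // in-loop value read after a divergent loop exit
      }
      if (d) {
        divergent.insert(name);
        changed = true;
      }
    }
  }
  return divergent;
}
```

Running the model on the example gives the result described in the comment: %tid and %cmp are divergent, %val stays uniform, and %result is divergent.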

Finally got time to get back to this. Can I get a review on this please?

alex-t accepted this revision. Feb 21 2018, 7:45 AM

This revision is now accepted and ready to land. Feb 21 2018, 7:45 AM

Closed by commit rL325881: StructurizeCFG: Test for branch divergence correctly (authored by nha). Feb 23 2018, 2:47 AM

This revision was automatically updated to reflect the committed changes.

I'm not sure if it was this change or something else, but the optnone test is not stable on my system (testing with llc built from r325923):

$ ./llvm-lit ../../llvm/test/CodeGen/AMDGPU/control-flow-optnone.ll 
-- Testing: 1 tests, 1 threads --
PASS: LLVM :: CodeGen/AMDGPU/control-flow-optnone.ll (1 of 1)
Testing Time: 0.11s
  Expected Passes    : 1
$ ./llvm-lit ../../llvm/test/CodeGen/AMDGPU/control-flow-optnone.ll 
-- Testing: 1 tests, 1 threads --
FAIL: LLVM :: CodeGen/AMDGPU/control-flow-optnone.ll (1 of 1)
Testing Time: 0.11s
********************
Failing Tests (1):
    LLVM :: CodeGen/AMDGPU/control-flow-optnone.ll

I also got blame mail from a bot about this test failing for what would seem to be an unrelated change:
http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/7201

Yes, it's this change. It's subtle: DivergenceAnalysis has always ended up with dangling pointers, but before this change that happened not to matter, because we never queried newly created values. This change does query newly created values, and the results differed depending on whether the new allocations happened to reuse the dangling pointers. I uploaded a new version of this change as D43743.
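The failure mode can be reproduced deterministically with a toy model. ToyPool, ToyValue and ToyAnalysis are hypothetical names. The pool deliberately hands a freed slot to the next allocation. A real allocator may or may not reuse an address this way, which is why the real test failed only intermittently.

```cpp
#include <cassert>
#include <set>
#include <vector>

struct ToyValue {
  int id;
};

// Hypothetical pool that always reuses the most recently freed slot.
class ToyPool {
  ToyValue slots[4] = {};
  std::vector<int> freeList{3, 2, 1, 0}; // back() is the next slot handed out

public:
  ToyValue *create(int id) {
    int s = freeList.back();
    freeList.pop_back();
    slots[s].id = id;
    return &slots[s];
  }
  void destroy(ToyValue *v) { freeList.push_back(static_cast<int>(v - slots)); }
};

// Toy stand-in for an analysis result keyed by pointer identity, like
// DenseSet<const Value *> DivergentValues. It is never updated when
// values are deleted, so it can hold stale (dangling) keys.
struct ToyAnalysis {
  std::set<const ToyValue *> divergentValues;
  bool isDivergent(const ToyValue *v) const {
    return divergentValues.count(v) != 0;
  }
};
```

If a value is marked divergent and then destroyed, a newly created value that lands in the same slot inherits the stale answer. A new value placed elsewhere does not.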

Revision Contents

Path                                                Size
include/llvm/Analysis/DivergenceAnalysis.h          8 lines
lib/Transforms/Scalar/StructurizeCFG.cpp            11 lines
test/CodeGen/AMDGPU/control-flow-optnone.ll         4 lines
test/Transforms/StructurizeCFG/uniform-regions.ll   49 lines

Diff 124542

include/llvm/Analysis/DivergenceAnalysis.h

@@ 29 lines not shown @@ public:
   void getAnalysisUsage(AnalysisUsage &AU) const override;
 
   bool runOnFunction(Function &F) override;
 
   // Print all divergent branches in the function.
   void print(raw_ostream &OS, const Module *) const override;
 
-  // Returns true if V is divergent.
+  // Returns true if V is divergent at its definition.
+  //
+  // Even if this function returns false, V may still be divergent when used
+  // in a different basic block.
   bool isDivergent(const Value *V) const { return DivergentValues.count(V); }
 
   // Returns true if V is uniform/non-divergent.
+  //
+  // Even if this function returns true, V may still be divergent when used
+  // in a different basic block.
   bool isUniform(const Value *V) const { return !isDivergent(V); }
 
 private:
   // Stores all divergent values.
   DenseSet<const Value *> DivergentValues;
 };
 } // End llvm namespace

lib/Transforms/Scalar/StructurizeCFG.cpp

@@ 49 lines not shown @@
 
 #define DEBUG_TYPE "structurizecfg"
 
 // The name for newly created blocks.
 static const char *const FlowBlockName = "Flow";
 
 namespace {
 
+static cl::opt<bool> ForceSkipUniformRegions(
+  "structurizecfg-skip-uniform-regions",
+  cl::Hidden,
+  cl::desc("Force the StructurizeCFG pass to skip uniform regions"),
+  cl::init(false));
+
 // Definition of the complex types used in this pass.
 
 using BBValuePair = std::pair<BasicBlock *, Value *>;
 
 using RNVector = SmallVector<RegionNode *, 8>;
 using BBVector = SmallVector<BasicBlock *, 8>;
 using BranchVector = SmallVector<BranchInst *, 8>;
 using BBValueVector = SmallVector<BBValuePair, 2>;
@@ 172 lines not shown @@ class StructurizeCFG : public RegionPass {
   void createFlow();
 
   void rebuildSSA();
 
 public:
   static char ID;
 
   explicit StructurizeCFG(bool SkipUniformRegions = false)
-      : RegionPass(ID), SkipUniformRegions(SkipUniformRegions) {
+      : RegionPass(ID),
+        SkipUniformRegions(SkipUniformRegions || ForceSkipUniformRegions) {
     initializeStructurizeCFGPass(*PassRegistry::getPassRegistry());
   }
 
   bool doInitialization(Region *R, RGPassManager &RGM) override;
 
   bool runOnRegion(Region *R, RGPassManager &RGM) override;
 
   StringRef getPassName() const override { return "Structurize control flow"; }
@@ 631 lines not shown @@

 static bool hasOnlyUniformBranches(const Region *R,
                                    const DivergenceAnalysis &DA) {
   for (const BasicBlock *BB : R->blocks()) {
     const BranchInst *Br = dyn_cast<BranchInst>(BB->getTerminator());
     if (!Br || !Br->isConditional())
       continue;
 
-    if (!DA.isUniform(Br->getCondition()))
+    if (!DA.isUniform(Br))
       return false;
     DEBUG(dbgs() << "BB: " << BB->getName() << " has uniform terminator\n");
   }
   return true;
 }
 
 /// \brief Run the transformation for each region found
 bool StructurizeCFG::runOnRegion(Region *R, RGPassManager &RGM) {
@@ 57 lines not shown @@

test/CodeGen/AMDGPU/control-flow-optnone.ll

@@ 9 lines not shown @@
 ; GCN: s_branch
 
 ; GCN-DAG: v_cmp_lt_i32
 ; GCN-DAG: v_cmp_gt_i32
 ; GCN: s_and_b64
 ; GCN: s_mov_b64 exec
 
 ; GCN: s_or_b64 exec, exec
-; GCN: v_cmp_eq_u32
-; GCN: s_cbranch_vccnz
+; GCN: s_cmp_eq_u32
+; GCN: s_cbranch_scc1
 ; GCN-NEXT: s_branch
 define amdgpu_kernel void @copytoreg_divergent_brcond(i32 %arg, i32 %arg1, i32 %arg2) #0 {
 bb:
   %tmp = tail call i32 @llvm.amdgcn.workitem.id.x()
   %tmp3 = zext i32 %tmp to i64
   %tmp5 = add i64 %tmp3, undef
   %tmp6 = trunc i64 %tmp5 to i32
   %tmp7 = mul nsw i32 %tmp6, %arg2
@@ 27 lines not shown @@

test/Transforms/StructurizeCFG/uniform-regions.ll

This file was added.

; RUN: opt -mtriple=amdgcn-- -S -o - -structurizecfg -structurizecfg-skip-uniform-regions < %s | FileCheck %s

; CHECK-LABEL: @uniform(
; CHECK: entry:
; CHECK: br i1 %cc, label %if, label %end, !structurizecfg.uniform !0
; CHECK: if:
; CHECK: br label %end, !structurizecfg.uniform !0
define amdgpu_cs void @uniform(i32 inreg %v) {
entry:
  %cc = icmp eq i32 %v, 0
  br i1 %cc, label %if, label %end

if:
  br label %end

end:
  ret void
}

; CHECK-LABEL: @nonuniform(
; CHECK-NOT: !structurizecfg
define amdgpu_cs void @nonuniform(i32 addrspace(2)* %ptr) {
entry:
  br label %for.body

for.body:
  %i = phi i32 [0, %entry], [%i.inc, %end.loop]
  %cc = icmp ult i32 %i, 4
  br i1 %cc, label %mid.loop, label %for.end

mid.loop:
  %gep = getelementptr i32, i32 addrspace(2)* %ptr, i32 %i
  %v = load i32, i32 addrspace(2)* %gep, align 4
  %cc2 = icmp eq i32 %v, 0
  br i1 %cc2, label %end.loop, label %for.end

end.loop:
  %i.inc = add i32 %i, 1
  br label %for.body

for.end:
  br i1 %cc, label %if, label %end

if:
  br label %end

end:
  ret void
}