
Details

Reviewers

tstellar
foad
ruiling
arsenm

Commits

rG4bbcbdaee5c9: [AMDGPU] Unify divergent nodes if the PostDom tree has one root

Summary

This patch allows the AMDGPUUnifyDivergentExitNodes pass
to transform a function whose post-dominator tree (PDT) has
exactly one root, when that root ends in a branch instruction.
Fixes https://github.com/llvm/llvm-project/issues/58861.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

gandhi21299 created this revision. Dec 10 2022, 5:33 PM

Herald added a project: Restricted Project. Dec 10 2022, 5:33 PM

Herald added subscribers: kosarev, foad, kerbowa and 7 others.

gandhi21299 requested review of this revision. Dec 10 2022, 5:33 PM

Herald added a project: Restricted Project. Dec 10 2022, 5:33 PM

Herald added subscribers: llvm-commits, wdng.

rebased

gandhi21299 added reviewers: tstellar, foad. Dec 10 2022, 5:35 PM

gandhi21299 added a project: Restricted Project.

Harbormaster completed remote builds in B202419: Diff 481890. Dec 10 2022, 6:55 PM

refactored patch

refactor

Harbormaster completed remote builds in B202427: Diff 481898. Dec 10 2022, 11:40 PM

Thanks for working on this, but I think it is more reasonable to fix the issue earlier. The function exposes a problem in our CFG lowering passes for AMDGPU. The CFG lowering is done by several passes; I know it is very tricky and fragile right now, so it is not easy to determine what the right fix should be. For this specific case, the input IR does not have a return block and thus bypasses the StructurizeCFG pass. AMDGPUUnifyDivergentExitNodes, which runs before StructurizeCFG, already has some ability to handle infinite loops, but this case was skipped because of the check if (PDT.root_size() <= 1). Here, BB3 is a root in the PDT, but it is not a true exit.

I would suggest changing this. If there is only one root in the PostDominatorTree, check whether it ends with a return/unreachable instruction before skipping the transformation. If the single root in the PDT ends with a branch instruction, we should continue the transformation: insert a dummy return block and a control-flow edge that is never taken, using a trick like br i1 true, ..., DummyReturn. By doing this, we should be able to lower the CFG correctly.
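A rough model of the proposed early-exit check, written as standalone C++ rather than against LLVM's API. The post-dominator tree roots are passed in as a list instead of being computed, and TermKind, Block and shouldSkipUnification are hypothetical names. The sketch only encodes the decision described in the comment: with several roots the pass always runs; with exactly one root it is skipped only when that root ends in a return or unreachable, so a lone root ending in a branch (the infinite-loop case from the issue) is still transformed. Treating an empty root list as "nothing to do" is an assumption of the sketch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical terminator kinds for this model (not LLVM's class hierarchy).
enum class TermKind { Return, Unreachable, Branch };

// Hypothetical stand-in for a basic block: a name and its terminator kind.
struct Block {
  std::string Name;
  TermKind Term;
};

// Models the proposed early-exit test. PDTRoots stands in for the roots of
// the post-dominator tree. Returns true when the transformation can be
// skipped.
bool shouldSkipUnification(const std::vector<Block> &PDTRoots) {
  if (PDTRoots.empty())
    return true; // Assumption of the sketch: nothing to unify.
  if (PDTRoots.size() > 1)
    return false; // Several roots: unify them as before.
  // Exactly one root: it is a true exit only if it returns or is
  // unreachable. A root ending in a branch sits in an infinite loop, so the
  // pass must still insert a dummy return block.
  TermKind K = PDTRoots.front().Term;
  return K == TermKind::Return || K == TermKind::Unreachable;
}
```

For the function from the issue, the single root ends in a branch, so shouldSkipUnification returns false and the transformation proceeds; a function with one ordinary return block is still left alone.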

gandhi21299 added a reviewer: ruiling. Dec 12 2022, 3:17 PM

@ruiling PDT.getRoot() yields BB4. So for PDT with exactly one root, we should only allow the transformation when its terminator is a branch instruction? This causes the following tests to fail due to insertion of dummy blocks:

LLVM :: CodeGen/AMDGPU/agpr-copy-no-free-registers.ll
LLVM :: CodeGen/AMDGPU/branch-relaxation.ll
LLVM :: CodeGen/AMDGPU/cf-loop-on-constant.ll
LLVM :: CodeGen/AMDGPU/control-flow-optnone.ll
LLVM :: CodeGen/AMDGPU/infinite-loop.ll
LLVM :: CodeGen/AMDGPU/kill-infinite-loop.ll
LLVM :: CodeGen/AMDGPU/loop-live-out-copy-undef-subrange.ll
LLVM :: CodeGen/AMDGPU/optimize-negated-cond.ll
LLVM :: CodeGen/AMDGPU/unstructured-cfg-def-use-issue.ll
LLVM :: CodeGen/AMDGPU/vgpr-descriptor-waterfall-loop-idom-update.ll

In D139780#3990936, @gandhi21299 wrote:

@ruiling PDT.getRoot() yields BB4. So for PDT with exactly one root, we should only allow the transformation when its terminator is a branch instruction?

Yes.

This causes the following tests to fail due to insertion of dummy blocks:

I think you just need to update the test checks. It is expected there will be changes in generated code.

corrected solution as suggested by reviewer, updated tests accordingly

gandhi21299 retitled this revision from [AMDGPU] Annotate control flow on visited blocks to [AMDGPU] Unify divergent nodes if the PostDom tree has one root. Dec 13 2022, 11:14 PM

gandhi21299 edited the summary of this revision.

Harbormaster completed remote builds in B203025: Diff 482724. Dec 14 2022, 12:42 AM

corrected failing test different-addrspace-crash.ll

This comment has been deleted.

llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers.ll
Line 657 (On Diff #482724): Why is there no DummyReturnBlock for GFX908?

Harbormaster completed remote builds in B203279: Diff 483080. Dec 14 2022, 11:27 PM

I haven't looked closely but some of these test changes look worse

llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp
Line 192 (On Diff #483080): What about switches?

In D139780#4004766, @arsenm wrote:

I haven't looked closely but some of these test changes look worse

We need this change to correctly structurize a function that ends with an infinite loop. Most test changes are minor, except for unstructured-cfg-def-use-issue.ll, which changes mainly because it was compiled incorrectly before: we have to structurize a function if there is a divergent branch.
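To illustrate the dummy-return rewrite that ruiling suggested earlier in the review, here is a toy C++ model of a CFG; it is not the pass's real implementation. ToyBlock, ToyFunction, insertDummyReturn, countExits and the TransitionBlock name are hypothetical. A root ending in an unconditional branch has that branch turned into the equivalent of br i1 true, label %Target, label %DummyReturnBlock. For a root ending in a conditional branch, the sketch assumes the original branch is first moved into a new block, so the root itself can end in the never-taken conditional. Either way the function gains a real returning exit without changing which path executes.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy CFG node: either a return, or a branch with an optional
// condition (an empty Cond means an unconditional branch).
struct ToyBlock {
  bool IsReturn = false;
  std::string Cond;
  std::vector<std::string> Succs;
};

// Blocks keyed by name. std::map keeps references valid across insertions.
using ToyFunction = std::map<std::string, ToyBlock>;

// Gives Root a never-taken edge to a returning DummyReturnBlock, modeling
// "br i1 true, label %Target, label %DummyReturnBlock". Meant to be called
// once per function. Returns false if Root is missing or cannot be rewritten.
bool insertDummyReturn(ToyFunction &F, const std::string &Root) {
  auto It = F.find(Root);
  if (It == F.end() || It->second.IsReturn || It->second.Succs.empty())
    return false;
  ToyBlock &B = It->second;

  std::string Target;
  if (B.Cond.empty()) {
    // Unconditional branch: its only successor stays the taken edge.
    Target = B.Succs.front();
  } else {
    // Conditional branch (assumption of this sketch): move the original
    // branch into a new block so the root can end in the new conditional.
    F["TransitionBlock"] = B;
    Target = "TransitionBlock";
  }

  F["DummyReturnBlock"].IsReturn = true;
  B.Cond = "true"; // Always true, so the dummy edge is never taken.
  B.Succs = {Target, "DummyReturnBlock"};
  return true;
}

// Counts blocks that actually return.
int countExits(const ToyFunction &F) {
  int N = 0;
  for (const auto &Entry : F)
    if (Entry.second.IsReturn)
      ++N;
  return N;
}
```

Building the CFG from si-annotate-nested-control-flows.ll (which has no returning block) and calling insertDummyReturn on BB4, the root named earlier in the review, leaves the function with exactly one returning block while BB4 still loops back to itself through the moved branch.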

inserted a check for switch terminator, corrected tests accordingly

gandhi21299 marked an inline comment as done. Jan 2 2023, 4:41 PM

ran update_test_checks

Harbormaster completed remote builds in B205385: Diff 485895. Jan 2 2023, 5:59 PM

In D139780#4022193, @gandhi21299 wrote:

inserted a check for switch terminator, ...

I don't think just checking for a switch terminator would work. My suggestion is not to handle switch here. The pass was never designed to work with switch terminators, and supporting them in this pass needs non-trivial work. As we already lower switch terminators before this pass, I don't think it is important to support them here now.

Sure, I will revert to the previous revision then.

arsenm accepted this revision. Jan 3 2023, 9:34 AM

arsenm added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp
Line 109 (On Diff #485895): We should have a required LowerSwitchID too.

This revision is now accepted and ready to land. Jan 3 2023, 9:34 AM

Rebased and removed switch inst check

Thanks Matt for the approval! How does the patch look, @ruiling?

llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp
Line 109 (On Diff #485895): I will do that in a separate patch; it seems to cause difficulties when the pass manager schedules UnifyDivergentExitNodes.

ruiling added inline comments. Jan 3 2023, 9:02 PM

llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp
Line 109 (On Diff #485895): For function pass dependencies and pass ordering, I still prefer that they be managed by the compiler developer. If I remember correctly, the new pass manager does not support dependencies between function passes?
llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers.ll
Line 657 (On Diff #482724): Did you find an answer to this question? It sounds strange that we get different behavior for gfx908 and gfx90a here.

arsenm added inline comments. Jan 3 2023, 9:07 PM

llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp
Line 109 (On Diff #485895): The important part is verification. We shouldn't have arbitrary pass contracts that are not enforced by a verifier.

Harbormaster completed remote builds in B205596: Diff 486157. Jan 3 2023, 9:10 PM

ruiling added inline comments. Jan 3 2023, 10:07 PM

llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp
Line 109 (On Diff #485895): I agree that pass contracts or assumptions should be enforced by verification. For this specific issue, can we verify within this pass that no terminator is a SwitchInst?

gandhi21299 added inline comments. Jan 3 2023, 10:16 PM

llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers.ll
Line 657 (On Diff #482724): In gfx908, the block is eliminated much later in the pipeline, which does not happen in gfx90a.

I think it is good to go. Thanks!

llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers.ll
Line 657 (On Diff #482724): Thanks for taking a second look!

arsenm accepted this revision. Jan 4 2023, 7:19 AM

arsenm added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp
Line 109 (On Diff #485895): That's what I was asking for regarding switch handling (we should also worry about indirectbr, callbr and invoke).
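The verification idea discussed in this thread can be sketched as a standalone check, again as a toy model rather than LLVM code. Term, NamedBlock, isSupportedTerminator and findUnsupportedTerminators are hypothetical names, and treating switch, indirectbr, callbr and invoke as unsupported inputs is an assumption drawn from the comments above. Inside LLVM, an equivalent test would presumably inspect each BasicBlock::getTerminator() with isa<SwitchInst> and similar checks, then report a fatal error or assert.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical terminator kinds. Only Return, Unreachable and Branch are
// assumed to be valid input for the exit-unification pass.
enum class Term { Return, Unreachable, Branch, Switch, IndirectBr, CallBr, Invoke };

// Hypothetical stand-in for a basic block: a name and its terminator kind.
struct NamedBlock {
  std::string Name;
  Term T;
};

// True if the pass's (assumed) input contract allows this terminator.
bool isSupportedTerminator(Term T) {
  switch (T) {
  case Term::Return:
  case Term::Unreachable:
  case Term::Branch:
    return true;
  case Term::Switch:     // should already be removed by switch lowering
  case Term::IndirectBr:
  case Term::CallBr:
  case Term::Invoke:
    return false;
  }
  return false;
}

// Returns the names of blocks whose terminator violates the contract, in
// the order the blocks were given. An empty result means the input is valid.
std::vector<std::string>
findUnsupportedTerminators(const std::vector<NamedBlock> &Blocks) {
  std::vector<std::string> Bad;
  for (const NamedBlock &B : Blocks)
    if (!isSupportedTerminator(B.T))
      Bad.push_back(B.Name);
  return Bad;
}
```

Returning the offending block names, instead of stopping at the first one, lets a caller print a single diagnostic that lists every violation.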

This revision was landed with ongoing or failed builds. Jan 4 2023, 9:45 AM

Closed by commit rG4bbcbdaee5c9: [AMDGPU] Unify divergent nodes if the PostDom tree has one root (authored by gandhi21299).

This revision was automatically updated to reflect the committed changes.

gandhi21299 added a commit: rG4bbcbdaee5c9: [AMDGPU] Unify divergent nodes if the PostDom tree has one root.

ruiling mentioned this in D118250: AMDGPU: Mark control flow intrinsics non-duplicable. Feb 1 2023, 10:25 PM

ruiling mentioned this in rGbe3f4591aff0: AMDGPU: Mark control flow intrinsics non-duplicable. Feb 5 2023, 11:34 PM

Diff 481889

llvm/lib/Target/AMDGPU/SIAnnotateControlFlow.cpp

//===- SIAnnotateControlFlow.cpp ------------------------------------------===//		//===- SIAnnotateControlFlow.cpp ------------------------------------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
/// \file		/// \file
/// Annotates the control flow with hardware specific intrinsics.		/// Annotates the control flow with hardware specific intrinsics.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "AMDGPU.h"		#include "AMDGPU.h"
#include "GCNSubtarget.h"		#include "GCNSubtarget.h"
		#include "llvm/ADT/DenseSet.h"
#include "llvm/Analysis/LegacyDivergenceAnalysis.h"		#include "llvm/Analysis/LegacyDivergenceAnalysis.h"
#include "llvm/Analysis/LoopInfo.h"		#include "llvm/Analysis/LoopInfo.h"
#include "llvm/CodeGen/TargetPassConfig.h"		#include "llvm/CodeGen/TargetPassConfig.h"
#include "llvm/IR/BasicBlock.h"		#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Constants.h"		#include "llvm/IR/Constants.h"
#include "llvm/IR/Dominators.h"		#include "llvm/IR/Dominators.h"
#include "llvm/IR/IntrinsicsAMDGPU.h"		#include "llvm/IR/IntrinsicsAMDGPU.h"
#include "llvm/InitializePasses.h"		#include "llvm/InitializePasses.h"
[128 unchanged lines not shown]

/// Is BB the last block saved on the stack ?		/// Is BB the last block saved on the stack ?
bool SIAnnotateControlFlow::isTopOfStack(BasicBlock *BB) {		bool SIAnnotateControlFlow::isTopOfStack(BasicBlock *BB) {
return !Stack.empty() && Stack.back().first == BB;		return !Stack.empty() && Stack.back().first == BB;
}		}

/// Pop the last saved value from the control flow stack		/// Pop the last saved value from the control flow stack
Value *SIAnnotateControlFlow::popSaved() {		Value *SIAnnotateControlFlow::popSaved() {
return Stack.pop_back_val().second;		auto [x, y] = Stack.pop_back_val();
		// dbgs() << "popping off <" << x->getName() << ", " << *y << ">\n";
		return y;
}		}

/// Push a BB and saved value to the control flow stack		/// Push a BB and saved value to the control flow stack
void SIAnnotateControlFlow::push(BasicBlock *BB, Value *Saved) {		void SIAnnotateControlFlow::push(BasicBlock *BB, Value *Saved) {
Stack.push_back(std::make_pair(BB, Saved));		Stack.push_back(std::make_pair(BB, Saved));
}		}

/// Can the condition represented by this PHI node treated like		/// Can the condition represented by this PHI node treated like
[35 unchanged lines not shown]
/// Open a new "If" block		/// Open a new "If" block
bool SIAnnotateControlFlow::openIf(BranchInst *Term) {		bool SIAnnotateControlFlow::openIf(BranchInst *Term) {
if (isUniform(Term))		if (isUniform(Term))
return false;		return false;

Value *Ret = CallInst::Create(If, Term->getCondition(), "", Term);		Value *Ret = CallInst::Create(If, Term->getCondition(), "", Term);
Term->setCondition(ExtractValueInst::Create(Ret, 0, "", Term));		Term->setCondition(ExtractValueInst::Create(Ret, 0, "", Term));
push(Term->getSuccessor(1), ExtractValueInst::Create(Ret, 1, "", Term));		push(Term->getSuccessor(1), ExtractValueInst::Create(Ret, 1, "", Term));
		// dbgs() << "Stack size " << Stack.size() << '\n';
return true;		return true;
}		}

/// Close the last "If" block and open a new "Else" block		/// Close the last "If" block and open a new "Else" block
bool SIAnnotateControlFlow::insertElse(BranchInst *Term) {		bool SIAnnotateControlFlow::insertElse(BranchInst *Term) {
if (isUniform(Term)) {		if (isUniform(Term)) {
return false;		return false;
}		}
[62 unchanged lines not shown; context: for (BasicBlock *Pred : predecessors(Target)) {]
// of the loop at BB, it should not reset or change "Broken", which keeps		// of the loop at BB, it should not reset or change "Broken", which keeps
// track of the number of threads exited the loop at BB.		// track of the number of threads exited the loop at BB.
else if (L->contains(Pred) && DT->dominates(Pred, BB))		else if (L->contains(Pred) && DT->dominates(Pred, BB))
PHIValue = Broken;		PHIValue = Broken;
Broken->addIncoming(PHIValue, Pred);		Broken->addIncoming(PHIValue, Pred);
}		}

Term->setCondition(CallInst::Create(Loop, Arg, "", Term));		Term->setCondition(CallInst::Create(Loop, Arg, "", Term));

push(Term->getSuccessor(0), Arg);		push(Term->getSuccessor(0), Arg);

return true;		return true;
}		}

/// Close the last opened control flow		/// Close the last opened control flow
bool SIAnnotateControlFlow::closeControlFlow(BasicBlock *BB) {		bool SIAnnotateControlFlow::closeControlFlow(BasicBlock *BB) {
llvm::Loop *L = LI->getLoopFor(BB);		llvm::Loop *L = LI->getLoopFor(BB);

assert(Stack.back().first == BB);		assert(Stack.back().first == BB);

if (L && L->getHeader() == BB) {		if (L && L->getHeader() == BB) {
// We can't insert an EndCF call into a loop header, because it will		// We can't insert an EndCF call into a loop header, because it will
// get executed on every iteration of the loop, when it should be		// get executed on every iteration of the loop, when it should be
// executed only once before the loop.		// executed only once before the loop.
SmallVector <BasicBlock *, 8> Latches;		SmallVector <BasicBlock *, 8> Latches;
L->getLoopLatches(Latches);		L->getLoopLatches(Latches);
[14 unchanged lines not shown; context: if (!isa<UndefValue>(Exec) && !isa<UnreachableInst>(FirstInsertionPt)) {]
Instruction *ExecDef = cast<Instruction>(Exec);		Instruction *ExecDef = cast<Instruction>(Exec);
BasicBlock *DefBB = ExecDef->getParent();		BasicBlock *DefBB = ExecDef->getParent();
if (!DT->dominates(DefBB, BB)) {		if (!DT->dominates(DefBB, BB)) {
// Split edge to make Def dominate Use		// Split edge to make Def dominate Use
FirstInsertionPt = &*SplitEdge(DefBB, BB, DT, LI)->getFirstInsertionPt();		FirstInsertionPt = &*SplitEdge(DefBB, BB, DT, LI)->getFirstInsertionPt();
}		}
CallInst::Create(EndCf, Exec, "", FirstInsertionPt);		CallInst::Create(EndCf, Exec, "", FirstInsertionPt);
}		}

return true;		return true;
}		}

/// Annotate the control flow with intrinsics so the backend can		/// Annotate the control flow with intrinsics so the backend can
/// recognize if/then/else and loops.		/// recognize if/then/else and loops.
bool SIAnnotateControlFlow::runOnFunction(Function &F) {		bool SIAnnotateControlFlow::runOnFunction(Function &F) {

DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();		DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();
LI = &getAnalysis<LoopInfoWrapperPass>().getLoopInfo();		LI = &getAnalysis<LoopInfoWrapperPass>().getLoopInfo();
DA = &getAnalysis<LegacyDivergenceAnalysis>();		DA = &getAnalysis<LegacyDivergenceAnalysis>();
TargetPassConfig &TPC = getAnalysis<TargetPassConfig>();		TargetPassConfig &TPC = getAnalysis<TargetPassConfig>();
const TargetMachine &TM = TPC.getTM<TargetMachine>();		const TargetMachine &TM = TPC.getTM<TargetMachine>();

bool Changed = false;		bool Changed = false;
initialize(*F.getParent(), TM.getSubtarget<GCNSubtarget>(F));		initialize(*F.getParent(), TM.getSubtarget<GCNSubtarget>(F));
for (df_iterator<BasicBlock *> I = df_begin(&F.getEntryBlock()),		for (df_iterator<BasicBlock *> I = df_begin(&F.getEntryBlock()),
E = df_end(&F.getEntryBlock()); I != E; ++I) {		E = df_end(&F.getEntryBlock()); I != E; ++I) {
BasicBlock *BB = *I;		BasicBlock *BB = *I;
BranchInst *Term = dyn_cast<BranchInst>(BB->getTerminator());		BranchInst *Term = dyn_cast<BranchInst>(BB->getTerminator());

if (!Term || Term->isUnconditional()) {		if (!Term || Term->isUnconditional()) {
if (isTopOfStack(BB))		if (isTopOfStack(BB)) {
Changed |= closeControlFlow(BB);		Changed |= closeControlFlow(BB);
		}
continue;		continue;
}		}

if (I.nodeVisited(Term->getSuccessor(1))) {		if (I.nodeVisited(Term->getSuccessor(1))) {
if (isTopOfStack(BB))		if (isTopOfStack(BB)) {
Changed |= closeControlFlow(BB);		Changed |= closeControlFlow(BB);
		}

if (DT->dominates(Term->getSuccessor(1), BB))		if (DT->dominates(Term->getSuccessor(1), BB)) {
Changed |= handleLoop(Term);		if (handleLoop(Term) && I.nodeVisited(Stack.back().first)) {
		Changed = true;
		closeControlFlow(Stack.back().first);
		}
		}
continue;		continue;
}		}

if (isTopOfStack(BB)) {		if (isTopOfStack(BB)) {
PHINode *Phi = dyn_cast<PHINode>(Term->getCondition());		PHINode *Phi = dyn_cast<PHINode>(Term->getCondition());
if (Phi && Phi->getParent() == BB && isElse(Phi) && !hasKill(BB)) {		if (Phi && Phi->getParent() == BB && isElse(Phi) && !hasKill(BB)) {
Changed |= insertElse(Term);		Changed |= insertElse(Term);
Changed |= eraseIfUnused(Phi);		Changed |= eraseIfUnused(Phi);
continue;		continue;
}		}

Changed |= closeControlFlow(BB);		Changed |= closeControlFlow(BB);
}		}

Changed |= openIf(Term);		Changed |= openIf(Term);
}		}

		// A basic block may be pushed as a result of loop handling
		// and could never be processed if it was already visited before.
		// Close control flow at this basic block.
		// if (Stack.size() == 1)
		// closeControlFlow(Stack.back().first);

if (!Stack.empty()) {		if (!Stack.empty()) {
// CFG was probably not structured.		// CFG was probably not structured.
report_fatal_error("failed to annotate CFG");		report_fatal_error("failed to annotate CFG");
}		}

return Changed;		return Changed;
}		}

/// Create the annotation pass		/// Create the annotation pass
FunctionPass *llvm::createSIAnnotateControlFlowPass() {		FunctionPass *llvm::createSIAnnotateControlFlowPass() {
return new SIAnnotateControlFlow();		return new SIAnnotateControlFlow();
}		}

llvm/test/CodeGen/AMDGPU/si-annotate-nested-control-flows.ll

This file was added.

				; RUN: opt -mtriple=amdgcn-- -S -structurizecfg -si-annotate-control-flow %s | FileCheck %s

				define void @nested_inf_loop(i1 %0, i1 %1) {
				; CHECK-LABEL: define void @nested_inf_loop(
				; CHECK-NEXT: BB:
				; CHECK-NEXT: br label %BB1
				; CHECK: BB1: ; preds = %BB3, %BB
				; CHECK-NEXT: %2 = call { i1, i64 } @llvm.amdgcn.if.i64(i1 %0)
				; CHECK-NEXT: %3 = extractvalue { i1, i64 } %2, 0
				; CHECK-NEXT: %4 = extractvalue { i1, i64 } %2, 1
				; CHECK-NEXT: br i1 %3, label %BB3, label %BB2
				; CHECK: BB2: ; preds = %BB1
				; CHECK-NEXT: call void @llvm.amdgcn.end.cf.i64(i64 %4)
				; CHECK-NEXT: br label %BB4
				; CHECK: BB4: ; preds = %BB4, %BB2
				; CHECK-NEXT: %phi.broken = phi i64 [ %5, %BB4 ], [ 0, %BB2 ]
				; CHECK-NEXT: %5 = call i64 @llvm.amdgcn.if.break.i64(i1 %1, i64 %phi.broken)
				; CHECK-NEXT: %6 = call i1 @llvm.amdgcn.loop.i64(i64 %5)
				; CHECK-NEXT: br i1 %6, label %BB4.BB3_crit_edge, label %BB4
				; CHECK: BB4.BB3_crit_edge: ; preds = %BB4
				; CHECK-NEXT: call void @llvm.amdgcn.end.cf.i64(i64 %5)
				; CHECK-NEXT: br label %BB3
				; CHECK: BB3: ; preds = %BB4.BB3_crit_edge, %BB1
				; CHECK-NEXT: br label %BB1
				;
				BB:
				br label %BB1

				BB1:
				br i1 %0, label %BB3, label %BB2

				BB2:
				br label %BB4

				BB4:
				br i1 %1, label %BB3, label %BB4

				BB3:
				br label %BB1
				}

This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Unify divergent nodes if the PostDom tree has one root
ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 481889

llvm/lib/Target/AMDGPU/SIAnnotateControlFlow.cpp

llvm/test/CodeGen/AMDGPU/si-annotate-nested-control-flows.ll
