This is an archive of the discontinued LLVM Phabricator instance.


Differential D141355

[AMDGPUUnifyDivergentExitNodes] Add NewPM support
ClosedPublic

Authored by gandhi21299 on Jan 9 2023, 10:05 PM.


Details

Reviewers

arsenm
lebedev.ri
foad
bcahoon
sameerds
vitalybuka

Commits

rGb48e7c2d01a3: [AMDGPUUnifyDivergentExitNodes] Add NewPM support
rGa5455e32b364: [AMDGPUUnifyDivergentExitNodes] Add NewPM support

Summary

In addition, switch the pass from LegacyDivergenceAnalysis to UniformityAnalysis for collecting divergence information.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

gandhi21299 created this revision. Jan 9 2023, 10:05 PM

Herald added subscribers: kosarev, foad, kerbowa and 3 others. Jan 9 2023, 10:05 PM

Herald added a project: Restricted Project. Jan 9 2023, 10:05 PM

gandhi21299 requested review of this revision. Jan 9 2023, 10:05 PM

Herald added a project: Restricted Project. Jan 9 2023, 10:05 PM

Herald added a subscriber: llvm-commits.

gandhi21299 retitled this revision from PATCH to [AMDGPU] Add NewPM support to AMDGPUUnifyDivergentExitNodes pass. Jan 9 2023, 10:07 PM

Herald added subscribers: tpr, dstuttard, yaxunl and 2 others. Jan 9 2023, 10:07 PM

added checks for the opt command in si-annotate-nested-control-flows.ll
applied clang-format

gandhi21299 added reviewers: arsenm, lebedev.ri, foad, bcahoon. Jan 9 2023, 10:25 PM

gandhi21299 added a subscriber: Restricted Project.

Herald added a subscriber: StephenFan. Jan 9 2023, 10:25 PM

Harbormaster completed remote builds in B206710: Diff 487684. Jan 9 2023, 11:11 PM

arsenm added inline comments. Jan 10 2023, 5:58 AM

llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp
126–127	Not sure why you really need to change this, but the function should probably be a template argument. Is DA just missing a layer to present the PM independent analysis?

gandhi21299 added inline comments. Jan 10 2023, 9:21 AM

llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp
126–127	The LegacyPM version depends on `LegacyDivergenceAnalysis` info whereas the NewPM version of this pass depends on `DivergenceInfo`.

arsenm added inline comments. Jan 10 2023, 9:41 AM

llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp
126–127	There should be a common logic DivergenceInfo that both pass versions export

isUniformlyReached now accepts a template argument instead of a std::function to allow both pass managers to use their corresponding DivergenceAnalysis info.
DivergenceAnalysis::isUniform(Value &V) -> DivergenceAnalysis::isUniform(Value *V)
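For comparison, a minimal standalone sketch of the two shapes, assuming a hypothetical `Value` type and two hypothetical analysis structs that share an `isUniform(const Value *)` member (none of these are the real LLVM classes). `allUniform` is a simplified stand-in for `isUniformlyReached` that only checks a list of values, and its template parameter uses the `DivergenceAnalysisType` name suggested later in review. The revision that finally landed dropped the template in favor of `UniformityInfo`.

```cpp
#include <functional>
#include <vector>

// Hypothetical stand-in for an IR value; only tracks uniformity.
struct Value {
  bool Uniform;
};

// Two hypothetical analyses exposing the same query. They model "the analysis
// each pass manager hands over"; they are not the LLVM classes.
struct LegacyDivergenceInfo {
  bool isUniform(const Value *V) const { return V->Uniform; }
};
struct NewPMDivergenceInfo {
  bool isUniform(const Value *V) const { return V->Uniform; }
};

// Earlier shape: the caller wraps its analysis in a type-erased std::function.
bool allUniformFn(const std::vector<const Value *> &Vals,
                  const std::function<bool(const Value *)> &IsUniform) {
  for (const Value *V : Vals)
    if (!IsUniform(V))
      return false;
  return true;
}

// Revised shape: the analysis type is a template parameter, so either
// analysis object is passed directly and the call is resolved statically.
template <typename DivergenceAnalysisType>
bool allUniform(const DivergenceAnalysisType &DA,
                const std::vector<const Value *> &Vals) {
  for (const Value *V : Vals)
    if (!DA.isUniform(V))
      return false;
  return true;
}
```

With the template, `allUniform(LegacyDivergenceInfo{}, Vals)` and `allUniform(NewPMDivergenceInfo{}, Vals)` become two instantiations with direct calls, whereas the `std::function` version needs a wrapping lambda at each call site.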

Rebased and applied clang-format

Harbormaster completed remote builds in B206842: Diff 487870. Jan 10 2023, 12:33 PM

Rebased and initialized TTI = nullptr

Harbormaster completed remote builds in B207206: Diff 488369. Jan 11 2023, 4:40 PM

gandhi21299 added inline comments. Jan 12 2023, 11:15 AM

llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp
126–127	How do these changes look @arsenm ?

arsenm added inline comments. Jan 12 2023, 11:57 AM

llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp
126–127	I'm still confused. LegacyDivergenceAnalysis and DivergenceAnalysis/DivergenceInfo are two different analyses. LegacyDivergenceAnalysis is not the Legacy pass manager version of DivergenceAnalysis. New pass manager shouldn't invisibly switch the analysis used. I think LegacyDivergenceAnalysis needs to be ported to new PM first

gandhi21299 added inline comments. Jan 14 2023, 2:54 PM

llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp
126–127	Ahh I guess I got confused with the names as well. Sure, I will port LegacyDivergenceAnalysis to newPM first.

gandhi21299 planned changes to this revision. Jan 14 2023, 3:01 PM

Rebased

Added missing newline

gandhi21299 added a reviewer: sameerds. Jan 22 2023, 9:10 PM

minor nits

At this point, there are three analyses as follows:

LegacyDA is actually wrong, but useful in some cases involving irreducible control flow.
GpuDA is correct, but treats irreducible control flow very conservatively.
UniformityAnalysis is both correct and better at irreducible control flow.

Note that in almost every use on AMDGPU, LegacyDA simply acts as a wrapper for GpuDA.

Since this is fresh work, why not go straight to UniformityAnalysis? I would definitely recommend that, because not only does it go straight to the new stuff, it also avoids the unnecessary template. This would be a good time to use the new analysis. And if issues crop up that can't be fixed, then we should fall back no further than GpuDA, and unify their exposed Info objects along the way.

llvm/include/llvm/Analysis/DivergenceAnalysis.h
177	(On Diff #491230)	Why is this change necessary? The point of accepting a reference is to indicate that nullptr is not expected inside the function.

sameerds added inline comments. Jan 22 2023, 11:31 PM

llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp
126–127	No! Don't even try. It was considered and discarded long ago.
339	As already noted, this legacy is not related to the pass manager. Just use the same DA in both.

Harbormaster completed remote builds in B209273: Diff 491230. Jan 22 2023, 11:37 PM

tsymalla added a subscriber: tsymalla. Jan 22 2023, 11:57 PM

tsymalla added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp
126	Can you give a more meaningful name to the template argument? For instance, `DivergenceAnalysisType`?

gandhi21299 planned changes to this revision. Jan 24 2023, 10:52 AM

ruiling added a subscriber: ruiling. Feb 2 2023, 7:01 AM

ruiling added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp
189–191	I think you just removed this accidentally.

I will create a new patch to replace LegacyDivergenceAnalysis with UniformityAnalysis, unless it's too early to do so.

Rebased with uniformity analysis dependency
Removed template param
Refactors

gandhi21299 marked 5 inline comments as done. Mar 13 2023, 8:05 PM

Harbormaster completed remote builds in B219229: Diff 504919. Mar 13 2023, 8:53 PM

gandhi21299 retitled this revision from [AMDGPU] Add NewPM support to AMDGPUUnifyDivergentExitNodes pass to [AMDGPUUnifyDivergentExitNodes] Add NewPM support. Mar 13 2023, 9:39 PM

sameerds added inline comments. Mar 13 2023, 9:53 PM

llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp
189–191	Was this comment addressed?
197	Either declare parameters as references, or implement checks for nullptr in the function body. Also, can any of these be const?
337–338	If you know that a variable cannot be nullptr, it is preferable to declare it as a reference rather than a pointer. It can be converted into a pointer at the point where some function expects a pointer. For example, the call below can be "whatever.run(&DT, &PDT, UA)".
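A small illustration of this guidance, using hypothetical `DomTree`/`PostDomTree` stand-ins rather than LLVM's analysis classes: arguments that can never be null are taken by reference, and the address is taken only where a callee actually wants a pointer.

```cpp
// Hypothetical stand-ins for two analysis results; not the LLVM classes.
struct DomTree {
  int NumNodes;
};
struct PostDomTree {
  int NumRoots;
};

// A callee that genuinely accepts null (it checks before dereferencing),
// so pointer parameters are the right choice here.
int countNodes(const DomTree *DT, const PostDomTree *PDT) {
  return (DT ? DT->NumNodes : 0) + (PDT ? PDT->NumRoots : 0);
}

// Neither argument can be null, so both are references; the address is taken
// only at the call that wants pointers, in the spirit of
// "whatever.run(&DT, &PDT, UA)".
int summarize(const DomTree &DT, const PostDomTree &PDT) {
  return countNodes(&DT, &PDT);
}
```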

gandhi21299 added inline comments. Mar 13 2023, 10:50 PM

llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp
189–191	yup, I moved it under `AMDGPUUnifyDivergentExitNodesImpl::run(..)`

gandhi21299 added inline comments. Mar 13 2023, 10:53 PM

llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp
197	`DominatorTree` may be updated but `PostDominatorTree` and `UniformityInfo` are preserved.

Addressed reviewer comments

Harbormaster completed remote builds in B219254: Diff 504953. Mar 14 2023, 12:22 AM

The commit message should make it clear that you're also changing it to use UniformityAnalysis instead of (Legacy?)DivergenceAnalysis.

llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.h
58	It looks like only this class needs to be declared in the `.h` file. The others could stay in the `.cpp` file.

gandhi21299 marked 7 inline comments as done. Mar 14 2023, 8:38 AM

gandhi21299 edited the summary of this revision. Mar 14 2023, 10:56 AM

Move AMDGPUUnifyDivergentExitNodesImpl and AMDGPUUnifyDivergentExitNodes classes to the source file from the header file.

Move includes from header to source

do not need to include PassManager
TTI should be private instead of protected in the Impl
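The restructuring described in these updates (one pass-manager-agnostic `Impl` class plus a thin wrapper per pass manager) can be sketched in isolation. `Analyses`, `UnifyExitsImpl`, `LegacyWrapper`, `NewPMPass` and `Preserved` are hypothetical names standing in for the real analyses, `AMDGPUUnifyDivergentExitNodesImpl`, the legacy `FunctionPass`, and `PreservedAnalyses`; the only point is that both wrappers gather their inputs and forward to the same `run()`.

```cpp
#include <cassert>
#include <string>

// Hypothetical bundle standing in for PDT, UniformityInfo, etc.
struct Analyses {
  bool HasMultipleReturns;
};

// Shared transform logic with no dependency on either pass manager: every
// input arrives as an argument. TTI is private, as requested in review.
class UnifyExitsImpl {
public:
  UnifyExitsImpl() = delete;
  explicit UnifyExitsImpl(const std::string *TTI) : TTI(TTI) {}

  // Returns true if the (modeled) function was changed.
  bool run(const Analyses &A) {
    assert(TTI && "TTI must be provided");
    return A.HasMultipleReturns;
  }

private:
  const std::string *TTI = nullptr;
};

// Hypothetical legacy-PM wrapper: the real runOnFunction() pulls each
// analysis via getAnalysis<>() before forwarding.
struct LegacyWrapper {
  Analyses Cached{false};
  std::string TTI = "tti";
  bool runOnFunction() { return UnifyExitsImpl(&TTI).run(Cached); }
};

// Hypothetical new-PM wrapper: same forwarding, with "changed" mapped to a
// preserved-analyses answer.
enum class Preserved { None, All };
struct NewPMPass {
  Preserved run(const Analyses &A, const std::string &TTI) {
    return UnifyExitsImpl(&TTI).run(A) ? Preserved::None : Preserved::All;
  }
};
```

Keeping the wrappers this thin is what lets the header expose only the new-PM class while everything else stays in the `.cpp` file.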

Harbormaster completed remote builds in B219419: Diff 505187. Mar 14 2023, 12:56 PM

LGTM, but wait for a day to let other reviewers comment.

This revision is now accepted and ready to land. Mar 14 2023, 11:01 PM

Internal CI passed.

arsenm accepted this revision. Mar 15 2023, 3:56 PM

Closed by commit rGa5455e32b364: [AMDGPUUnifyDivergentExitNodes] Add NewPM support (authored by gandhi21299). Mar 16 2023, 9:13 AM

This revision was automatically updated to reflect the committed changes.

gandhi21299 added a commit: rGa5455e32b364: [AMDGPUUnifyDivergentExitNodes] Add NewPM support.

vitalybuka reopened this revision. Mar 16 2023, 7:02 PM

vitalybuka added a subscriber: vitalybuka.

vitalybuka added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp
342	DT can be null here, which is UB https://lab.llvm.org/buildbot/#/builders/5/builds/32228/steps/16/logs/stdio

This revision is now accepted and ready to land. Mar 16 2023, 7:02 PM

vitalybuka added a reverting change: rGaa15fe98b64e: Revert "[AMDGPUUnifyDivergentExitNodes] Add NewPM support". Mar 16 2023, 7:10 PM

Impl::run(..) now accepts DominatorTree* instead of DominatorTree& to avoid undefined behavior due to the possibility of DT == nullptr.

gandhi21299 added a reviewer: vitalybuka. Mar 24 2023, 1:13 PM

gandhi21299 added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp
342	Great catch! How does it look now?

arsenm added inline comments. Mar 24 2023, 1:17 PM

llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp
192–195	When is DT null but PDT won't be?

gandhi21299 added inline comments. Mar 24 2023, 1:28 PM

llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp
192–195	This pass depends on PDT so it can never be null. DT may be null if it does not need to be preserved, which is specified by `-simplifycfg-require-and-preserve-domtree`.
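The null-`DT` problem and the pointer-based fix can be modeled without LLVM. `DomTree`, `PostDomTree`, `runImpl`, `runPass` and the global `RequireAndPreserveDomTree` flag are hypothetical stand-ins (the flag plays the role of `-simplifycfg-require-and-preserve-domtree`); the sketch only shows why an optional analysis stays a pointer while an always-available one can be a reference.

```cpp
// Hypothetical stand-ins; not LLVM's DominatorTree / PostDominatorTree.
struct DomTree {
  int Updates;
};
struct PostDomTree {
  int Roots;
};

// Hypothetical global flag in the role of
// -simplifycfg-require-and-preserve-domtree.
bool RequireAndPreserveDomTree = false;

// DT is optional, so it is a pointer and every use is guarded. PDT is always
// computed by the pass, so a reference is appropriate.
bool runImpl(DomTree *DT, const PostDomTree &PDT) {
  if (PDT.Roots == 0)
    return false;
  if (DT) // update the dominator tree only when one was requested
    ++DT->Updates;
  return true;
}

// Wrapper mirroring runOnFunction(): DT is fetched only when the flag is set.
// Had runImpl taken DomTree&, passing *DT while DT == nullptr would be
// undefined behavior, which is what the buildbot report flagged.
bool runPass(DomTree &Storage, const PostDomTree &PDT) {
  DomTree *DT = nullptr;
  if (RequireAndPreserveDomTree)
    DT = &Storage;
  return runImpl(DT, PDT);
}
```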

Harbormaster completed remote builds in B221671: Diff 508203. Mar 24 2023, 1:50 PM

arsenm accepted this revision. Mar 24 2023, 7:21 PM

Closed by commit rGb48e7c2d01a3: [AMDGPUUnifyDivergentExitNodes] Add NewPM support (authored by gandhi21299). Mar 25 2023, 1:05 PM

This revision was automatically updated to reflect the committed changes.

gandhi21299 added a commit: rGb48e7c2d01a3: [AMDGPUUnifyDivergentExitNodes] Add NewPM support.

Revision Contents

Path	Size
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp	5 lines
llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.h	31 lines
llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp	71 lines
llvm/test/CodeGen/AMDGPU/si-annotate-nested-control-flows.ll	72 lines

Diff 508336

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

@@ (16 unchanged lines not shown) @@
 #include "AMDGPUAliasAnalysis.h"
 #include "AMDGPUCtorDtorLowering.h"
 #include "AMDGPUExportClustering.h"
 #include "AMDGPUIGroupLP.h"
 #include "AMDGPUMacroFusion.h"
 #include "AMDGPURegBankSelect.h"
 #include "AMDGPUTargetObjectFile.h"
 #include "AMDGPUTargetTransformInfo.h"
+#include "AMDGPUUnifyDivergentExitNodes.h"
 #include "GCNIterativeScheduler.h"
 #include "GCNSchedStrategy.h"
 #include "GCNVOPDUtils.h"
 #include "R600.h"
 #include "R600MachineFunctionInfo.h"
 #include "R600TargetMachine.h"
 #include "SIMachineFunctionInfo.h"
 #include "SIMachineScheduler.h"
@@ (617 unchanged lines not shown) @@ PB.registerPipelineParsingCallback(
         if (PassName == "amdgpu-propagate-attributes-early") {
           PM.addPass(AMDGPUPropagateAttributesEarlyPass(*this));
           return true;
         }
         if (PassName == "amdgpu-promote-kernel-arguments") {
           PM.addPass(AMDGPUPromoteKernelArgumentsPass());
           return true;
         }
+        if (PassName == "amdgpu-unify-divergent-exit-nodes") {
+          PM.addPass(AMDGPUUnifyDivergentExitNodesPass());
+          return true;
+        }
         return false;
       });

   PB.registerAnalysisRegistrationCallback([](FunctionAnalysisManager &FAM) {
     FAM.registerPass([&] { return AMDGPUAA(); });
   });

   PB.registerParseAACallback([](StringRef AAName, AAManager &AAM) {
@@ (remaining 951 unchanged lines not shown) @@
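The new `if (PassName == ...)` arm follows the callback's convention: claim a name by adding the pass and returning `true`, and return `false` for names it does not recognize. A self-contained model of that convention, with strings standing in for pass objects and a hypothetical comma-splitting driver in place of the real `PassBuilder` parser, might look like this:

```cpp
#include <string>
#include <vector>

// Hypothetical model of the pipeline-parsing callback: strings stand in for
// pass objects, and the vector stands in for the FunctionPassManager.
bool parseAMDGPUFunctionPass(const std::string &PassName,
                             std::vector<std::string> &PM) {
  if (PassName == "amdgpu-promote-kernel-arguments") {
    PM.push_back("AMDGPUPromoteKernelArgumentsPass");
    return true;
  }
  if (PassName == "amdgpu-unify-divergent-exit-nodes") {
    PM.push_back("AMDGPUUnifyDivergentExitNodesPass");
    return true;
  }
  return false; // not ours; let other handlers try
}

// Hypothetical driver for a comma-separated pipeline such as the test's
// "simplifycfg,amdgpu-unify-divergent-exit-nodes". Names the callback does
// not claim are recorded unchanged, as if a built-in pass handled them.
std::vector<std::string> buildPipeline(const std::string &Text) {
  std::vector<std::string> PM;
  std::string::size_type Start = 0;
  while (Start <= Text.size()) {
    std::string::size_type Comma = Text.find(',', Start);
    if (Comma == std::string::npos)
      Comma = Text.size();
    std::string Name = Text.substr(Start, Comma - Start);
    if (!parseAMDGPUFunctionPass(Name, PM))
      PM.push_back(Name);
    Start = Comma + 1;
  }
  return PM;
}
```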

llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.h

This file was added.

+//===- AMDGPUUnifyDivergentExitNodes.h ------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This is a variant of the UnifyFunctionExitNodes pass. Rather than ensuring
+// there is at most one ret and one unreachable instruction, it ensures there is
+// at most one divergent exiting block.
+//
+// StructurizeCFG can't deal with multi-exit regions formed by branches to
+// multiple return nodes. It is not desirable to structurize regions with
+// uniform branches, so unifying those to the same return block as divergent
+// branches inhibits use of scalar branching. It still can't deal with the case
+// where one branch goes to return, and one unreachable. Replace unreachable in
+// this case with a return.
+//
+//===----------------------------------------------------------------------===//
+
+#include "AMDGPU.h"
+
+namespace llvm {
+class AMDGPUUnifyDivergentExitNodesPass
+    : public PassInfoMixin<AMDGPUUnifyDivergentExitNodesPass> {
+public:
+  PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM);
+};
+
+} // end namespace llvm
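The header comment describes the core transformation: route several returning blocks through one new return block. A toy model of that step follows, on a hypothetical `ToyBlock`/`ToyFunction` CFG rather than LLVM IR, and leaving out the PHI construction and dominator-tree updates that the real `unifyReturnBlockSet` performs.

```cpp
#include <string>
#include <vector>

// Hypothetical toy CFG: each block either returns or branches to successors
// (indices into the function's block list). This is not LLVM IR.
struct ToyBlock {
  std::string Name;
  bool Returns;
  std::vector<int> Succs;
};
using ToyFunction = std::vector<ToyBlock>;

// Append one return block and turn every listed returning block into an
// unconditional branch to it. Returns the index of the new block.
int unifyReturnBlockSet(ToyFunction &F, const std::vector<int> &ReturningBlocks,
                        const std::string &Name) {
  F.push_back(ToyBlock{Name, true, {}});
  const int NewRet = static_cast<int>(F.size()) - 1;
  for (int BB : ReturningBlocks) {
    F[BB].Returns = false;
    F[BB].Succs = {NewRet};
  }
  return NewRet;
}

// Count blocks that still end in a return.
int countReturns(const ToyFunction &F) {
  int N = 0;
  for (const ToyBlock &B : F)
    if (B.Returns)
      ++N;
  return N;
}
```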

llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp

@@ (13 unchanged lines not shown) @@
 // multiple return nodes. It is not desirable to structurize regions with
 // uniform branches, so unifying those to the same return block as divergent
 // branches inhibits use of scalar branching. It still can't deal with the case
 // where one branch goes to return, and one unreachable. Replace unreachable in
 // this case with a return.
 //
 //===----------------------------------------------------------------------===//

+#include "AMDGPUUnifyDivergentExitNodes.h"
 #include "AMDGPU.h"
 #include "SIDefines.h"
 #include "llvm/ADT/ArrayRef.h"
 #include "llvm/ADT/SmallPtrSet.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/Analysis/DomTreeUpdater.h"
 #include "llvm/Analysis/PostDominators.h"
@@ (18 unchanged lines not shown) @@
 #include "llvm/Transforms/Utils/Local.h"

 using namespace llvm;

 #define DEBUG_TYPE "amdgpu-unify-divergent-exit-nodes"

 namespace {

-class AMDGPUUnifyDivergentExitNodes : public FunctionPass {
+class AMDGPUUnifyDivergentExitNodesImpl {
 private:
   const TargetTransformInfo *TTI = nullptr;

 public:
-  static char ID; // Pass identification, replacement for typeid
-
-  AMDGPUUnifyDivergentExitNodes() : FunctionPass(ID) {
-    initializeAMDGPUUnifyDivergentExitNodesPass(*PassRegistry::getPassRegistry());
-  }
+  AMDGPUUnifyDivergentExitNodesImpl() = delete;
+  AMDGPUUnifyDivergentExitNodesImpl(const TargetTransformInfo *TTI)
+      : TTI(TTI) {}

   // We can preserve non-critical-edgeness when we unify function exit nodes
-  void getAnalysisUsage(AnalysisUsage &AU) const override;
   BasicBlock *unifyReturnBlockSet(Function &F, DomTreeUpdater &DTU,
                                   ArrayRef<BasicBlock *> ReturningBlocks,
                                   StringRef Name);
-  bool runOnFunction(Function &F) override;
+  bool run(Function &F, DominatorTree *DT, const PostDominatorTree &PDT,
+           const UniformityInfo &UA);
 };

+class AMDGPUUnifyDivergentExitNodes : public FunctionPass {
+public:
+  static char ID;
+  AMDGPUUnifyDivergentExitNodes() : FunctionPass(ID) {
+    initializeAMDGPUUnifyDivergentExitNodesPass(
+        *PassRegistry::getPassRegistry());
+  }
+  void getAnalysisUsage(AnalysisUsage &AU) const override;
+  bool runOnFunction(Function &F) override;
+};
 } // end anonymous namespace

 char AMDGPUUnifyDivergentExitNodes::ID = 0;

 char &llvm::AMDGPUUnifyDivergentExitNodesID = AMDGPUUnifyDivergentExitNodes::ID;

 INITIALIZE_PASS_BEGIN(AMDGPUUnifyDivergentExitNodes, DEBUG_TYPE,
                       "Unify divergent function exit nodes", false, false)
 INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
 INITIALIZE_PASS_DEPENDENCY(PostDominatorTreeWrapperPass)
 INITIALIZE_PASS_DEPENDENCY(UniformityInfoWrapperPass)
 INITIALIZE_PASS_END(AMDGPUUnifyDivergentExitNodes, DEBUG_TYPE,
                     "Unify divergent function exit nodes", false, false)

-void AMDGPUUnifyDivergentExitNodes::getAnalysisUsage(AnalysisUsage &AU) const{
+void AMDGPUUnifyDivergentExitNodes::getAnalysisUsage(AnalysisUsage &AU) const {
   if (RequireAndPreserveDomTree)
     AU.addRequired<DominatorTreeWrapperPass>();

   AU.addRequired<PostDominatorTreeWrapperPass>();

   AU.addRequired<UniformityInfoWrapperPass>();

   if (RequireAndPreserveDomTree) {
@@ (11 unchanged lines not shown) @@ void AMDGPUUnifyDivergentExitNodes::getAnalysisUsage(AnalysisUsage &AU) const {
   AU.addPreservedID(LowerSwitchID);
   FunctionPass::getAnalysisUsage(AU);

   AU.addRequired<TargetTransformInfoWrapperPass>();
 }

 /// \returns true if \p BB is reachable through only uniform branches.
 /// XXX - Is there a more efficient way to find this?
 static bool isUniformlyReached(const UniformityInfo &UA, BasicBlock &BB) {
   SmallVector<BasicBlock *, 8> Stack(predecessors(&BB));
   SmallPtrSet<BasicBlock *, 8> Visited;

   while (!Stack.empty()) {
     BasicBlock *Top = Stack.pop_back_val();
     if (!UA.isUniform(Top->getTerminator()))
       return false;

     for (BasicBlock *Pred : predecessors(Top)) {
       if (Visited.insert(Pred).second)
         Stack.push_back(Pred);
     }
   }

   return true;
 }

-BasicBlock *AMDGPUUnifyDivergentExitNodes::unifyReturnBlockSet(
+BasicBlock *AMDGPUUnifyDivergentExitNodesImpl::unifyReturnBlockSet(
     Function &F, DomTreeUpdater &DTU, ArrayRef<BasicBlock *> ReturningBlocks,
     StringRef Name) {
   // Otherwise, we need to insert a new basic block into the function, add a PHI
   // nodes (if the function returns values), and convert all of the return
   // instructions into unconditional branches.
   BasicBlock *NewRetBlock = BasicBlock::Create(F.getContext(), Name, &F);
   IRBuilder<> B(NewRetBlock);

@@ (31 unchanged lines not shown) @@ for (BasicBlock *BB : ReturningBlocks) {
     // Cleanup possible branch to unconditional branch to the return.
     simplifyCFG(BB, *TTI, RequireAndPreserveDomTree ? &DTU : nullptr,
                 SimplifyCFGOptions().bonusInstThreshold(2));
   }

   return NewRetBlock;
 }

-bool AMDGPUUnifyDivergentExitNodes::runOnFunction(Function &F) {
-  DominatorTree *DT = nullptr;
-  if (RequireAndPreserveDomTree)
-    DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();
-
-  auto &PDT = getAnalysis<PostDominatorTreeWrapperPass>().getPostDomTree();
+bool AMDGPUUnifyDivergentExitNodesImpl::run(Function &F, DominatorTree *DT,
+                                            const PostDominatorTree &PDT,
+                                            const UniformityInfo &UA) {
   if (PDT.root_size() == 0 ||
       (PDT.root_size() == 1 &&
        !isa<BranchInst>(PDT.getRoot()->getTerminator())))
     return false;

-  UniformityInfo &UA =
-      getAnalysis<UniformityInfoWrapperPass>().getUniformityInfo();
-  TTI = &getAnalysis<TargetTransformInfoWrapperPass>().getTTI(F);
-
   // Loop over all of the blocks in a function, tracking all of the blocks that
   // return.
   SmallVector<BasicBlock *, 4> ReturningBlocks;
   SmallVector<BasicBlock *, 4> UnreachableBlocks;

   // Dummy return block for infinite loop.
   BasicBlock *DummyReturnBB = nullptr;

@@ (116 unchanged lines not shown) @@ if (ReturningBlocks.empty())
     return Changed; // No blocks return

   if (ReturningBlocks.size() == 1)
     return Changed; // Already has a single return block

   unifyReturnBlockSet(F, DTU, ReturningBlocks, "UnifiedReturnBlock");
   return true;
 }
+
+bool AMDGPUUnifyDivergentExitNodes::runOnFunction(Function &F) {
+  DominatorTree *DT = nullptr;
+  if (RequireAndPreserveDomTree)
+    DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();
+  const auto &PDT =
+      getAnalysis<PostDominatorTreeWrapperPass>().getPostDomTree();
+  const auto &UA = getAnalysis<UniformityInfoWrapperPass>().getUniformityInfo();
+  const auto *TranformInfo =
+      &getAnalysis<TargetTransformInfoWrapperPass>().getTTI(F);
+  return AMDGPUUnifyDivergentExitNodesImpl(TranformInfo).run(F, DT, PDT, UA);
+}
+
+PreservedAnalyses
+AMDGPUUnifyDivergentExitNodesPass::run(Function &F,
+                                       FunctionAnalysisManager &AM) {
+  DominatorTree *DT = nullptr;
+  if (RequireAndPreserveDomTree)
+    DT = &AM.getResult<DominatorTreeAnalysis>(F);
+
+  const auto &PDT = AM.getResult<PostDominatorTreeAnalysis>(F);
+  const auto &UA = AM.getResult<UniformityInfoAnalysis>(F);
+  const auto *TransformInfo = &AM.getResult<TargetIRAnalysis>(F);
+  return AMDGPUUnifyDivergentExitNodesImpl(TransformInfo).run(F, DT, PDT, UA)
+             ? PreservedAnalyses::none()
+             : PreservedAnalyses::all();
+}
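`isUniformlyReached` in the diff is a worklist walk over all transitive predecessors of a block. The same walk on a hypothetical `Node` type, where a boolean flag plays the role of `UA.isUniform(Top->getTerminator())`, is sketched below. As in the original, the initial predecessors are not pre-marked as visited, so one of them may be examined twice, but the `Visited` set still guarantees termination on cyclic graphs.

```cpp
#include <set>
#include <vector>

// Hypothetical toy block: its predecessors, plus whether its terminator is
// uniform (the question UniformityInfo answers in the real pass).
struct Node {
  std::vector<Node *> Preds;
  bool UniformTerminator;
};

// Same worklist walk as isUniformlyReached: visit every transitive
// predecessor of BB and fail as soon as one ends in a divergent branch.
bool isUniformlyReached(const Node &BB) {
  std::vector<Node *> Stack(BB.Preds.begin(), BB.Preds.end());
  std::set<Node *> Visited;
  while (!Stack.empty()) {
    Node *Top = Stack.back();
    Stack.pop_back();
    if (!Top->UniformTerminator)
      return false;
    for (Node *Pred : Top->Preds)
      if (Visited.insert(Pred).second)
        Stack.push_back(Pred);
  }
  return true;
}
```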

llvm/test/CodeGen/AMDGPU/si-annotate-nested-control-flows.ll

 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
-; RUN: llc -mtriple=amdgcn-amd-amdhsa %s -o - | FileCheck %s
+; RUN: opt -mtriple=amdgcn-amd-amdhsa -p simplifycfg,amdgpu-unify-divergent-exit-nodes %s -S -o - | FileCheck %s --check-prefix=OPT
+; RUN: llc -mtriple=amdgcn-amd-amdhsa %s -o - | FileCheck %s --check-prefix=ISA

 define void @nested_inf_loop(i1 %0, i1 %1) {
-; CHECK-LABEL: nested_inf_loop:
-; CHECK-NEXT: %bb.0: ; %BB
-; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; CHECK-NEXT: v_and_b32_e32 v1, 1, v1
-; CHECK-NEXT: v_and_b32_e32 v0, 1, v0
-; CHECK-NEXT: v_cmp_eq_u32_e64 s[4:5], 1, v1
-; CHECK-NEXT: v_cmp_eq_u32_e32 vcc, 1, v0
-; CHECK-NEXT: s_xor_b64 s[6:7], vcc, -1
-; CHECK-NEXT: s_mov_b64 s[8:9], 0
-; CHECK-NEXT: .LBB0_1: ; %BB1
-; CHECK: s_and_b64 s[10:11], exec, s[6:7]
-; CHECK-NEXT: s_or_b64 s[8:9], s[10:11], s[8:9]
-; CHECK-NEXT: s_andn2_b64 exec, exec, s[8:9]
-; CHECK-NEXT: s_cbranch_execnz .LBB0_1
-; CHECK-NEXT: %bb.2: ; %BB2
-; CHECK: s_or_b64 exec, exec, s[8:9]
-; CHECK-NEXT: s_mov_b64 s[8:9], 0
-; CHECK-NEXT: .LBB0_3: ; %BB4
-; CHECK: s_and_b64 s[10:11], exec, s[4:5]
-; CHECK-NEXT: s_or_b64 s[8:9], s[10:11], s[8:9]
-; CHECK-NEXT: s_andn2_b64 exec, exec, s[8:9]
-; CHECK-NEXT: s_cbranch_execnz .LBB0_3
-; CHECK-NEXT: %bb.4: ; %loop.exit.guard
-; CHECK: s_or_b64 exec, exec, s[8:9]
-; CHECK-NEXT: s_mov_b64 vcc, 0
-; CHECK-NEXT: s_mov_b64 s[8:9], 0
-; CHECK-NEXT: s_branch .LBB0_1
-; CHECK-NEXT: %bb.5: ; %DummyReturnBlock
-; CHECK-NEXT: s_setpc_b64 s[30:31]
+; OPT-LABEL: @nested_inf_loop(
+; OPT-NEXT: BB:
+; OPT-NEXT: br label [[BB1:%.*]]
+; OPT: BB1:
+; OPT-NEXT: [[BRMERGE:%.*]] = select i1 [[TMP0:%.*]], i1 true, i1 [[TMP1:%.*]]
+; OPT-NEXT: br i1 [[BRMERGE]], label [[BB1]], label [[INFLOOP:%.*]]
+; OPT: infloop:
+; OPT-NEXT: br i1 true, label [[INFLOOP]], label [[DUMMYRETURNBLOCK:%.*]]
+; OPT: DummyReturnBlock:
+; OPT-NEXT: ret void
+;
+; ISA-LABEL: nested_inf_loop:
+; ISA-NEXT: %bb.0: ; %BB
+; ISA-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; ISA-NEXT: v_and_b32_e32 v1, 1, v1
+; ISA-NEXT: v_and_b32_e32 v0, 1, v0
+; ISA-NEXT: v_cmp_eq_u32_e64 s[4:5], 1, v1
+; ISA-NEXT: v_cmp_eq_u32_e32 vcc, 1, v0
+; ISA-NEXT: s_xor_b64 s[6:7], vcc, -1
+; ISA-NEXT: s_mov_b64 s[8:9], 0
+; ISA-NEXT: .LBB0_1: ; %BB1
+; ISA: s_and_b64 s[10:11], exec, s[6:7]
+; ISA-NEXT: s_or_b64 s[8:9], s[10:11], s[8:9]
+; ISA-NEXT: s_andn2_b64 exec, exec, s[8:9]
+; ISA-NEXT: s_cbranch_execnz .LBB0_1
+; ISA-NEXT: %bb.2: ; %BB2
+; ISA: s_or_b64 exec, exec, s[8:9]
+; ISA-NEXT: s_mov_b64 s[8:9], 0
+; ISA-NEXT: .LBB0_3: ; %BB4
+; ISA: s_and_b64 s[10:11], exec, s[4:5]
+; ISA-NEXT: s_or_b64 s[8:9], s[10:11], s[8:9]
+; ISA-NEXT: s_andn2_b64 exec, exec, s[8:9]
+; ISA-NEXT: s_cbranch_execnz .LBB0_3
+; ISA-NEXT: %bb.4: ; %loop.exit.guard
+; ISA: s_or_b64 exec, exec, s[8:9]
+; ISA-NEXT: s_mov_b64 vcc, 0
+; ISA-NEXT: s_mov_b64 s[8:9], 0
+; ISA-NEXT: s_branch .LBB0_1
+; ISA-NEXT: %bb.5: ; %DummyReturnBlock
+; ISA-NEXT: s_setpc_b64 s[30:31]
 BB:
   br label %BB1

 BB1:
   br i1 %0, label %BB3, label %BB2

 BB2:
   br label %BB4

 BB4:
   br i1 %1, label %BB3, label %BB4

 BB3:
   br label %BB1
 }