This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Replace LegacyDA with Uniformity Analysis in AnnotateUniformValues
ClosedPublic

Authored by gandhi21299 on Feb 15 2023, 9:07 PM.


Event Timeline

gandhi21299 created this revision. Feb 15 2023, 9:07 PM
Herald added a project: Restricted Project. Feb 15 2023, 9:07 PM
gandhi21299 requested review of this revision. Feb 15 2023, 9:07 PM
gandhi21299 added a project: Restricted Project.
gandhi21299 retitled this revision from [AnnotateUniformValues] Replace LegacyDA with Uniformity Analysis to [AMDGPU] Replace LegacyDA with Uniformity Analysis in AnnotateUniformValues. Feb 15 2023, 9:13 PM
sameerds added inline comments. Feb 16 2023, 4:40 AM
llvm/test/CodeGen/AMDGPU/divergent-branch-uniform-condition.ll
Line 85 (On Diff #497880)

The comment at the start of this test says the following:

; This module creates a divergent branch in block Flow2. The branch is
; marked as divergent by the divergence analysis but the condition is
; not. This test ensures that the divergence of the branch is tested,
; not its condition, so that branch is correctly emitted as divergent.

It seems that the proposed change completely defeats the intent of this test?

I think for this specific case, we should report %8 as uniform, and the branch should also be uniform. But there seems to be something wrong in the uniformity analysis: if you try opt -passes='print<uniformity>' on the test here, both the condition %8 and the conditional branch br i1 %8, ... are reported as divergent, yet the isUniform() query returns true for the branch instruction. There must be something wrong in the uniformity analysis; if the condition is divergent, the branch should also be divergent.

Another issue I want to point out is that the uniformity analysis has a subtle difference from the divergence analysis. There is a comment in https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/AMDGPURewriteUndefForPHI.cpp#L13 that explains the issue. I think we need to close that gap before switching to the uniformity analysis, otherwise we will regress code generation. The last time we discussed this, I think we wanted some target-specific option in the uniformity analysis to match its behavior with the divergence analysis.
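For reference, the discrepancy described above can be reproduced by running both analysis printers on the test and comparing their reports (print<divergence> is the printer for the divergence analysis mentioned below; -disable-output suppresses the module dump):

    opt -passes='print<uniformity>' -disable-output llvm/test/CodeGen/AMDGPU/divergent-branch-uniform-condition.ll
    opt -passes='print<divergence>' -disable-output llvm/test/CodeGen/AMDGPU/divergent-branch-uniform-condition.ll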


I agree with that assessment, and pretty much found the same thing myself by comparing with the output of "print<divergence>". The uniformity analysis assumes a phi is divergent if it has an undef argument, but the divergence analysis does not make that assumption. The difference is in the call to PHINode::hasConstantOrUndefValue() for the divergence analysis and PHINode::hasConstantValue() for the uniformity analysis.
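To make the difference concrete, here is a minimal IR sketch of the phi shape in question (hypothetical, not taken from the patch or its tests):

    define i32 @phi_with_undef(i1 %c, i32 %v) {
    entry:
      br i1 %c, label %a, label %b
    a:
      br label %join
    b:
      br label %join
    join:
      ; PHINode::hasConstantValue() returns nullptr for %p (the incoming
      ; values %v and undef differ), so if the branch on %c is divergent
      ; the uniformity analysis marks %p divergent.
      ; PHINode::hasConstantOrUndefValue() returns true (%v is the only
      ; non-undef incoming value), so the divergence analysis does not.
      %p = phi i32 [ %v, %a ], [ undef, %b ]
      ret i32 %p
    }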

I think we should just fix the uniformity analysis to match the existing behaviour and revisit the issue later. I can have the patch ready by Monday EOD IST.

I think it is reasonable to match the divergence analysis behavior with regard to undef. The other problem, that isUniform() returns true for a divergent branch instruction, makes me wonder: is replacing the uses of the divergence analysis with the uniformity analysis one by one really the best way? Although I am optimistic about the quality of the uniformity analysis, I think it may be more helpful to replace all the occurrences of the divergence analysis at once and fix all the bugs that uncovers. Ideally, we would have very few test changes. The reason is that one specific pass may not have enough test coverage. Fixing the bugs after switching all uses of the divergence analysis to the uniformity analysis will make us more confident that we are less likely to cause regressions. Any different idea?


I tried replacing the analysis in the AMDGPUUnifyDivergentExitNodes pass; it causes 20 tests to fail, including a few fatal errors. I can imagine many more failures occurring after replacing it in the 8 or so other middle-end passes. Replacing all the occurrences in a single patch would be massive. I prefer the one-pass-at-a-time approach (as taken in this patch).

I agree with @gandhi21299 ... it's best to work through the passes one at a time. But maybe we should not enable each change as it happens. We could put the changes behind a command-line option to switch between DA and UA. One single option that switches whatever passes have been updated, until all passes are updated and we are ready to flip the switch permanently.


Do you mean a single option that determines which analysis to use, with all transformations picking the analysis based on the option's value? Something like --enable-uniformity-analysis=1?

> Fixing the bugs after switching all uses of the divergence analysis to the uniformity analysis will make us more confident that we are less likely to cause regressions. Any different idea?

I think changing them one at a time is fine; this shouldn't be a long, drawn-out process.

> I agree with @gandhi21299 ... it's best to work through the passes one at a time. But maybe we should not enable each change as it happens. We could put the changes behind a command-line option to switch between DA and UA. One single option that switches whatever passes have been updated, until all passes are updated and we are ready to flip the switch permanently.

Another possible way would be to do the replacement one pass at a time, but only submit the changes once all the passes have switched to the uniformity analysis.

>> I agree with @gandhi21299 ... it's best to work through the passes one at a time. But maybe we should not enable each change as it happens. We could put the changes behind a command-line option to switch between DA and UA. One single option that switches whatever passes have been updated, until all passes are updated and we are ready to flip the switch permanently.

> Do you mean a single option that determines which analysis to use, with all transformations picking the analysis based on the option's value? Something like --enable-uniformity-analysis=1?

Not exactly like that. We can make the UA a wrapper over the DA and update the passes to use only the UA. Depending on the command line, the UA would either compute its own results or simply hand off to a DA stored internally. This is how the legacy DA and the DA are currently arranged.

Once we are ready to shift to UA for good, we can eliminate all this. And we should.
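To illustrate the arrangement being proposed, here is a hypothetical C++ sketch; every name in it is invented for this discussion and none of it is actual LLVM code:

    #include "llvm/Support/CommandLine.h"
    #include <memory>

    namespace llvm { class Value; }

    // A single switch that flips every updated pass at once.
    static llvm::cl::opt<bool> UseUniformityAnalysis(
        "use-uniformity-analysis", llvm::cl::init(false),
        llvm::cl::desc("Compute results with the uniformity analysis "
                       "instead of delegating to the divergence analysis"));

    struct DivergenceResult {   // stand-in for the internally stored DA result
      bool isDivergent(const llvm::Value *V) const;
    };
    struct UniformityResult {   // stand-in for the UA's own result
      bool isUniform(const llvm::Value *V) const;
    };

    // Facade that the passes would query exclusively: depending on the
    // command line it computes its own answer or hands off to the DA,
    // mirroring how the legacy DA and the DA are arranged today.
    class UniformityFacade {
      std::unique_ptr<UniformityResult> UA;
      std::unique_ptr<DivergenceResult> DA;
    public:
      bool isUniform(const llvm::Value *V) const {
        return UseUniformityAnalysis ? UA->isUniform(V)
                                     : !DA->isDivergent(V);
      }
    };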

  • Instead of checking uniformity of a branch instruction in the pass, check whether its parent basic block is not divergent (see the sketch after this list).
  • Update test checks.
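A hedged sketch of the check this update describes, next to the original form it later reverts to; the exact code in the diff may differ, and this assumes UniformityInfo's hasDivergentTerminator() and isUniform() queries:

    #include "llvm/Analysis/UniformityAnalysis.h"
    #include "llvm/IR/Instructions.h"

    // Sketch only, not the actual diff.
    bool isUniformBranch(const llvm::UniformityInfo &UA,
                         const llvm::BranchInst &BI) {
      // Form used by this update: query the parent block's terminator.
      return !UA.hasDivergentTerminator(*BI.getParent());
      // Original form, restored later once D144699 made the direct
      // query work as expected on a BranchInst:
      //   return UA.isUniform(&BI);
    }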
gandhi21299 marked an inline comment as done. Feb 23 2023, 1:30 AM
sameerds added inline comments. Feb 23 2023, 7:59 AM
llvm/lib/Target/AMDGPU/AMDGPUAnnotateUniformValues.cpp
Line 81

I am not sure we should be doing this. Smells like a bug in the UA!

sameerds added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUAnnotateUniformValues.cpp
Line 81

D144699 allows the original "isUniform()" call to work as expected on a BranchInst.

Rebased against main

sameerds accepted this revision. Feb 27 2023, 11:45 PM

LGTM! But please ensure this passes internal builds before submitting upstream.

This revision is now accepted and ready to land. Feb 27 2023, 11:45 PM
  • Corrected branch-uniformity.ll and i1-copy-from-loop.ll
foad added inline comments. Feb 28 2023, 2:33 AM
llvm/lib/Target/AMDGPU/AMDGPUAnnotateUniformValues.cpp
Line 81

Did you mean to change this back to isUniform? (I don't really mind either way.)

llvm/test/CodeGen/AMDGPU/branch-uniformity.ll
Line 12 (On Diff #501047)

This looks like it might be the same kind of regression as i1-copy-from-loop.ll

llvm/test/CodeGen/AMDGPU/divergent-branch-uniform-condition.ll
Line 46 (On Diff #501047)

Same kind of regression here I think (the s_cselect_b64 -1, 0 is a bad smell).

llvm/test/CodeGen/AMDGPU/i1-copy-from-loop.ll
Line 28 (On Diff #501047)

This is a regression.

gandhi21299 added inline comments. Feb 28 2023, 8:28 AM
llvm/lib/Target/AMDGPU/AMDGPUAnnotateUniformValues.cpp
Line 81

Yeah, I meant to change this back. Great catch!

llvm/test/CodeGen/AMDGPU/i1-copy-from-loop.ll
Line 28 (On Diff #501047)

Should I handle all the regressions in a separate patch?

foad added inline comments. Feb 28 2023, 9:04 AM
llvm/test/CodeGen/AMDGPU/i1-copy-from-loop.ll
Line 28 (On Diff #501047)

Normally you would not commit a patch that causes regressions, unless you can explain them and get agreement that they are acceptable.

Reverted back to isUniform; this fixes all the regressions.

gandhi21299 marked 3 inline comments as done. Feb 28 2023, 9:33 AM