This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Remove attempt at simplifying the format string in printf lowering
Closed, Public

Authored by arsenm on Jun 28 2023, 10:55 AM.

Details

Reviewers

vikramRH
sameerds
rampitec
Pierre-vh
cdevadas

Group Reviewers

Restricted Project

Summary

This avoids computing the dominator tree by removing the
simplifyInstruction use.

This was applying simplification together with a questionable form of
load-store forwarding while looking for the global. This had to have
been an ancient hack copied from previous backends. In the OpenCL
case, the required direct global reference is always emitted anyway.

Event Timeline

arsenm created this revision. Jun 28 2023, 10:55 AM

Herald added a project: Restricted Project. Jun 28 2023, 10:55 AM

Herald added subscribers: foad, kerbowa, hiraditya and 5 others.

arsenm requested review of this revision. Jun 28 2023, 10:55 AM

Herald added a project: Restricted Project. Jun 28 2023, 10:55 AM

Herald added a subscriber: wdng.

Harbormaster completed remote builds in B241847: Diff 535468. Jun 28 2023, 10:56 AM

nikic added a subscriber: nikic. Jun 28 2023, 11:47 AM

nikic added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUPrintfRuntimeBinding.cpp
461	Cached analyses on the NewPM should only be used for analysis preservation. You should either require DT here or not use it at all.

arsenm added inline comments. Jun 28 2023, 12:25 PM

llvm/lib/Target/AMDGPU/AMDGPUPrintfRuntimeBinding.cpp
461	I don't see anything about this in the comments on getCachedResult

arsenm added inline comments. Jun 28 2023, 12:30 PM

llvm/lib/Target/AMDGPU/AMDGPUPrintfRuntimeBinding.cpp
461	Is this not identical to how getBestSimplifyQuery uses this?

nikic added inline comments. Jun 28 2023, 12:43 PM

llvm/lib/Target/AMDGPU/AMDGPUPrintfRuntimeBinding.cpp
461	It is -- we should fix it. Thankfully the places it is used have all the relevant analyses available anyway so it makes no functional difference. To clarify, the reason why this is problematic is that while the LegacyPM determined analysis availability statically, in the NewPM this depends on whether some pass happened to make any changes and ended up invalidating analyses or not. This means that we (generally speaking) do not have guarantees about whether a given analysis will be available at a specific pipeline position. You'll get the same pass executing with the analysis or without the analysis, at the same pipeline position, depending on what exactly your IR looks like.

Just remove simplifyInstruction because the usage here makes no sense

Herald added a subscriber: Anastasia. Jun 30 2023, 2:37 PM

Harbormaster completed remote builds in B242564: Diff 536433. Jun 30 2023, 2:37 PM

In D153992#4465528, @arsenm wrote:

Just remove simplifyInstruction because the usage here makes no sense

I vaguely remember it was added because we were unable to compile something without it and with -O0. But that was during HSAIL times and likely irrelevant.

In D153992#4465550, @rampitec wrote:

In D153992#4465528, @arsenm wrote:

Just remove simplifyInstruction because the usage here makes no sense

I vaguely remember it was added because we were unable to compile something without it and with -O0. But that was during HSAIL times and likely irrelevant.

I figured it was something like that, but clang -O0 -X -disable-llvm-passes still produces the required direct global reference to printf

LGTM

This revision is now accepted and ready to land. Jun 30 2023, 2:49 PM

In D153992#4465566, @arsenm wrote:

In D153992#4465550, @rampitec wrote:

In D153992#4465528, @arsenm wrote:

Just remove simplifyInstruction because the usage here makes no sense

I vaguely remember it was added because we were unable to compile something without it and with -O0. But that was during HSAIL times and likely irrelevant.

I figured it was something like that, but clang -O0 -X -disable-llvm-passes still produces the required direct global reference to printf

From my recollection, we were unable to get to the format string. This should be fine now without it.

94e24624c2f5b7ebdfd8be898986f943c9462b7f

Revision Contents

Path	Size
llvm/lib/Target/AMDGPU/AMDGPUPrintfRuntimeBinding.cpp	55 lines
llvm/test/CodeGen/AMDGPU/llc-pipeline.ll	25 lines

Diff 536433

llvm/lib/Target/AMDGPU/AMDGPUPrintfRuntimeBinding.cpp

[14 unchanged lines not shown]
// store the following into the printf buffer:		// store the following into the printf buffer:
// - format string (passed as a module's metadata unique ID)		// - format string (passed as a module's metadata unique ID)
// - bitwise copies of printf arguments		// - bitwise copies of printf arguments
// The backend passes will need to store metadata in the kernel		// The backend passes will need to store metadata in the kernel
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "AMDGPU.h"		#include "AMDGPU.h"
#include "llvm/ADT/StringExtras.h"		#include "llvm/ADT/StringExtras.h"
#include "llvm/Analysis/InstructionSimplify.h"
#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/Analysis/ValueTracking.h"		#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/DiagnosticInfo.h"		#include "llvm/IR/DiagnosticInfo.h"
#include "llvm/IR/Dominators.h"		#include "llvm/IR/Dominators.h"
#include "llvm/IR/IRBuilder.h"		#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Instructions.h"		#include "llvm/IR/Instructions.h"
#include "llvm/InitializePasses.h"		#include "llvm/InitializePasses.h"
#include "llvm/Support/DataExtractor.h"		#include "llvm/Support/DataExtractor.h"
#include "llvm/TargetParser/Triple.h"		#include "llvm/TargetParser/Triple.h"
[9 unchanged lines not shown]

public:		public:
static char ID;		static char ID;

explicit AMDGPUPrintfRuntimeBinding();		explicit AMDGPUPrintfRuntimeBinding();

private:		private:
bool runOnModule(Module &M) override;		bool runOnModule(Module &M) override;

void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.addRequired<TargetLibraryInfoWrapperPass>();
AU.addRequired<DominatorTreeWrapperPass>();
}
};		};

class AMDGPUPrintfRuntimeBindingImpl {		class AMDGPUPrintfRuntimeBindingImpl {
public:		public:
AMDGPUPrintfRuntimeBindingImpl(		AMDGPUPrintfRuntimeBindingImpl() {}
function_ref<const DominatorTree &(Function &)> GetDT,
function_ref<const TargetLibraryInfo &(Function &)> GetTLI)
: GetDT(GetDT), GetTLI(GetTLI) {}
bool run(Module &M);		bool run(Module &M);

private:		private:
void getConversionSpecifiers(SmallVectorImpl<char> &OpConvSpecifiers,		void getConversionSpecifiers(SmallVectorImpl<char> &OpConvSpecifiers,
StringRef fmt, size_t num_ops) const;		StringRef fmt, size_t num_ops) const;

bool lowerPrintfForGpu(Module &M);		bool lowerPrintfForGpu(Module &M);

Value *simplify(Instruction *I, const TargetLibraryInfo *TLI,
const DominatorTree *DT) {
return simplifyInstruction(I, {*TD, TLI, DT});
}

const DataLayout *TD;		const DataLayout *TD;
function_ref<const DominatorTree &(Function &)> GetDT;
function_ref<const TargetLibraryInfo &(Function &)> GetTLI;
SmallVector<CallInst *, 32> Printfs;		SmallVector<CallInst *, 32> Printfs;
};		};
} // namespace		} // namespace

char AMDGPUPrintfRuntimeBinding::ID = 0;		char AMDGPUPrintfRuntimeBinding::ID = 0;

INITIALIZE_PASS_BEGIN(AMDGPUPrintfRuntimeBinding,		INITIALIZE_PASS_BEGIN(AMDGPUPrintfRuntimeBinding,
"amdgpu-printf-runtime-binding", "AMDGPU Printf lowering",		"amdgpu-printf-runtime-binding", "AMDGPU Printf lowering",
[84 unchanged lines not shown]	bool AMDGPUPrintfRuntimeBindingImpl::lowerPrintfForGpu(Module &M) {
unsigned UniqID = metaD->getNumOperands();		unsigned UniqID = metaD->getNumOperands();

for (auto *CI : Printfs) {		for (auto *CI : Printfs) {
unsigned NumOps = CI->arg_size();		unsigned NumOps = CI->arg_size();

SmallString<16> OpConvSpecifiers;		SmallString<16> OpConvSpecifiers;
Value *Op = CI->getArgOperand(0);		Value *Op = CI->getArgOperand(0);

if (auto LI = dyn_cast<LoadInst>(Op)) {
Op = LI->getPointerOperand();
for (auto *Use : Op->users()) {
if (auto SI = dyn_cast<StoreInst>(Use)) {
Op = SI->getValueOperand();
break;
}
}
}

if (auto I = dyn_cast<Instruction>(Op)) {
Value *Op_simplified =
simplify(I, &GetTLI(I->getFunction()), &GetDT(I->getFunction()));
if (Op_simplified)
Op = Op_simplified;
}

StringRef FormatStr;		StringRef FormatStr;
if (!getConstantStringInfo(Op, FormatStr)) {		if (!getConstantStringInfo(Op, FormatStr)) {
Value *Stripped = Op->stripPointerCasts();		Value *Stripped = Op->stripPointerCasts();
if (!isa<UndefValue>(Stripped) && !isa<ConstantPointerNull>(Stripped))		if (!isa<UndefValue>(Stripped) && !isa<ConstantPointerNull>(Stripped))
diagnoseInvalidFormatString(CI);		diagnoseInvalidFormatString(CI);
continue;		continue;
}		}

[278 unchanged lines not shown]	if (Printfs.empty())
return false;		return false;

TD = &M.getDataLayout();		TD = &M.getDataLayout();

return lowerPrintfForGpu(M);		return lowerPrintfForGpu(M);
}		}

bool AMDGPUPrintfRuntimeBinding::runOnModule(Module &M) {		bool AMDGPUPrintfRuntimeBinding::runOnModule(Module &M) {
auto GetDT = [this](Function &F) -> DominatorTree & {		return AMDGPUPrintfRuntimeBindingImpl().run(M);
return this->getAnalysis<DominatorTreeWrapperPass>(F).getDomTree();
};
auto GetTLI = [this](Function &F) -> TargetLibraryInfo & {
return this->getAnalysis<TargetLibraryInfoWrapperPass>().getTLI(F);
};

return AMDGPUPrintfRuntimeBindingImpl(GetDT, GetTLI).run(M);
}		}

PreservedAnalyses		PreservedAnalyses
AMDGPUPrintfRuntimeBindingPass::run(Module &M, ModuleAnalysisManager &AM) {		AMDGPUPrintfRuntimeBindingPass::run(Module &M, ModuleAnalysisManager &AM) {
FunctionAnalysisManager &FAM =		bool Changed = AMDGPUPrintfRuntimeBindingImpl().run(M);
AM.getResult<FunctionAnalysisManagerModuleProxy>(M).getManager();
auto GetDT = [&FAM](Function &F) -> DominatorTree & {
return FAM.getResult<DominatorTreeAnalysis>(F);
};
auto GetTLI = [&FAM](Function &F) -> TargetLibraryInfo & {
return FAM.getResult<TargetLibraryAnalysis>(F);
};
bool Changed = AMDGPUPrintfRuntimeBindingImpl(GetDT, GetTLI).run(M);
return Changed ? PreservedAnalyses::none() : PreservedAnalyses::all();		return Changed ? PreservedAnalyses::none() : PreservedAnalyses::all();
}		}

llvm/test/CodeGen/AMDGPU/llc-pipeline.ll

	[23 unchanged lines not shown]
	; GCN-O0-NEXT:Register Usage Information Storage			; GCN-O0-NEXT:Register Usage Information Storage
	; GCN-O0-NEXT:Machine Branch Probability Analysis			; GCN-O0-NEXT:Machine Branch Probability Analysis
	; GCN-O0-NEXT: ModulePass Manager			; GCN-O0-NEXT: ModulePass Manager
	; GCN-O0-NEXT: Pre-ISel Intrinsic Lowering			; GCN-O0-NEXT: Pre-ISel Intrinsic Lowering
	; GCN-O0-NEXT: FunctionPass Manager			; GCN-O0-NEXT: FunctionPass Manager
	; GCN-O0-NEXT: Expand large div/rem			; GCN-O0-NEXT: Expand large div/rem
	; GCN-O0-NEXT: Expand large fp convert			; GCN-O0-NEXT: Expand large fp convert
	; GCN-O0-NEXT: AMDGPU Printf lowering			; GCN-O0-NEXT: AMDGPU Printf lowering
	; GCN-O0-NEXT: FunctionPass Manager
	; GCN-O0-NEXT: Dominator Tree Construction
	; GCN-O0-NEXT: Lower ctors and dtors for AMDGPU			; GCN-O0-NEXT: Lower ctors and dtors for AMDGPU
	; GCN-O0-NEXT: Lower OpenCL enqueued blocks			; GCN-O0-NEXT: Lower OpenCL enqueued blocks
	; GCN-O0-NEXT: Lower uses of LDS variables from non-kernel functions			; GCN-O0-NEXT: Lower uses of LDS variables from non-kernel functions
	; GCN-O0-NEXT: FunctionPass Manager			; GCN-O0-NEXT: FunctionPass Manager
	; GCN-O0-NEXT: Expand Atomic instructions			; GCN-O0-NEXT: Expand Atomic instructions
	; GCN-O0-NEXT: Lower constant intrinsics			; GCN-O0-NEXT: Lower constant intrinsics
	; GCN-O0-NEXT: Remove unreachable blocks from the CFG			; GCN-O0-NEXT: Remove unreachable blocks from the CFG
	; GCN-O0-NEXT: Expand vector predication intrinsics			; GCN-O0-NEXT: Expand vector predication intrinsics
	[96 unchanged lines not shown]
	; GCN-O0-NEXT: Machine Optimization Remark Emitter			; GCN-O0-NEXT: Machine Optimization Remark Emitter
	; GCN-O0-NEXT: Stack Frame Layout Analysis			; GCN-O0-NEXT: Stack Frame Layout Analysis
	; GCN-O0-NEXT: Function register usage analysis			; GCN-O0-NEXT: Function register usage analysis
	; GCN-O0-NEXT: FunctionPass Manager			; GCN-O0-NEXT: FunctionPass Manager
	; GCN-O0-NEXT: Lazy Machine Block Frequency Analysis			; GCN-O0-NEXT: Lazy Machine Block Frequency Analysis
	; GCN-O0-NEXT: Machine Optimization Remark Emitter			; GCN-O0-NEXT: Machine Optimization Remark Emitter
	; GCN-O0-NEXT: AMDGPU Assembly Printer			; GCN-O0-NEXT: AMDGPU Assembly Printer
	; GCN-O0-NEXT: Free MachineFunction			; GCN-O0-NEXT: Free MachineFunction
	; GCN-O0-NEXT:Pass Arguments: -domtree
	; GCN-O0-NEXT: FunctionPass Manager
	; GCN-O0-NEXT: Dominator Tree Construction

	; GCN-O1:Target Library Information			; GCN-O1:Target Library Information
	; GCN-O1-NEXT:Target Pass Configuration			; GCN-O1-NEXT:Target Pass Configuration
	; GCN-O1-NEXT:Machine Module Information			; GCN-O1-NEXT:Machine Module Information
	; GCN-O1-NEXT:Target Transform Information			; GCN-O1-NEXT:Target Transform Information
	; GCN-O1-NEXT:Assumption Cache Tracker			; GCN-O1-NEXT:Assumption Cache Tracker
	; GCN-O1-NEXT:AMDGPU Address space based Alias Analysis			; GCN-O1-NEXT:AMDGPU Address space based Alias Analysis
	; GCN-O1-NEXT:External Alias Analysis			; GCN-O1-NEXT:External Alias Analysis
	; GCN-O1-NEXT:Type-Based Alias Analysis			; GCN-O1-NEXT:Type-Based Alias Analysis
	; GCN-O1-NEXT:Scoped NoAlias Alias Analysis			; GCN-O1-NEXT:Scoped NoAlias Alias Analysis
	; GCN-O1-NEXT:Profile summary info			; GCN-O1-NEXT:Profile summary info
	; GCN-O1-NEXT:Argument Register Usage Information Storage			; GCN-O1-NEXT:Argument Register Usage Information Storage
	; GCN-O1-NEXT:Create Garbage Collector Module Metadata			; GCN-O1-NEXT:Create Garbage Collector Module Metadata
	; GCN-O1-NEXT:Machine Branch Probability Analysis			; GCN-O1-NEXT:Machine Branch Probability Analysis
	; GCN-O1-NEXT:Register Usage Information Storage			; GCN-O1-NEXT:Register Usage Information Storage
	; GCN-O1-NEXT:Default Regalloc Eviction Advisor			; GCN-O1-NEXT:Default Regalloc Eviction Advisor
	; GCN-O1-NEXT:Default Regalloc Priority Advisor			; GCN-O1-NEXT:Default Regalloc Priority Advisor
	; GCN-O1-NEXT: ModulePass Manager			; GCN-O1-NEXT: ModulePass Manager
	; GCN-O1-NEXT: Pre-ISel Intrinsic Lowering			; GCN-O1-NEXT: Pre-ISel Intrinsic Lowering
	; GCN-O1-NEXT: FunctionPass Manager			; GCN-O1-NEXT: FunctionPass Manager
	; GCN-O1-NEXT: Expand large div/rem			; GCN-O1-NEXT: Expand large div/rem
	; GCN-O1-NEXT: Expand large fp convert			; GCN-O1-NEXT: Expand large fp convert
	; GCN-O1-NEXT: AMDGPU Printf lowering			; GCN-O1-NEXT: AMDGPU Printf lowering
	; GCN-O1-NEXT: FunctionPass Manager
	; GCN-O1-NEXT: Dominator Tree Construction
	; GCN-O1-NEXT: Lower ctors and dtors for AMDGPU			; GCN-O1-NEXT: Lower ctors and dtors for AMDGPU
	; GCN-O1-NEXT: Lower OpenCL enqueued blocks			; GCN-O1-NEXT: Lower OpenCL enqueued blocks
	; GCN-O1-NEXT: Lower uses of LDS variables from non-kernel functions			; GCN-O1-NEXT: Lower uses of LDS variables from non-kernel functions
	; GCN-O1-NEXT: AMDGPU Attributor			; GCN-O1-NEXT: AMDGPU Attributor
	; GCN-O1-NEXT: FunctionPass Manager			; GCN-O1-NEXT: FunctionPass Manager
	; GCN-O1-NEXT: Cycle Info Analysis			; GCN-O1-NEXT: Cycle Info Analysis
	; GCN-O1-NEXT: FunctionPass Manager			; GCN-O1-NEXT: FunctionPass Manager
	; GCN-O1-NEXT: Infer address spaces			; GCN-O1-NEXT: Infer address spaces
	[227 unchanged lines not shown]
	; GCN-O1-NEXT: Machine Optimization Remark Emitter			; GCN-O1-NEXT: Machine Optimization Remark Emitter
	; GCN-O1-NEXT: Stack Frame Layout Analysis			; GCN-O1-NEXT: Stack Frame Layout Analysis
	; GCN-O1-NEXT: Function register usage analysis			; GCN-O1-NEXT: Function register usage analysis
	; GCN-O1-NEXT: FunctionPass Manager			; GCN-O1-NEXT: FunctionPass Manager
	; GCN-O1-NEXT: Lazy Machine Block Frequency Analysis			; GCN-O1-NEXT: Lazy Machine Block Frequency Analysis
	; GCN-O1-NEXT: Machine Optimization Remark Emitter			; GCN-O1-NEXT: Machine Optimization Remark Emitter
	; GCN-O1-NEXT: AMDGPU Assembly Printer			; GCN-O1-NEXT: AMDGPU Assembly Printer
	; GCN-O1-NEXT: Free MachineFunction			; GCN-O1-NEXT: Free MachineFunction
	; GCN-O1-NEXT:Pass Arguments: -domtree
	; GCN-O1-NEXT: FunctionPass Manager
	; GCN-O1-NEXT: Dominator Tree Construction

	; GCN-O1-OPTS:Target Library Information			; GCN-O1-OPTS:Target Library Information
	; GCN-O1-OPTS-NEXT:Target Pass Configuration			; GCN-O1-OPTS-NEXT:Target Pass Configuration
	; GCN-O1-OPTS-NEXT:Machine Module Information			; GCN-O1-OPTS-NEXT:Machine Module Information
	; GCN-O1-OPTS-NEXT:Target Transform Information			; GCN-O1-OPTS-NEXT:Target Transform Information
	; GCN-O1-OPTS-NEXT:Assumption Cache Tracker			; GCN-O1-OPTS-NEXT:Assumption Cache Tracker
	; GCN-O1-OPTS-NEXT:AMDGPU Address space based Alias Analysis			; GCN-O1-OPTS-NEXT:AMDGPU Address space based Alias Analysis
	; GCN-O1-OPTS-NEXT:External Alias Analysis			; GCN-O1-OPTS-NEXT:External Alias Analysis
	; GCN-O1-OPTS-NEXT:Type-Based Alias Analysis			; GCN-O1-OPTS-NEXT:Type-Based Alias Analysis
	; GCN-O1-OPTS-NEXT:Scoped NoAlias Alias Analysis			; GCN-O1-OPTS-NEXT:Scoped NoAlias Alias Analysis
	; GCN-O1-OPTS-NEXT:Profile summary info			; GCN-O1-OPTS-NEXT:Profile summary info
	; GCN-O1-OPTS-NEXT:Argument Register Usage Information Storage			; GCN-O1-OPTS-NEXT:Argument Register Usage Information Storage
	; GCN-O1-OPTS-NEXT:Create Garbage Collector Module Metadata			; GCN-O1-OPTS-NEXT:Create Garbage Collector Module Metadata
	; GCN-O1-OPTS-NEXT:Machine Branch Probability Analysis			; GCN-O1-OPTS-NEXT:Machine Branch Probability Analysis
	; GCN-O1-OPTS-NEXT:Register Usage Information Storage			; GCN-O1-OPTS-NEXT:Register Usage Information Storage
	; GCN-O1-OPTS-NEXT:Default Regalloc Eviction Advisor			; GCN-O1-OPTS-NEXT:Default Regalloc Eviction Advisor
	; GCN-O1-OPTS-NEXT:Default Regalloc Priority Advisor			; GCN-O1-OPTS-NEXT:Default Regalloc Priority Advisor
	; GCN-O1-OPTS-NEXT: ModulePass Manager			; GCN-O1-OPTS-NEXT: ModulePass Manager
	; GCN-O1-OPTS-NEXT: Pre-ISel Intrinsic Lowering			; GCN-O1-OPTS-NEXT: Pre-ISel Intrinsic Lowering
	; GCN-O1-OPTS-NEXT: FunctionPass Manager			; GCN-O1-OPTS-NEXT: FunctionPass Manager
	; GCN-O1-OPTS-NEXT: Expand large div/rem			; GCN-O1-OPTS-NEXT: Expand large div/rem
	; GCN-O1-OPTS-NEXT: Expand large fp convert			; GCN-O1-OPTS-NEXT: Expand large fp convert
	; GCN-O1-OPTS-NEXT: AMDGPU Printf lowering			; GCN-O1-OPTS-NEXT: AMDGPU Printf lowering
	; GCN-O1-OPTS-NEXT: FunctionPass Manager
	; GCN-O1-OPTS-NEXT: Dominator Tree Construction
	; GCN-O1-OPTS-NEXT: Lower ctors and dtors for AMDGPU			; GCN-O1-OPTS-NEXT: Lower ctors and dtors for AMDGPU
	; GCN-O1-OPTS-NEXT: Lower OpenCL enqueued blocks			; GCN-O1-OPTS-NEXT: Lower OpenCL enqueued blocks
	; GCN-O1-OPTS-NEXT: Lower uses of LDS variables from non-kernel functions			; GCN-O1-OPTS-NEXT: Lower uses of LDS variables from non-kernel functions
	; GCN-O1-OPTS-NEXT: AMDGPU Attributor			; GCN-O1-OPTS-NEXT: AMDGPU Attributor
	; GCN-O1-OPTS-NEXT: FunctionPass Manager			; GCN-O1-OPTS-NEXT: FunctionPass Manager
	; GCN-O1-OPTS-NEXT: Cycle Info Analysis			; GCN-O1-OPTS-NEXT: Cycle Info Analysis
	; GCN-O1-OPTS-NEXT: FunctionPass Manager			; GCN-O1-OPTS-NEXT: FunctionPass Manager
	; GCN-O1-OPTS-NEXT: Infer address spaces			; GCN-O1-OPTS-NEXT: Infer address spaces
	[249 unchanged lines not shown]
	; GCN-O1-OPTS-NEXT: Machine Optimization Remark Emitter			; GCN-O1-OPTS-NEXT: Machine Optimization Remark Emitter
	; GCN-O1-OPTS-NEXT: Stack Frame Layout Analysis			; GCN-O1-OPTS-NEXT: Stack Frame Layout Analysis
	; GCN-O1-OPTS-NEXT: Function register usage analysis			; GCN-O1-OPTS-NEXT: Function register usage analysis
	; GCN-O1-OPTS-NEXT: FunctionPass Manager			; GCN-O1-OPTS-NEXT: FunctionPass Manager
	; GCN-O1-OPTS-NEXT: Lazy Machine Block Frequency Analysis			; GCN-O1-OPTS-NEXT: Lazy Machine Block Frequency Analysis
	; GCN-O1-OPTS-NEXT: Machine Optimization Remark Emitter			; GCN-O1-OPTS-NEXT: Machine Optimization Remark Emitter
	; GCN-O1-OPTS-NEXT: AMDGPU Assembly Printer			; GCN-O1-OPTS-NEXT: AMDGPU Assembly Printer
	; GCN-O1-OPTS-NEXT: Free MachineFunction			; GCN-O1-OPTS-NEXT: Free MachineFunction
	; GCN-O1-OPTS-NEXT:Pass Arguments: -domtree
	; GCN-O1-OPTS-NEXT: FunctionPass Manager
	; GCN-O1-OPTS-NEXT: Dominator Tree Construction

	; GCN-O2:Target Library Information			; GCN-O2:Target Library Information
	; GCN-O2-NEXT:Target Pass Configuration			; GCN-O2-NEXT:Target Pass Configuration
	; GCN-O2-NEXT:Machine Module Information			; GCN-O2-NEXT:Machine Module Information
	; GCN-O2-NEXT:Target Transform Information			; GCN-O2-NEXT:Target Transform Information
	; GCN-O2-NEXT:Assumption Cache Tracker			; GCN-O2-NEXT:Assumption Cache Tracker
	; GCN-O2-NEXT:AMDGPU Address space based Alias Analysis			; GCN-O2-NEXT:AMDGPU Address space based Alias Analysis
	; GCN-O2-NEXT:External Alias Analysis			; GCN-O2-NEXT:External Alias Analysis
	; GCN-O2-NEXT:Type-Based Alias Analysis			; GCN-O2-NEXT:Type-Based Alias Analysis
	; GCN-O2-NEXT:Scoped NoAlias Alias Analysis			; GCN-O2-NEXT:Scoped NoAlias Alias Analysis
	; GCN-O2-NEXT:Profile summary info			; GCN-O2-NEXT:Profile summary info
	; GCN-O2-NEXT:Argument Register Usage Information Storage			; GCN-O2-NEXT:Argument Register Usage Information Storage
	; GCN-O2-NEXT:Create Garbage Collector Module Metadata			; GCN-O2-NEXT:Create Garbage Collector Module Metadata
	; GCN-O2-NEXT:Machine Branch Probability Analysis			; GCN-O2-NEXT:Machine Branch Probability Analysis
	; GCN-O2-NEXT:Register Usage Information Storage			; GCN-O2-NEXT:Register Usage Information Storage
	; GCN-O2-NEXT:Default Regalloc Eviction Advisor			; GCN-O2-NEXT:Default Regalloc Eviction Advisor
	; GCN-O2-NEXT:Default Regalloc Priority Advisor			; GCN-O2-NEXT:Default Regalloc Priority Advisor
	; GCN-O2-NEXT: ModulePass Manager			; GCN-O2-NEXT: ModulePass Manager
	; GCN-O2-NEXT: Pre-ISel Intrinsic Lowering			; GCN-O2-NEXT: Pre-ISel Intrinsic Lowering
	; GCN-O2-NEXT: FunctionPass Manager			; GCN-O2-NEXT: FunctionPass Manager
	; GCN-O2-NEXT: Expand large div/rem			; GCN-O2-NEXT: Expand large div/rem
	; GCN-O2-NEXT: Expand large fp convert			; GCN-O2-NEXT: Expand large fp convert
	; GCN-O2-NEXT: AMDGPU Printf lowering			; GCN-O2-NEXT: AMDGPU Printf lowering
	; GCN-O2-NEXT: FunctionPass Manager
	; GCN-O2-NEXT: Dominator Tree Construction
	; GCN-O2-NEXT: Lower ctors and dtors for AMDGPU			; GCN-O2-NEXT: Lower ctors and dtors for AMDGPU
	; GCN-O2-NEXT: Lower OpenCL enqueued blocks			; GCN-O2-NEXT: Lower OpenCL enqueued blocks
	; GCN-O2-NEXT: Lower uses of LDS variables from non-kernel functions			; GCN-O2-NEXT: Lower uses of LDS variables from non-kernel functions
	; GCN-O2-NEXT: AMDGPU Attributor			; GCN-O2-NEXT: AMDGPU Attributor
	; GCN-O2-NEXT: FunctionPass Manager			; GCN-O2-NEXT: FunctionPass Manager
	; GCN-O2-NEXT: Cycle Info Analysis			; GCN-O2-NEXT: Cycle Info Analysis
	; GCN-O2-NEXT: FunctionPass Manager			; GCN-O2-NEXT: FunctionPass Manager
	; GCN-O2-NEXT: Infer address spaces			; GCN-O2-NEXT: Infer address spaces
	[259 unchanged lines not shown]
	; GCN-O2-NEXT: Machine Optimization Remark Emitter			; GCN-O2-NEXT: Machine Optimization Remark Emitter
	; GCN-O2-NEXT: Stack Frame Layout Analysis			; GCN-O2-NEXT: Stack Frame Layout Analysis
	; GCN-O2-NEXT: Function register usage analysis			; GCN-O2-NEXT: Function register usage analysis
	; GCN-O2-NEXT: FunctionPass Manager			; GCN-O2-NEXT: FunctionPass Manager
	; GCN-O2-NEXT: Lazy Machine Block Frequency Analysis			; GCN-O2-NEXT: Lazy Machine Block Frequency Analysis
	; GCN-O2-NEXT: Machine Optimization Remark Emitter			; GCN-O2-NEXT: Machine Optimization Remark Emitter
	; GCN-O2-NEXT: AMDGPU Assembly Printer			; GCN-O2-NEXT: AMDGPU Assembly Printer
	; GCN-O2-NEXT: Free MachineFunction			; GCN-O2-NEXT: Free MachineFunction
	; GCN-O2-NEXT:Pass Arguments: -domtree
	; GCN-O2-NEXT: FunctionPass Manager
	; GCN-O2-NEXT: Dominator Tree Construction

	; GCN-O3:Target Library Information			; GCN-O3:Target Library Information
	; GCN-O3-NEXT:Target Pass Configuration			; GCN-O3-NEXT:Target Pass Configuration
	; GCN-O3-NEXT:Machine Module Information			; GCN-O3-NEXT:Machine Module Information
	; GCN-O3-NEXT:Target Transform Information			; GCN-O3-NEXT:Target Transform Information
	; GCN-O3-NEXT:Assumption Cache Tracker			; GCN-O3-NEXT:Assumption Cache Tracker
	; GCN-O3-NEXT:Profile summary info			; GCN-O3-NEXT:Profile summary info
	; GCN-O3-NEXT:AMDGPU Address space based Alias Analysis			; GCN-O3-NEXT:AMDGPU Address space based Alias Analysis
	; GCN-O3-NEXT:External Alias Analysis			; GCN-O3-NEXT:External Alias Analysis
	; GCN-O3-NEXT:Type-Based Alias Analysis			; GCN-O3-NEXT:Type-Based Alias Analysis
	; GCN-O3-NEXT:Scoped NoAlias Alias Analysis			; GCN-O3-NEXT:Scoped NoAlias Alias Analysis
	; GCN-O3-NEXT:Argument Register Usage Information Storage			; GCN-O3-NEXT:Argument Register Usage Information Storage
	; GCN-O3-NEXT:Create Garbage Collector Module Metadata			; GCN-O3-NEXT:Create Garbage Collector Module Metadata
	; GCN-O3-NEXT:Machine Branch Probability Analysis			; GCN-O3-NEXT:Machine Branch Probability Analysis
	; GCN-O3-NEXT:Register Usage Information Storage			; GCN-O3-NEXT:Register Usage Information Storage
	; GCN-O3-NEXT:Default Regalloc Eviction Advisor			; GCN-O3-NEXT:Default Regalloc Eviction Advisor
	; GCN-O3-NEXT:Default Regalloc Priority Advisor			; GCN-O3-NEXT:Default Regalloc Priority Advisor
	; GCN-O3-NEXT: ModulePass Manager			; GCN-O3-NEXT: ModulePass Manager
	; GCN-O3-NEXT: Pre-ISel Intrinsic Lowering			; GCN-O3-NEXT: Pre-ISel Intrinsic Lowering
	; GCN-O3-NEXT: FunctionPass Manager			; GCN-O3-NEXT: FunctionPass Manager
	; GCN-O3-NEXT: Expand large div/rem			; GCN-O3-NEXT: Expand large div/rem
	; GCN-O3-NEXT: Expand large fp convert			; GCN-O3-NEXT: Expand large fp convert
	; GCN-O3-NEXT: AMDGPU Printf lowering			; GCN-O3-NEXT: AMDGPU Printf lowering
	; GCN-O3-NEXT: FunctionPass Manager
	; GCN-O3-NEXT: Dominator Tree Construction
	; GCN-O3-NEXT: Lower ctors and dtors for AMDGPU			; GCN-O3-NEXT: Lower ctors and dtors for AMDGPU
	; GCN-O3-NEXT: Lower OpenCL enqueued blocks			; GCN-O3-NEXT: Lower OpenCL enqueued blocks
	; GCN-O3-NEXT: Lower uses of LDS variables from non-kernel functions			; GCN-O3-NEXT: Lower uses of LDS variables from non-kernel functions
	; GCN-O3-NEXT: AMDGPU Attributor			; GCN-O3-NEXT: AMDGPU Attributor
	; GCN-O3-NEXT: FunctionPass Manager			; GCN-O3-NEXT: FunctionPass Manager
	; GCN-O3-NEXT: Cycle Info Analysis			; GCN-O3-NEXT: Cycle Info Analysis
	; GCN-O3-NEXT: FunctionPass Manager			; GCN-O3-NEXT: FunctionPass Manager
	; GCN-O3-NEXT: Infer address spaces			; GCN-O3-NEXT: Infer address spaces
	[271 unchanged lines not shown]
	; GCN-O3-NEXT: Machine Optimization Remark Emitter			; GCN-O3-NEXT: Machine Optimization Remark Emitter
	; GCN-O3-NEXT: Stack Frame Layout Analysis			; GCN-O3-NEXT: Stack Frame Layout Analysis
	; GCN-O3-NEXT: Function register usage analysis			; GCN-O3-NEXT: Function register usage analysis
	; GCN-O3-NEXT: FunctionPass Manager			; GCN-O3-NEXT: FunctionPass Manager
	; GCN-O3-NEXT: Lazy Machine Block Frequency Analysis			; GCN-O3-NEXT: Lazy Machine Block Frequency Analysis
	; GCN-O3-NEXT: Machine Optimization Remark Emitter			; GCN-O3-NEXT: Machine Optimization Remark Emitter
	; GCN-O3-NEXT: AMDGPU Assembly Printer			; GCN-O3-NEXT: AMDGPU Assembly Printer
	; GCN-O3-NEXT: Free MachineFunction			; GCN-O3-NEXT: Free MachineFunction
	; GCN-O3-NEXT:Pass Arguments: -domtree
	; GCN-O3-NEXT: FunctionPass Manager
	; GCN-O3-NEXT: Dominator Tree Construction

	define void @empty() {			define void @empty() {
	ret void			ret void
	}			}