This is an archive of the discontinued LLVM Phabricator instance.

If, as an example, we focus on a call to tanh taking float4, oneAPI will generate a call to __spirv_ocl_tanh(float vector[4]), which under the hood for AMD backend is implemented as a sequence of scalar calls to __ocml_tanh_f32 (provided by AMD's device lib implementation).

Because AMDGPURemoveIncompatibleFunctions is now run early on, and as __spirv_ocl_tanh(float vector[4]) requires (among others) target support of FeatureDot3Inst (which is not provided on the GPU in question gfx1030), it is being earmarked for deletion and we end up with modules containing calls to deleted funcs:

%call3.i.i.i = tail call fastcc noundef <4 x float> null(<4 x float> noundef %agg.tmp.sroa.0.0.copyload.i)

As all the libclc functions are marked always_inline, this used to work well: the always inliner pass would replace the problematic vector function calls with exploded scalar versions of __ocml_....

Would it be possible to have the incompatible functions pass run after the inliners?
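The ordering problem described above can be sketched with a toy model. This is not LLVM's API: Fn, Module, removeIncompatible, inlineAlways and makeModule are hypothetical names, the string "__spirv_ocl_tanh(float4)" is only a label for the vector wrapper, and "null" stands in for the dangling callee seen in the IR. The sketch assumes the wrapper is always_inline and requires FeatureDot3Inst, as the comment states, and that the target has no such feature.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model of a function: the features it requires and the calls it makes.
struct Fn {
  std::set<std::string> requiredFeatures;
  std::vector<std::string> callees;
  bool alwaysInline = false;
};
using Module = std::map<std::string, Fn>;

// Delete every function that needs a feature the target lacks.
// Calls to a deleted function are left pointing at "null".
void removeIncompatible(Module &m, const std::set<std::string> &targetFeatures) {
  std::set<std::string> removed;
  for (auto it = m.begin(); it != m.end();) {
    bool supported = true;
    for (const std::string &f : it->second.requiredFeatures)
      if (targetFeatures.count(f) == 0)
        supported = false;
    if (!supported) {
      removed.insert(it->first);
      it = m.erase(it);
    } else {
      ++it;
    }
  }
  for (auto &entry : m)
    for (std::string &callee : entry.second.callees)
      if (removed.count(callee) != 0)
        callee = "null";
}

// Replace each call to an always_inline function with that function's own
// calls (one level of inlining is enough for this example).
void inlineAlways(Module &m) {
  for (auto &entry : m) {
    std::vector<std::string> expanded;
    for (const std::string &callee : entry.second.callees) {
      auto it = m.find(callee);
      if (it != m.end() && it->second.alwaysInline)
        expanded.insert(expanded.end(), it->second.callees.begin(),
                        it->second.callees.end());
      else
        expanded.push_back(callee);
    }
    entry.second.callees = expanded;
  }
}

// A kernel calling the float4 wrapper, which expands to four scalar calls.
Module makeModule() {
  Module m;
  m["__ocml_tanh_f32"] = Fn();
  Fn wrapper;
  wrapper.requiredFeatures = {"FeatureDot3Inst"};
  wrapper.callees = std::vector<std::string>(4, "__ocml_tanh_f32");
  wrapper.alwaysInline = true;
  m["__spirv_ocl_tanh(float4)"] = wrapper;
  Fn kernel;
  kernel.callees = {"__spirv_ocl_tanh(float4)"};
  m["kernel"] = kernel;
  return m;
}
```

Running removeIncompatible before inlineAlways on makeModule() with an empty feature set leaves the kernel with a single "null" callee; running them in the opposite order leaves it with four calls to __ocml_tanh_f32. The model deliberately inlines without checking feature compatibility, which is the behaviour the comment relied on.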

In D155987#4598269, @jchlanda wrote:

> @arsenm would you be so kind and explain the reasoning behind reshuffling the order of the passes.

If the function is unsupportable and going to be deleted, there's no point in running any of the other passes on the function. I was trying to keep all the module passes together.

> In oneAPI, math functions are handled through libclc, which implements the SPIR-V interface.
>
> If, as an example, we focus on a call to tanh taking float4, oneAPI will generate a call to __spirv_ocl_tanh(float vector[4]), which under the hood for AMD backend is implemented as a sequence of scalar calls to __ocml_tanh_f32 (provided by AMD's device lib implementation).
>
> Because AMDGPURemoveIncompatibleFunctions is now run early on, and as __spirv_ocl_tanh(float vector[4]) requires (among others) target support of FeatureDot3Inst (which is not provided on the GPU in question gfx1030), it is being earmarked for deletion and we end up with modules containing calls to deleted funcs:

But __ocml_tanh_f32 doesn't use a dot intrinsic? Not sure how or where you would be seeing that. ocml is currently free of subtarget feature dependence.

> %call3.i.i.i = tail call fastcc noundef <4 x float> null(<4 x float> noundef %agg.tmp.sroa.0.0.copyload.i)
>
> As all the libclc functions are marked always_inline, this used to work well: the always inliner pass would replace the problematic vector function calls with exploded scalar versions of __ocml_....

libclc should stop using always_inline, and we should stop running the always inliner in the backend (I'm moving towards deleting it in D152414). Function calls should work well now and we should act like a normal target; we don't need these old hacks to support old gaps in codegen support.

> Would it be possible to have the incompatible functions pass run after the inliners?

The inliner should not be able to fix your code. If this is deleting the function, then it shouldn't have been an inlining candidate in the first place. Something is failing to consider the incompatible feature.

This pass is truly a disgusting and unmaintainable hack I want to eventually remove. The way every other target expects this to be handled is to not allow unsupportable code to taint the module in the first place. This situation would never work in a world where machine linking happens, and you're relying on a compiler implementation detail to produce a working program. A better implementation would be to conditionally load specialized versions of functions when the target is known, not pretend we have one generic implementation that the compiler fixes up. I've been working towards this recently, and have purged all of the subtarget dependence from ocml, so I'm not sure why you're running into this. There are still many problematic subtarget dependencies in ockl functions, though.
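As a rough illustration of "conditionally load specialized versions of functions when the target is known", the sketch below picks the first candidate whose required features are all available on the target. Nothing here is an existing LLVM or device-library interface: Variant, pickVariant, the symbol names and the feature strings are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// One candidate implementation of a library function (hypothetical model).
struct Variant {
  std::string symbol;                      // name of the specialized function
  std::set<std::string> requiredFeatures;  // subtarget features it depends on
};

// Return the first variant whose required features are a subset of the
// target's features, or nullptr if none can be supported. Candidates are
// assumed to be listed from most to least specialized.
const Variant *pickVariant(const std::vector<Variant> &candidates,
                           const std::set<std::string> &targetFeatures) {
  for (const Variant &v : candidates) {
    // std::set iterates in sorted order, as std::includes requires.
    if (std::includes(targetFeatures.begin(), targetFeatures.end(),
                      v.requiredFeatures.begin(), v.requiredFeatures.end()))
      return &v;
  }
  return nullptr;
}
```

With candidates {"foo_dot", {"dot-insts"}} and {"foo_generic", {}}, a target without "dot-insts" would get foo_generic, so no unsupportable function ever enters the module and nothing needs to be deleted afterwards.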

Hi @arsenm,
Apologies for the late reply; this should teach me not to post a day before my holidays.

> But __ocml_tanh_f32 doesn't use a dot intrinsic? Not sure how or where you would be seeing that. ocml is currently free of subtarget feature dependence
>
> The inliner should not be able to fix your code. If this is deleting the function, then it shouldn't have been an inlining candidate in the first place. Something is failing to consider the incompatible feature.

This was exactly the problem with our implementation: we had assumptions about the target embedded in the libclc bitcode that were not portable across architectures. Thank you very much for your help, much appreciated!

Revision Contents

Path                                              Size
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp    9 lines
llvm/test/CodeGen/AMDGPU/llc-pipeline.ll          10 lines

Diff 543027

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

@@ [...] @@ void AMDGPUPassConfig::addStraightLineScalarOptimizationPasses() {
   // NaryReassociate on GEPs creates redundant common expressions, so run
   // EarlyCSE after it.
   addPass(createEarlyCSEPass());
 }

 void AMDGPUPassConfig::addIRPasses() {
   const AMDGPUTargetMachine &TM = getAMDGPUTargetMachine();

+  Triple::ArchType Arch = TM.getTargetTriple().getArch();
+  if (RemoveIncompatibleFunctions && Arch == Triple::amdgcn)
+    addPass(createAMDGPURemoveIncompatibleFunctionsPass(&TM));
+
   // There is no reason to run these.
   disablePass(&StackMapLivenessID);
   disablePass(&FuncletLayoutID);
   disablePass(&PatchableFunctionID);

   addPass(createAMDGPUPrintfRuntimeBinding());
   if (LowerCtorDtor)
     addPass(createAMDGPUCtorDtorLoweringLegacyPass());

   // Function calls are not supported, so make sure we inline everything.
   addPass(createAMDGPUAlwaysInlinePass());
   addPass(createAlwaysInlinerLegacyPass());

   // Handle uses of OpenCL image2d_t, image3d_t and sampler_t arguments.
-  if (TM.getTargetTriple().getArch() == Triple::r600)
+  if (Arch == Triple::r600)
     addPass(createR600OpenCLImageTypeLoweringPass());

   // Replace OpenCL enqueued block function pointers with global variables.
   addPass(createAMDGPUOpenCLEnqueuedBlockLoweringPass());

   // Runs before PromoteAlloca so the latter can account for function uses
   if (EnableLowerModuleLDS) {
     addPass(createAMDGPULowerModuleLDSPass());
@@ [...] @@ void AMDGPUPassConfig::addIRPasses() {
   //
   // but EarlyCSE can do neither of them.
   if (isPassEnabled(EnableScalarIRPasses))
     addEarlyCSEOrGVNPass();
 }

 void AMDGPUPassConfig::addCodeGenPrepare() {
   if (TM->getTargetTriple().getArch() == Triple::amdgcn) {
-    if (RemoveIncompatibleFunctions)
-      addPass(createAMDGPURemoveIncompatibleFunctionsPass(TM));
-
     // FIXME: This pass adds 2 hacky attributes that can be replaced with an
     // analysis, and should be removed.
     addPass(createAMDGPUAnnotateKernelFeaturesPass());
   }

   if (TM->getTargetTriple().getArch() == Triple::amdgcn &&
       EnableLowerKernelArguments)
     addPass(createAMDGPULowerKernelArgumentsPass());
@@ [...] @@

llvm/test/CodeGen/AMDGPU/llc-pipeline.ll

@@ [...] @@
 ; GCN-O0-NEXT:Create Garbage Collector Module Metadata
 ; GCN-O0-NEXT:Register Usage Information Storage
 ; GCN-O0-NEXT:Machine Branch Probability Analysis
 ; GCN-O0-NEXT: ModulePass Manager
 ; GCN-O0-NEXT: Pre-ISel Intrinsic Lowering
 ; GCN-O0-NEXT: FunctionPass Manager
 ; GCN-O0-NEXT: Expand large div/rem
 ; GCN-O0-NEXT: Expand large fp convert
+; GCN-O0-NEXT: AMDGPU Remove Incompatible Functions
 ; GCN-O0-NEXT: AMDGPU Printf lowering
 ; GCN-O0-NEXT: Lower ctors and dtors for AMDGPU
 ; GCN-O0-NEXT: AMDGPU Inline All Functions
 ; GCN-O0-NEXT: Inliner for always_inline functions
 ; GCN-O0-NEXT: FunctionPass Manager
 ; GCN-O0-NEXT: Dominator Tree Construction
 ; GCN-O0-NEXT: Basic Alias Analysis (stateless AA impl)
 ; GCN-O0-NEXT: Function Alias Analysis Results
 ; GCN-O0-NEXT: Lower OpenCL enqueued blocks
 ; GCN-O0-NEXT: Lower uses of LDS variables from non-kernel functions
 ; GCN-O0-NEXT: FunctionPass Manager
 ; GCN-O0-NEXT: Expand Atomic instructions
 ; GCN-O0-NEXT: Lower constant intrinsics
 ; GCN-O0-NEXT: Remove unreachable blocks from the CFG
 ; GCN-O0-NEXT: Expand vector predication intrinsics
 ; GCN-O0-NEXT: Scalarize Masked Memory Intrinsics
 ; GCN-O0-NEXT: Expand reduction intrinsics
-; GCN-O0-NEXT: AMDGPU Remove Incompatible Functions
 ; GCN-O0-NEXT: CallGraph Construction
 ; GCN-O0-NEXT: Call Graph SCC Pass Manager
 ; GCN-O0-NEXT: AMDGPU Annotate Kernel Features
 ; GCN-O0-NEXT: FunctionPass Manager
 ; GCN-O0-NEXT: AMDGPU Lower Kernel Arguments
 ; GCN-O0-NEXT: Lazy Value Information Analysis
 ; GCN-O0-NEXT: Lower SwitchInst's to branches
 ; GCN-O0-NEXT: Lower invoke and unwind, for unwindless code generators
@@ [...] @@
 ; GCN-O1-NEXT:Register Usage Information Storage
 ; GCN-O1-NEXT:Default Regalloc Eviction Advisor
 ; GCN-O1-NEXT:Default Regalloc Priority Advisor
 ; GCN-O1-NEXT: ModulePass Manager
 ; GCN-O1-NEXT: Pre-ISel Intrinsic Lowering
 ; GCN-O1-NEXT: FunctionPass Manager
 ; GCN-O1-NEXT: Expand large div/rem
 ; GCN-O1-NEXT: Expand large fp convert
+; GCN-O1-NEXT: AMDGPU Remove Incompatible Functions
 ; GCN-O1-NEXT: AMDGPU Printf lowering
 ; GCN-O1-NEXT: Lower ctors and dtors for AMDGPU
 ; GCN-O1-NEXT: AMDGPU Inline All Functions
 ; GCN-O1-NEXT: Inliner for always_inline functions
 ; GCN-O1-NEXT: FunctionPass Manager
 ; GCN-O1-NEXT: Dominator Tree Construction
 ; GCN-O1-NEXT: Basic Alias Analysis (stateless AA impl)
 ; GCN-O1-NEXT: Function Alias Analysis Results
@@ [...] @@
 ; GCN-O1-NEXT: Constant Hoisting
 ; GCN-O1-NEXT: Replace intrinsics with calls to vector library
 ; GCN-O1-NEXT: Partially inline calls to library functions
 ; GCN-O1-NEXT: Expand vector predication intrinsics
 ; GCN-O1-NEXT: Scalarize Masked Memory Intrinsics
 ; GCN-O1-NEXT: Expand reduction intrinsics
 ; GCN-O1-NEXT: Natural Loop Information
 ; GCN-O1-NEXT: TLS Variable Hoist
-; GCN-O1-NEXT: AMDGPU Remove Incompatible Functions
 ; GCN-O1-NEXT: CallGraph Construction
 ; GCN-O1-NEXT: Call Graph SCC Pass Manager
 ; GCN-O1-NEXT: AMDGPU Annotate Kernel Features
 ; GCN-O1-NEXT: FunctionPass Manager
 ; GCN-O1-NEXT: AMDGPU Lower Kernel Arguments
 ; GCN-O1-NEXT: Dominator Tree Construction
 ; GCN-O1-NEXT: Natural Loop Information
 ; GCN-O1-NEXT: CodeGen Prepare
@@ [...] @@
 ; GCN-O1-OPTS-NEXT:Register Usage Information Storage
 ; GCN-O1-OPTS-NEXT:Default Regalloc Eviction Advisor
 ; GCN-O1-OPTS-NEXT:Default Regalloc Priority Advisor
 ; GCN-O1-OPTS-NEXT: ModulePass Manager
 ; GCN-O1-OPTS-NEXT: Pre-ISel Intrinsic Lowering
 ; GCN-O1-OPTS-NEXT: FunctionPass Manager
 ; GCN-O1-OPTS-NEXT: Expand large div/rem
 ; GCN-O1-OPTS-NEXT: Expand large fp convert
+; GCN-O1-OPTS-NEXT: AMDGPU Remove Incompatible Functions
 ; GCN-O1-OPTS-NEXT: AMDGPU Printf lowering
 ; GCN-O1-OPTS-NEXT: Lower ctors and dtors for AMDGPU
 ; GCN-O1-OPTS-NEXT: AMDGPU Inline All Functions
 ; GCN-O1-OPTS-NEXT: Inliner for always_inline functions
 ; GCN-O1-OPTS-NEXT: FunctionPass Manager
 ; GCN-O1-OPTS-NEXT: Dominator Tree Construction
 ; GCN-O1-OPTS-NEXT: Basic Alias Analysis (stateless AA impl)
 ; GCN-O1-OPTS-NEXT: Function Alias Analysis Results
@@ [...] @@
 ; GCN-O1-OPTS-NEXT: Replace intrinsics with calls to vector library
 ; GCN-O1-OPTS-NEXT: Partially inline calls to library functions
 ; GCN-O1-OPTS-NEXT: Expand vector predication intrinsics
 ; GCN-O1-OPTS-NEXT: Scalarize Masked Memory Intrinsics
 ; GCN-O1-OPTS-NEXT: Expand reduction intrinsics
 ; GCN-O1-OPTS-NEXT: Natural Loop Information
 ; GCN-O1-OPTS-NEXT: TLS Variable Hoist
 ; GCN-O1-OPTS-NEXT: Early CSE
-; GCN-O1-OPTS-NEXT: AMDGPU Remove Incompatible Functions
 ; GCN-O1-OPTS-NEXT: CallGraph Construction
 ; GCN-O1-OPTS-NEXT: Call Graph SCC Pass Manager
 ; GCN-O1-OPTS-NEXT: AMDGPU Annotate Kernel Features
 ; GCN-O1-OPTS-NEXT: FunctionPass Manager
 ; GCN-O1-OPTS-NEXT: AMDGPU Lower Kernel Arguments
 ; GCN-O1-OPTS-NEXT: Dominator Tree Construction
 ; GCN-O1-OPTS-NEXT: Natural Loop Information
 ; GCN-O1-OPTS-NEXT: CodeGen Prepare
@@ [...] @@
 ; GCN-O2-NEXT:Register Usage Information Storage
 ; GCN-O2-NEXT:Default Regalloc Eviction Advisor
 ; GCN-O2-NEXT:Default Regalloc Priority Advisor
 ; GCN-O2-NEXT: ModulePass Manager
 ; GCN-O2-NEXT: Pre-ISel Intrinsic Lowering
 ; GCN-O2-NEXT: FunctionPass Manager
 ; GCN-O2-NEXT: Expand large div/rem
 ; GCN-O2-NEXT: Expand large fp convert
+; GCN-O2-NEXT: AMDGPU Remove Incompatible Functions
 ; GCN-O2-NEXT: AMDGPU Printf lowering
 ; GCN-O2-NEXT: Lower ctors and dtors for AMDGPU
 ; GCN-O2-NEXT: AMDGPU Inline All Functions
 ; GCN-O2-NEXT: Inliner for always_inline functions
 ; GCN-O2-NEXT: FunctionPass Manager
 ; GCN-O2-NEXT: Dominator Tree Construction
 ; GCN-O2-NEXT: Basic Alias Analysis (stateless AA impl)
 ; GCN-O2-NEXT: Function Alias Analysis Results
@@ [...] @@
 ; GCN-O2-NEXT: Replace intrinsics with calls to vector library
 ; GCN-O2-NEXT: Partially inline calls to library functions
 ; GCN-O2-NEXT: Expand vector predication intrinsics
 ; GCN-O2-NEXT: Scalarize Masked Memory Intrinsics
 ; GCN-O2-NEXT: Expand reduction intrinsics
 ; GCN-O2-NEXT: Natural Loop Information
 ; GCN-O2-NEXT: TLS Variable Hoist
 ; GCN-O2-NEXT: Early CSE
-; GCN-O2-NEXT: AMDGPU Remove Incompatible Functions
 ; GCN-O2-NEXT: CallGraph Construction
 ; GCN-O2-NEXT: Call Graph SCC Pass Manager
 ; GCN-O2-NEXT: AMDGPU Annotate Kernel Features
 ; GCN-O2-NEXT: FunctionPass Manager
 ; GCN-O2-NEXT: AMDGPU Lower Kernel Arguments
 ; GCN-O2-NEXT: Dominator Tree Construction
 ; GCN-O2-NEXT: Natural Loop Information
 ; GCN-O2-NEXT: CodeGen Prepare
@@ [...] @@
 ; GCN-O3-NEXT:Register Usage Information Storage
 ; GCN-O3-NEXT:Default Regalloc Eviction Advisor
 ; GCN-O3-NEXT:Default Regalloc Priority Advisor
 ; GCN-O3-NEXT: ModulePass Manager
 ; GCN-O3-NEXT: Pre-ISel Intrinsic Lowering
 ; GCN-O3-NEXT: FunctionPass Manager
 ; GCN-O3-NEXT: Expand large div/rem
 ; GCN-O3-NEXT: Expand large fp convert
+; GCN-O3-NEXT: AMDGPU Remove Incompatible Functions
 ; GCN-O3-NEXT: AMDGPU Printf lowering
 ; GCN-O3-NEXT: Lower ctors and dtors for AMDGPU
 ; GCN-O3-NEXT: AMDGPU Inline All Functions
 ; GCN-O3-NEXT: Inliner for always_inline functions
 ; GCN-O3-NEXT: FunctionPass Manager
 ; GCN-O3-NEXT: Dominator Tree Construction
 ; GCN-O3-NEXT: Basic Alias Analysis (stateless AA impl)
 ; GCN-O3-NEXT: Function Alias Analysis Results
@@ [...] @@
 ; GCN-O3-NEXT: TLS Variable Hoist
 ; GCN-O3-NEXT: Basic Alias Analysis (stateless AA impl)
 ; GCN-O3-NEXT: Function Alias Analysis Results
 ; GCN-O3-NEXT: Memory Dependence Analysis
 ; GCN-O3-NEXT: Lazy Branch Probability Analysis
 ; GCN-O3-NEXT: Lazy Block Frequency Analysis
 ; GCN-O3-NEXT: Optimization Remark Emitter
 ; GCN-O3-NEXT: Global Value Numbering
-; GCN-O3-NEXT: AMDGPU Remove Incompatible Functions
 ; GCN-O3-NEXT: CallGraph Construction
 ; GCN-O3-NEXT: Call Graph SCC Pass Manager
 ; GCN-O3-NEXT: AMDGPU Annotate Kernel Features
 ; GCN-O3-NEXT: FunctionPass Manager
 ; GCN-O3-NEXT: AMDGPU Lower Kernel Arguments
 ; GCN-O3-NEXT: Dominator Tree Construction
 ; GCN-O3-NEXT: Natural Loop Information
 ; GCN-O3-NEXT: CodeGen Prepare
@@ [...] @@

AMDGPU: Move placement of RemoveIncompatibleFunctions
Closed, Public