This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Move AMDGPUAttributor run earlier
ClosedPublic

Authored by arsenm on Jun 8 2023, 10:22 AM.


Details

Reviewers

JonChesterfield
jdoerfert
rampitec
Pierre-vh
sstefan1

Group Reviewers

Restricted Project

Summary

Move it up with other module passes. It's a higher level optimization
that should probably be done before hacking up the IR for codegen. It
should really be done earlier than this. We could possibly move this
with other IPO passes, but we'd have to stop inferring the lack of
lds.kernel.id calls and have the LDS module pass mark functions which
don't need the ID.

The one test change is because that pass is relying on the backend run
of SROA (which we ideally wouldn't have).
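
The ordering constraint described above can be sketched with a toy model. This is only an illustration, not LLVM's pass manager API: the pass names are copied from the llc-pipeline.ll checks in this revision, while `Pipeline`, `indexOf`, `runsBefore`, `oldPipeline`, `newPipeline`, and `checkOrdering` are hypothetical helpers introduced here. It shows that the attributor still runs after the LDS lowering in both orders, but no longer runs after the backend SROA.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Toy model: a pipeline is just an ordered list of pass names.
using Pipeline = std::vector<std::string>;

// Returns the position of Name in the pipeline, or -1 if it is absent.
int indexOf(const Pipeline &P, const std::string &Name) {
  auto It = std::find(P.begin(), P.end(), Name);
  return It == P.end() ? -1 : static_cast<int>(It - P.begin());
}

// True when both passes are present and First is scheduled before Second.
bool runsBefore(const Pipeline &P, const std::string &First,
                const std::string &Second) {
  int A = indexOf(P, First);
  int B = indexOf(P, Second);
  return A >= 0 && B >= 0 && A < B;
}

// Abbreviated order before this revision (attributor in addCodeGenPrepare).
Pipeline oldPipeline() {
  return {"Lower uses of LDS variables from non-kernel functions",
          "Infer address spaces",
          "AMDGPU Promote Alloca",
          "SROA",
          "AMDGPU Remove Incompatible Functions",
          "AMDGPU Attributor",
          "AMDGPU Annotate Kernel Features"};
}

// Abbreviated order after this revision (attributor in addIRPasses).
Pipeline newPipeline() {
  return {"Lower uses of LDS variables from non-kernel functions",
          "AMDGPU Attributor",
          "Infer address spaces",
          "AMDGPU Promote Alloca",
          "SROA",
          "AMDGPU Remove Incompatible Functions",
          "AMDGPU Annotate Kernel Features"};
}

// Checks the two properties the summary relies on.
void checkOrdering() {
  const std::string LDS =
      "Lower uses of LDS variables from non-kernel functions";
  const std::string Attr = "AMDGPU Attributor";
  // The attributor must see the lds.kernel.id calls the LDS pass introduces.
  assert(runsBefore(oldPipeline(), LDS, Attr));
  assert(runsBefore(newPipeline(), LDS, Attr));
  // The backend SROA used to run first; after the move it does not.
  assert(runsBefore(oldPipeline(), "SROA", Attr));
  assert(!runsBefore(newPipeline(), "SROA", Attr));
}
```

The last assertion corresponds to the test change: the attributor now sees IR that SROA has not yet cleaned up.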

Diff Detail

Event Timeline

arsenm created this revision. Jun 8 2023, 10:22 AM

Herald added a project: Restricted Project. Jun 8 2023, 10:22 AM

Herald added subscribers: foad, okura, kerbowa and 6 others.

arsenm requested review of this revision. Jun 8 2023, 10:22 AM

Herald added a reviewer: sstefan1. Jun 8 2023, 10:22 AM

Herald added a project: Restricted Project.

Herald added a subscriber: wdng.

Harbormaster completed remote builds in B237545: Diff 529666. Jun 8 2023, 10:23 AM

Pierre-vh accepted this revision as: Pierre-vh. Jun 9 2023, 12:40 AM

Pierre-vh added inline comments.

llvm/test/CodeGen/AMDGPU/simple-indirect-call.ll
46–48	nit: Do you know why there's more registers being used? We go up to s31 for the call at the end so I guess the SGPR usage is the same overall, I'm just wondering.

This revision is now accepted and ready to land. Jun 9 2023, 12:40 AM

arsenm added inline comments. Jun 9 2023, 6:15 PM

llvm/test/CodeGen/AMDGPU/simple-indirect-call.ll
46–48	This is what the change description referenced. Since InferAddressSpaces/SROA no longer eliminate the cast and alloca, it doesn't figure out it can turn off all the inputs

d7d4aa539c0d2f80c080a3b1e0fa45a78d5e9bfc

Revision Contents

Path                                               Size
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp     8 lines
llvm/test/CodeGen/AMDGPU/llc-pipeline.ll           24 lines
llvm/test/CodeGen/AMDGPU/simple-indirect-call.ll   6 lines

Diff 529666

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

 void AMDGPUPassConfig::addIRPasses() {
[...]
   // Replace OpenCL enqueued block function pointers with global variables.
   addPass(createAMDGPUOpenCLEnqueuedBlockLoweringPass());

   // Runs before PromoteAlloca so the latter can account for function uses
   if (EnableLowerModuleLDS) {
     addPass(createAMDGPULowerModuleLDSPass());
   }

+  // AMDGPUAttributor infers lack of llvm.amdgcn.lds.kernel.id calls, so run
+  // after their introduction
+  if (TM.getOptLevel() > CodeGenOpt::None)
+    addPass(createAMDGPUAttributorPass());
+
   if (TM.getOptLevel() > CodeGenOpt::None)
     addPass(createInferAddressSpacesPass());

   addPass(createAtomicExpandPass());

   if (TM.getOptLevel() > CodeGenOpt::None) {
     addPass(createAMDGPUPromoteAlloca());

[...]
     if (isPassEnabled(EnableScalarIRPasses))
       addEarlyCSEOrGVNPass();
   }

 void AMDGPUPassConfig::addCodeGenPrepare() {
   if (TM->getTargetTriple().getArch() == Triple::amdgcn) {
     if (RemoveIncompatibleFunctions)
       addPass(createAMDGPURemoveIncompatibleFunctionsPass(TM));

-    if (TM->getOptLevel() > CodeGenOpt::None)
-      addPass(createAMDGPUAttributorPass());
-
     // FIXME: This pass adds 2 hacky attributes that can be replaced with an
     // analysis, and should be removed.
     addPass(createAMDGPUAnnotateKernelFeaturesPass());
   }

   if (TM->getTargetTriple().getArch() == Triple::amdgcn &&
       EnableLowerKernelArguments)
     addPass(createAMDGPULowerKernelArgumentsPass());
[...]

llvm/test/CodeGen/AMDGPU/llc-pipeline.ll

[...]
 ; GCN-O1-NEXT: AMDGPU Printf lowering
 ; GCN-O1-NEXT: FunctionPass Manager
 ; GCN-O1-NEXT: Dominator Tree Construction
 ; GCN-O1-NEXT: Lower ctors and dtors for AMDGPU
 ; GCN-O1-NEXT: FunctionPass Manager
 ; GCN-O1-NEXT: Early propagate attributes from kernels to functions
 ; GCN-O1-NEXT: Lower OpenCL enqueued blocks
 ; GCN-O1-NEXT: Lower uses of LDS variables from non-kernel functions
+; GCN-O1-NEXT: AMDGPU Attributor
+; GCN-O1-NEXT: FunctionPass Manager
+; GCN-O1-NEXT: Cycle Info Analysis
 ; GCN-O1-NEXT: FunctionPass Manager
 ; GCN-O1-NEXT: Infer address spaces
 ; GCN-O1-NEXT: Expand Atomic instructions
 ; GCN-O1-NEXT: AMDGPU Promote Alloca
 ; GCN-O1-NEXT: Dominator Tree Construction
 ; GCN-O1-NEXT: SROA
 ; GCN-O1-NEXT: Cycle Info Analysis
 ; GCN-O1-NEXT: Uniformity Analysis
[...]
 ; GCN-O1-NEXT: Replace intrinsics with calls to vector library
 ; GCN-O1-NEXT: Partially inline calls to library functions
 ; GCN-O1-NEXT: Expand vector predication intrinsics
 ; GCN-O1-NEXT: Scalarize Masked Memory Intrinsics
 ; GCN-O1-NEXT: Expand reduction intrinsics
 ; GCN-O1-NEXT: Natural Loop Information
 ; GCN-O1-NEXT: TLS Variable Hoist
 ; GCN-O1-NEXT: AMDGPU Remove Incompatible Functions
-; GCN-O1-NEXT: AMDGPU Attributor
-; GCN-O1-NEXT: FunctionPass Manager
-; GCN-O1-NEXT: Cycle Info Analysis
 ; GCN-O1-NEXT: CallGraph Construction
 ; GCN-O1-NEXT: Call Graph SCC Pass Manager
 ; GCN-O1-NEXT: AMDGPU Annotate Kernel Features
 ; GCN-O1-NEXT: FunctionPass Manager
 ; GCN-O1-NEXT: AMDGPU Lower Kernel Arguments
 ; GCN-O1-NEXT: Dominator Tree Construction
 ; GCN-O1-NEXT: Natural Loop Information
 ; GCN-O1-NEXT: CodeGen Prepare
[...]
 ; GCN-O1-OPTS-NEXT: AMDGPU Printf lowering
 ; GCN-O1-OPTS-NEXT: FunctionPass Manager
 ; GCN-O1-OPTS-NEXT: Dominator Tree Construction
 ; GCN-O1-OPTS-NEXT: Lower ctors and dtors for AMDGPU
 ; GCN-O1-OPTS-NEXT: FunctionPass Manager
 ; GCN-O1-OPTS-NEXT: Early propagate attributes from kernels to functions
 ; GCN-O1-OPTS-NEXT: Lower OpenCL enqueued blocks
 ; GCN-O1-OPTS-NEXT: Lower uses of LDS variables from non-kernel functions
+; GCN-O1-OPTS-NEXT: AMDGPU Attributor
+; GCN-O1-OPTS-NEXT: FunctionPass Manager
+; GCN-O1-OPTS-NEXT: Cycle Info Analysis
 ; GCN-O1-OPTS-NEXT: FunctionPass Manager
 ; GCN-O1-OPTS-NEXT: Infer address spaces
 ; GCN-O1-OPTS-NEXT: Expand Atomic instructions
 ; GCN-O1-OPTS-NEXT: AMDGPU Promote Alloca
 ; GCN-O1-OPTS-NEXT: Dominator Tree Construction
 ; GCN-O1-OPTS-NEXT: SROA
 ; GCN-O1-OPTS-NEXT: Natural Loop Information
 ; GCN-O1-OPTS-NEXT: Scalar Evolution Analysis
[...]
 ; GCN-O1-OPTS-NEXT: Partially inline calls to library functions
 ; GCN-O1-OPTS-NEXT: Expand vector predication intrinsics
 ; GCN-O1-OPTS-NEXT: Scalarize Masked Memory Intrinsics
 ; GCN-O1-OPTS-NEXT: Expand reduction intrinsics
 ; GCN-O1-OPTS-NEXT: Natural Loop Information
 ; GCN-O1-OPTS-NEXT: TLS Variable Hoist
 ; GCN-O1-OPTS-NEXT: Early CSE
 ; GCN-O1-OPTS-NEXT: AMDGPU Remove Incompatible Functions
-; GCN-O1-OPTS-NEXT: AMDGPU Attributor
-; GCN-O1-OPTS-NEXT: FunctionPass Manager
-; GCN-O1-OPTS-NEXT: Cycle Info Analysis
 ; GCN-O1-OPTS-NEXT: CallGraph Construction
 ; GCN-O1-OPTS-NEXT: Call Graph SCC Pass Manager
 ; GCN-O1-OPTS-NEXT: AMDGPU Annotate Kernel Features
 ; GCN-O1-OPTS-NEXT: FunctionPass Manager
 ; GCN-O1-OPTS-NEXT: AMDGPU Lower Kernel Arguments
 ; GCN-O1-OPTS-NEXT: Dominator Tree Construction
 ; GCN-O1-OPTS-NEXT: Natural Loop Information
 ; GCN-O1-OPTS-NEXT: CodeGen Prepare
[...]
 ; GCN-O2-NEXT: AMDGPU Printf lowering
 ; GCN-O2-NEXT: FunctionPass Manager
 ; GCN-O2-NEXT: Dominator Tree Construction
 ; GCN-O2-NEXT: Lower ctors and dtors for AMDGPU
 ; GCN-O2-NEXT: FunctionPass Manager
 ; GCN-O2-NEXT: Early propagate attributes from kernels to functions
 ; GCN-O2-NEXT: Lower OpenCL enqueued blocks
 ; GCN-O2-NEXT: Lower uses of LDS variables from non-kernel functions
+; GCN-O2-NEXT: AMDGPU Attributor
+; GCN-O2-NEXT: FunctionPass Manager
+; GCN-O2-NEXT: Cycle Info Analysis
 ; GCN-O2-NEXT: FunctionPass Manager
 ; GCN-O2-NEXT: Infer address spaces
 ; GCN-O2-NEXT: Expand Atomic instructions
 ; GCN-O2-NEXT: AMDGPU Promote Alloca
 ; GCN-O2-NEXT: Dominator Tree Construction
 ; GCN-O2-NEXT: SROA
 ; GCN-O2-NEXT: Natural Loop Information
 ; GCN-O2-NEXT: Scalar Evolution Analysis
[...]
 ; GCN-O2-NEXT: Partially inline calls to library functions
 ; GCN-O2-NEXT: Expand vector predication intrinsics
 ; GCN-O2-NEXT: Scalarize Masked Memory Intrinsics
 ; GCN-O2-NEXT: Expand reduction intrinsics
 ; GCN-O2-NEXT: Natural Loop Information
 ; GCN-O2-NEXT: TLS Variable Hoist
 ; GCN-O2-NEXT: Early CSE
 ; GCN-O2-NEXT: AMDGPU Remove Incompatible Functions
-; GCN-O2-NEXT: AMDGPU Attributor
-; GCN-O2-NEXT: FunctionPass Manager
-; GCN-O2-NEXT: Cycle Info Analysis
 ; GCN-O2-NEXT: CallGraph Construction
 ; GCN-O2-NEXT: Call Graph SCC Pass Manager
 ; GCN-O2-NEXT: AMDGPU Annotate Kernel Features
 ; GCN-O2-NEXT: FunctionPass Manager
 ; GCN-O2-NEXT: AMDGPU Lower Kernel Arguments
 ; GCN-O2-NEXT: Dominator Tree Construction
 ; GCN-O2-NEXT: Natural Loop Information
 ; GCN-O2-NEXT: CodeGen Prepare
[...]
 ; GCN-O3-NEXT: AMDGPU Printf lowering
 ; GCN-O3-NEXT: FunctionPass Manager
 ; GCN-O3-NEXT: Dominator Tree Construction
 ; GCN-O3-NEXT: Lower ctors and dtors for AMDGPU
 ; GCN-O3-NEXT: FunctionPass Manager
 ; GCN-O3-NEXT: Early propagate attributes from kernels to functions
 ; GCN-O3-NEXT: Lower OpenCL enqueued blocks
 ; GCN-O3-NEXT: Lower uses of LDS variables from non-kernel functions
+; GCN-O3-NEXT: AMDGPU Attributor
+; GCN-O3-NEXT: FunctionPass Manager
+; GCN-O3-NEXT: Cycle Info Analysis
 ; GCN-O3-NEXT: FunctionPass Manager
 ; GCN-O3-NEXT: Infer address spaces
 ; GCN-O3-NEXT: Expand Atomic instructions
 ; GCN-O3-NEXT: AMDGPU Promote Alloca
 ; GCN-O3-NEXT: Dominator Tree Construction
 ; GCN-O3-NEXT: SROA
 ; GCN-O3-NEXT: Natural Loop Information
 ; GCN-O3-NEXT: Scalar Evolution Analysis
[...]
 ; GCN-O3-NEXT: Basic Alias Analysis (stateless AA impl)
 ; GCN-O3-NEXT: Function Alias Analysis Results
 ; GCN-O3-NEXT: Memory Dependence Analysis
 ; GCN-O3-NEXT: Lazy Branch Probability Analysis
 ; GCN-O3-NEXT: Lazy Block Frequency Analysis
 ; GCN-O3-NEXT: Optimization Remark Emitter
 ; GCN-O3-NEXT: Global Value Numbering
 ; GCN-O3-NEXT: AMDGPU Remove Incompatible Functions
-; GCN-O3-NEXT: AMDGPU Attributor
-; GCN-O3-NEXT: FunctionPass Manager
-; GCN-O3-NEXT: Cycle Info Analysis
 ; GCN-O3-NEXT: CallGraph Construction
 ; GCN-O3-NEXT: Call Graph SCC Pass Manager
 ; GCN-O3-NEXT: AMDGPU Annotate Kernel Features
 ; GCN-O3-NEXT: FunctionPass Manager
 ; GCN-O3-NEXT: AMDGPU Lower Kernel Arguments
 ; GCN-O3-NEXT: Dominator Tree Construction
 ; GCN-O3-NEXT: Natural Loop Information
 ; GCN-O3-NEXT: CodeGen Prepare
[...]

llvm/test/CodeGen/AMDGPU/simple-indirect-call.ll

[...]
 ; ATTRIBUTOR_GCN-NEXT: store ptr @indirect, ptr [[FPTR_CAST]], align 8
 ; ATTRIBUTOR_GCN-NEXT: [[FP:%.*]] = load ptr, ptr [[FPTR_CAST]], align 8
 ; ATTRIBUTOR_GCN-NEXT: call void [[FP]]()
 ; ATTRIBUTOR_GCN-NEXT: ret void
 ;
 ; GFX9-LABEL: test_simple_indirect_call:
 ; GFX9: ; %bb.0:
 ; GFX9-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x4
-; GFX9-NEXT: s_add_u32 flat_scratch_lo, s6, s9
-; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s7, 0
-; GFX9-NEXT: s_add_u32 s0, s0, s9
+; GFX9-NEXT: s_add_u32 flat_scratch_lo, s12, s17
+; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s13, 0
+; GFX9-NEXT: s_add_u32 s0, s0, s17
 ; GFX9-NEXT: s_addc_u32 s1, s1, 0
 ; GFX9-NEXT: s_waitcnt lgkmcnt(0)
 ; GFX9-NEXT: s_lshr_b32 s4, s4, 16
 ; GFX9-NEXT: s_mul_i32 s4, s4, s5
 ; GFX9-NEXT: v_mul_lo_u32 v0, s4, v0
 ; GFX9-NEXT: s_getpc_b64 s[6:7]
 ; GFX9-NEXT: s_add_u32 s6, s6, indirect@rel32@lo+4
 ; GFX9-NEXT: s_addc_u32 s7, s7, indirect@rel32@hi+12
[...]