This is an archive of the discontinued LLVM Phabricator instance.

llvm/test/CodeGen/AMDGPU/resource-optimization-remarks.ll
182	This is a question for my curiosity. Presumably, if the 80 bytes/lane for %alloca are now on the stack, shouldn't we expect some other value, like VGPRs, to go down by 80 bytes (-20 VGPRs)?

arsenm accepted this revision. Jul 27 2023, 11:22 AM

arsenm added inline comments.

llvm/test/CodeGen/AMDGPU/resource-optimization-remarks.ll
182	Mechanically, that's not really how it works. In this case the stack isn't actually used for anything other than filler content (it's kind of a bug that this was optimized out to begin with; this memset probably should have been volatile).
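
A quick arithmetic check of the numbers in this exchange, as a sketch only. The constants are read off the test's `alloca <10 x i64>` and the ScratchSize check lines in the diff further down (64 before, 144 after). `dwordsFor` is a hypothetical helper, and the 4 bytes per VGPR lane is the 32-bit register width that the question assumes.

```cpp
#include <cassert>
#include <cstdint>

// Bytes per lane of `alloca <10 x i64>` in test_indirect_w_static_stack.
constexpr std::uint64_t kAllocaBytes = 10 * 8;

// ScratchSize [bytes/lane] before and after the change, from the STDERR checks.
constexpr std::uint64_t kOldScratchBytes = 64;
constexpr std::uint64_t kNewScratchBytes = 144;

// The increase is exactly the alloca that is no longer optimized out.
static_assert(kOldScratchBytes + kAllocaBytes == kNewScratchBytes,
              "scratch growth should equal the alloca size");

// Hypothetical helper: number of 32-bit registers covering `bytes` per lane.
inline std::uint64_t dwordsFor(std::uint64_t bytes) {
  assert(bytes % 4 == 0 && "expects a whole number of dwords");
  return bytes / 4;
}
```

`dwordsFor(kAllocaBytes)` gives 20, which is the "-20 VGPRs" in the question. The VGPR check line itself stays at 32 on both sides of the diff, consistent with arsenm's answer.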

This revision is now accepted and ready to land. Jul 27 2023, 11:22 AM

@arsenm There is a crash in amdgpu_isel.test from check-llvm-tools-updatetestchecks
This function fails ISel:

; RUN: llc -mtriple=amdgcn-amd-amdhsa -stop-after=finalize-isel -debug-only=isel -o /dev/null %s 2>&1 | FileCheck %s

define i64 @i64_test(i64 %i) nounwind readnone {
  %loc = alloca i64
  %j = load i64, i64 * %loc
  %r = add i64 %i, %j
  ret i64 %r
}

The FrameIndex fails:

t25: i32,ch = load<(dereferenceable load (s32) from %ir.loc, align 8)> # D:1 t0, FrameIndex:i64<0>, undef:i64

LLVM ERROR: Cannot select: t6: i64 = FrameIndex<0>
In function: i64_test

Likewise there is a regression in amdgpu_generated_funcs.ll tests.

I think SROA was doing some (good) stuff for these tests.
A simple fix for the crash seems to be enabling optimizations (-O3 works).

I would suggest just using -O3 there and opening a separate (internal or external) ticket for that ISel failure. What do you think?

Add test updates

Herald added a subscriber: arichardson. Jul 28 2023, 12:36 AM

Pierre-vh requested review of this revision. Jul 28 2023, 12:37 AM

Harbormaster completed remote builds in B248765: Diff 545037. Jul 28 2023, 1:01 AM

In D156398#4541401, @Pierre-vh wrote:
@arsenm There is a crash in amdgpu_isel.test from check-llvm-tools-updatetestchecks
This function fails ISel:
; RUN: llc -mtriple=amdgcn-amd-amdhsa -stop-after=finalize-isel -debug-only=isel -o /dev/null %s 2>&1 | FileCheck %s

define i64 @i64_test(i64 %i) nounwind readnone {
  %loc = alloca i64
  %j = load i64, i64 * %loc
  %r = add i64 %i, %j
  ret i64 %r
}

This test is just broken. It's using the wrong address space for the alloca, and SROA just happened to delete it. Also, this should use opaque pointers.

arsenm added inline comments. Aug 1 2023, 5:00 PM

llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/amdgpu_isel.ll
4	(On Diff #545037)	Fixed this in 1f0d24ce24e92ed69f949d6974b87a10af27bd2b

Rebase, also fix other test

Harbormaster completed remote builds in B249959: Diff 546709. Aug 3 2023, 12:12 AM

arsenm accepted this revision. Aug 10 2023, 2:59 PM

This revision is now accepted and ready to land. Aug 10 2023, 2:59 PM

Closed by commit rG89e91e4c0c60: [AMDGPU] Remove post-PromoteAlloca SROA run (authored by Pierre-vh). Aug 10 2023, 11:29 PM

This revision was automatically updated to reflect the committed changes.

Pierre-vh added a commit: rG89e91e4c0c60: [AMDGPU] Remove post-PromoteAlloca SROA run.

Revision Contents

Path                                                      Size
llvm/
  lib/Target/AMDGPU/
    AMDGPUTargetMachine.cpp                               8 lines
  test/
    CodeGen/AMDGPU/
      GlobalISel/
        irtranslator-sibling-call.ll                      2 lines
      captured-frame-index.ll                             2 lines
      cgp-addressing-modes.ll                             8 lines
      extload-private.ll                                  4 lines
      frame-index-elimination.ll                          6 lines
      ipra.ll                                             4 lines
      llc-pipeline.ll                                     4 lines
      load-hi16.ll                                        9 lines
      load-lo16.ll                                        8 lines
      nested-calls.ll                                     6 lines
      parallelandifcollapse.ll                            2 lines
      resource-optimization-remarks.ll                    8 lines
      sibling-call.ll                                     6 lines
      store-hi16.ll                                       8 lines
    tools/UpdateTestChecks/update_llc_test_checks/Inputs/
      amdgpu_asm.ll                                       16 lines
      amdgpu_asm.ll.expected                              37 lines
      amdgpu_generated_funcs.ll.generated.expected        49 lines
      amdgpu_generated_funcs.ll.nogenerated.expected      49 lines
      amdgpu_isel.ll.expected                             61 lines

Diff 549255

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

[...] static VGPRRegisterRegAlloc basicRegAllocVGPR(
    "basic", "basic register allocator", createBasicVGPRRegisterAllocator);
  static VGPRRegisterRegAlloc greedyRegAllocVGPR(
    "greedy", "greedy register allocator", createGreedyVGPRRegisterAllocator);

  static VGPRRegisterRegAlloc fastRegAllocVGPR(
    "fast", "fast register allocator", createFastVGPRRegisterAllocator);
  }

- static cl::opt<bool> EnableSROA(
-   "amdgpu-sroa",
-   cl::desc("Run SROA after promote alloca pass"),
-   cl::ReallyHidden,
-   cl::init(true));
-
  static cl::opt<bool>
  EnableEarlyIfConversion("amdgpu-early-ifcvt", cl::Hidden,
                          cl::desc("Run early if-conversion"),
                          cl::init(false));

  static cl::opt<bool>
  OptExecMaskPreRA("amdgpu-opt-exec-mask-pre-ra", cl::Hidden,
                   cl::desc("Run pre-RA exec mask optimizations"),
[...] void AMDGPUPassConfig::addIRPasses() {
    if (TM.getOptLevel() > CodeGenOpt::None)
      addPass(createInferAddressSpacesPass());

    addPass(createAtomicExpandPass());

    if (TM.getOptLevel() > CodeGenOpt::None) {
      addPass(createAMDGPUPromoteAlloca());

-     if (EnableSROA)
-       addPass(createSROAPass());
      if (isPassEnabled(EnableScalarIRPasses))
        addStraightLineScalarOptimizationPasses();

      if (EnableAMDGPUAliasAnalysis) {
        addPass(createAMDGPUAAWrapperPass());
        addPass(createExternalAAWrapperPass([](Pass &P, Function &,
                                               AAResults &AAR) {
          if (auto *WrapperPass = P.getAnalysisIfAvailable<AMDGPUAAWrapperPass>())
[...]
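
As a rough illustration of what the addIRPasses() hunk changes, the following self-contained C++ sketch models the pass list as plain strings. It is a toy, not LLVM code: `OptLevel`, `irPasses`, and the `runSROAAfterPromoteAlloca` parameter are hypothetical names, and only the passes visible in the hunk are modeled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the codegen optimization level.
enum class OptLevel { None, Less, Default, Aggressive };

// Toy model of the start of AMDGPUPassConfig::addIRPasses() as shown in the
// hunk. Pass names are plain strings, not real LLVM passes.
// runSROAAfterPromoteAlloca == true models the old default (EnableSROA was
// initialized to true); false models the pipeline after this revision.
std::vector<std::string> irPasses(OptLevel level,
                                  bool runSROAAfterPromoteAlloca) {
  std::vector<std::string> passes;
  if (level > OptLevel::None)
    passes.push_back("Infer address spaces");
  passes.push_back("Expand Atomic instructions");
  if (level > OptLevel::None) {
    passes.push_back("AMDGPU Promote Alloca");
    if (runSROAAfterPromoteAlloca)
      passes.push_back("SROA");
  }
  assert(!passes.empty() && "atomic expansion is always scheduled");
  return passes;
}
```

With the flag set to false, the list at any level above `OptLevel::None` ends at "AMDGPU Promote Alloca", which matches the `SROA` check lines dropped from llc-pipeline.ll later in this diff.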

llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-sibling-call.ll

  ; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
- ; RUN: llc -global-isel -stop-after=irtranslator -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-sroa=0 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GCN %s
+ ; RUN: llc -global-isel -stop-after=irtranslator -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GCN %s
  ; This is a copy of sibling-call.ll, but stops after the IRTranslator.

  define fastcc i32 @i32_fastcc_i32_i32(i32 %arg0, i32 %arg1) #1 {
  ; GCN-LABEL: name: i32_fastcc_i32_i32
  ; GCN: bb.1 (%ir-block.0):
  ; GCN-NEXT: liveins: $vgpr0, $vgpr1
  ; GCN-NEXT: {{ $}}
  ; GCN-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
[...]

llvm/test/CodeGen/AMDGPU/captured-frame-index.ll

- ; RUN: llc -mtriple=amdgcn-- -mcpu=tahiti -mattr=-promote-alloca -amdgpu-sroa=0 -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s
+ ; RUN: llc -mtriple=amdgcn-- -mcpu=tahiti -mattr=-promote-alloca -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s

  ; GCN-LABEL: {{^}}store_fi_lifetime:
  ; GCN: v_mov_b32_e32 [[FI:v[0-9]+]], 4{{$}}
  ; GCN: buffer_store_dword [[FI]]
  define amdgpu_kernel void @store_fi_lifetime(ptr addrspace(1) %out, i32 %in) #0 {
  entry:
    %b = alloca i8, addrspace(5)
    call void @llvm.lifetime.start.p5(i64 1, ptr addrspace(5) %b)
[...]

llvm/test/CodeGen/AMDGPU/cgp-addressing-modes.ll

  ; RUN: opt -S -codegenprepare -mtriple=amdgcn-unknown-unknown -mcpu=tahiti < %s | FileCheck -check-prefix=OPT -check-prefix=OPT-SI -check-prefix=OPT-SICIVI %s
  ; RUN: opt -S -codegenprepare -mtriple=amdgcn-unknown-unknown -mcpu=bonaire < %s | FileCheck -check-prefix=OPT -check-prefix=OPT-CI -check-prefix=OPT-SICIVI %s
  ; RUN: opt -S -codegenprepare -mtriple=amdgcn-unknown-unknown -mcpu=tonga -mattr=-flat-for-global < %s | FileCheck -check-prefix=OPT -check-prefix=OPT-VI -check-prefix=OPT-SICIVI %s
  ; RUN: opt -S -codegenprepare -mtriple=amdgcn-unknown-unknown -mcpu=gfx900 < %s | FileCheck -check-prefix=OPT -check-prefix=OPT-GFX9 %s
- ; RUN: llc -march=amdgcn -mcpu=tahiti -mattr=-promote-alloca -amdgpu-scalarize-global-loads=false -amdgpu-sroa=0 < %s | FileCheck -check-prefix=GCN -check-prefix=SI -check-prefix=SICIVI %s
- ; RUN: llc -march=amdgcn -mcpu=bonaire -mattr=-promote-alloca -amdgpu-scalarize-global-loads=false -amdgpu-sroa=0 < %s | FileCheck -check-prefix=GCN -check-prefix=CI -check-prefix=SICIVI %s
- ; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-flat-for-global -amdgpu-scalarize-global-loads=false -mattr=-promote-alloca -amdgpu-sroa=0 < %s | FileCheck -check-prefix=GCN -check-prefix=VI -check-prefix=SICIVI %s
- ; RUN: llc -march=amdgcn -mcpu=gfx900 -mattr=-promote-alloca -amdgpu-scalarize-global-loads=false -amdgpu-sroa=0 < %s | FileCheck -check-prefix=GCN -check-prefix=GFX9 %s
+ ; RUN: llc -march=amdgcn -mcpu=tahiti -mattr=-promote-alloca -amdgpu-scalarize-global-loads=false < %s | FileCheck -check-prefix=GCN -check-prefix=SI -check-prefix=SICIVI %s
+ ; RUN: llc -march=amdgcn -mcpu=bonaire -mattr=-promote-alloca -amdgpu-scalarize-global-loads=false < %s | FileCheck -check-prefix=GCN -check-prefix=CI -check-prefix=SICIVI %s
+ ; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-flat-for-global -amdgpu-scalarize-global-loads=false -mattr=-promote-alloca < %s | FileCheck -check-prefix=GCN -check-prefix=VI -check-prefix=SICIVI %s
+ ; RUN: llc -march=amdgcn -mcpu=gfx900 -mattr=-promote-alloca -amdgpu-scalarize-global-loads=false < %s | FileCheck -check-prefix=GCN -check-prefix=GFX9 %s

  target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5"

  ; OPT-LABEL: @test_sink_global_small_offset_i32(
  ; OPT-CI-NOT: getelementptr i32, ptr addrspace(1) %in
  ; OPT-VI: getelementptr i32, ptr addrspace(1) %in
  ; OPT: br i1
  ; OPT-CI: getelementptr i8,
[...]

llvm/test/CodeGen/AMDGPU/extload-private.ll

- ; RUN: llc -march=amdgcn -mattr=-promote-alloca -amdgpu-sroa=0 -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s
- ; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-promote-alloca -amdgpu-sroa=0 -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s
+ ; RUN: llc -march=amdgcn -mattr=-promote-alloca -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s
+ ; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-promote-alloca -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s

  ; FUNC-LABEL: {{^}}load_i8_sext_private:
  ; SI: buffer_load_sbyte v{{[0-9]+}}, off, s[{{[0-9]+:[0-9]+}}], 0 offset:4{{$}}
  define amdgpu_kernel void @load_i8_sext_private(ptr addrspace(1) %out) {
  entry:
    %tmp0 = alloca i8, addrspace(5)
    %tmp1 = load i8, ptr addrspace(5) %tmp0
    %tmp2 = sext i8 %tmp1 to i32
[...]

llvm/test/CodeGen/AMDGPU/frame-index-elimination.ll

- ; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=kaveri -mattr=-promote-alloca -amdgpu-sroa=0 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,CI,MUBUF %s
- ; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -mattr=-promote-alloca -amdgpu-sroa=0 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GFX9,GFX9-MUBUF,MUBUF %s
- ; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -mattr=-promote-alloca,+enable-flat-scratch -amdgpu-sroa=0 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GFX9,GFX9-FLATSCR %s
+ ; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=kaveri -mattr=-promote-alloca -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,CI,MUBUF %s
+ ; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -mattr=-promote-alloca -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GFX9,GFX9-MUBUF,MUBUF %s
+ ; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -mattr=-promote-alloca,+enable-flat-scratch -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GFX9,GFX9-FLATSCR %s
  ; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1100 < %s | FileCheck --check-prefixes=GFX11 %s

  ; Test that non-entry function frame indices are expanded properly to
  ; give an index relative to the scratch wave offset register

  ; Materialize into a mov. Make sure there isn't an unnecessary copy.
  ; GCN-LABEL: {{^}}func_mov_fi_i32:
  ; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
[...]

llvm/test/CodeGen/AMDGPU/ipra.ll

- ; RUN: llc -mtriple=amdgcn-amd-amdhsa -verify-machineinstrs -enable-ipra -amdgpu-sroa=0 < %s | FileCheck -check-prefix=GCN %s
- ; RUN: llc -mtriple=amdgcn-amd-amdhsa -verify-machineinstrs -amdgpu-sroa=0 < %s | FileCheck -check-prefix=GCN %s
+ ; RUN: llc -mtriple=amdgcn-amd-amdhsa -verify-machineinstrs -enable-ipra < %s | FileCheck -check-prefix=GCN %s
+ ; RUN: llc -mtriple=amdgcn-amd-amdhsa -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s

  ; Kernels are not called, so there is no call preserved mask.
  ; GCN-LABEL: {{^}}kernel:
  ; GCN: flat_store_dword
  define amdgpu_kernel void @kernel(ptr addrspace(1) %out) #0 {
  entry:
    store i32 0, ptr addrspace(1) %out
    ret void
[...]

llvm/test/CodeGen/AMDGPU/llc-pipeline.ll

[...]
  ; GCN-O1-NEXT: AMDGPU Attributor
  ; GCN-O1-NEXT: FunctionPass Manager
  ; GCN-O1-NEXT: Cycle Info Analysis
  ; GCN-O1-NEXT: FunctionPass Manager
  ; GCN-O1-NEXT: Infer address spaces
  ; GCN-O1-NEXT: Expand Atomic instructions
  ; GCN-O1-NEXT: AMDGPU Promote Alloca
  ; GCN-O1-NEXT: Dominator Tree Construction
- ; GCN-O1-NEXT: SROA
  ; GCN-O1-NEXT: Cycle Info Analysis
  ; GCN-O1-NEXT: Uniformity Analysis
  ; GCN-O1-NEXT: AMDGPU IR optimizations
  ; GCN-O1-NEXT: Basic Alias Analysis (stateless AA impl)
  ; GCN-O1-NEXT: Natural Loop Information
  ; GCN-O1-NEXT: Canonicalize natural loops
  ; GCN-O1-NEXT: Scalar Evolution Analysis
  ; GCN-O1-NEXT: Loop Pass Manager
[...]
  ; GCN-O1-OPTS-NEXT: AMDGPU Attributor
  ; GCN-O1-OPTS-NEXT: FunctionPass Manager
  ; GCN-O1-OPTS-NEXT: Cycle Info Analysis
  ; GCN-O1-OPTS-NEXT: FunctionPass Manager
  ; GCN-O1-OPTS-NEXT: Infer address spaces
  ; GCN-O1-OPTS-NEXT: Expand Atomic instructions
  ; GCN-O1-OPTS-NEXT: AMDGPU Promote Alloca
  ; GCN-O1-OPTS-NEXT: Dominator Tree Construction
- ; GCN-O1-OPTS-NEXT: SROA
  ; GCN-O1-OPTS-NEXT: Natural Loop Information
  ; GCN-O1-OPTS-NEXT: Split GEPs to a variadic base and a constant offset for better CSE
  ; GCN-O1-OPTS-NEXT: Scalar Evolution Analysis
  ; GCN-O1-OPTS-NEXT: Straight line strength reduction
  ; GCN-O1-OPTS-NEXT: Early CSE
  ; GCN-O1-OPTS-NEXT: Scalar Evolution Analysis
  ; GCN-O1-OPTS-NEXT: Nary reassociation
  ; GCN-O1-OPTS-NEXT: Early CSE
[...]
  ; GCN-O2-NEXT: AMDGPU Attributor
  ; GCN-O2-NEXT: FunctionPass Manager
  ; GCN-O2-NEXT: Cycle Info Analysis
  ; GCN-O2-NEXT: FunctionPass Manager
  ; GCN-O2-NEXT: Infer address spaces
  ; GCN-O2-NEXT: Expand Atomic instructions
  ; GCN-O2-NEXT: AMDGPU Promote Alloca
  ; GCN-O2-NEXT: Dominator Tree Construction
- ; GCN-O2-NEXT: SROA
  ; GCN-O2-NEXT: Natural Loop Information
  ; GCN-O2-NEXT: Split GEPs to a variadic base and a constant offset for better CSE
  ; GCN-O2-NEXT: Scalar Evolution Analysis
  ; GCN-O2-NEXT: Straight line strength reduction
  ; GCN-O2-NEXT: Early CSE
  ; GCN-O2-NEXT: Scalar Evolution Analysis
  ; GCN-O2-NEXT: Nary reassociation
  ; GCN-O2-NEXT: Early CSE
[...]
  ; GCN-O3-NEXT: AMDGPU Attributor
  ; GCN-O3-NEXT: FunctionPass Manager
  ; GCN-O3-NEXT: Cycle Info Analysis
  ; GCN-O3-NEXT: FunctionPass Manager
  ; GCN-O3-NEXT: Infer address spaces
  ; GCN-O3-NEXT: Expand Atomic instructions
  ; GCN-O3-NEXT: AMDGPU Promote Alloca
  ; GCN-O3-NEXT: Dominator Tree Construction
- ; GCN-O3-NEXT: SROA
  ; GCN-O3-NEXT: Natural Loop Information
  ; GCN-O3-NEXT: Split GEPs to a variadic base and a constant offset for better CSE
  ; GCN-O3-NEXT: Scalar Evolution Analysis
  ; GCN-O3-NEXT: Straight line strength reduction
  ; GCN-O3-NEXT: Basic Alias Analysis (stateless AA impl)
  ; GCN-O3-NEXT: Function Alias Analysis Results
  ; GCN-O3-NEXT: Memory Dependence Analysis
  ; GCN-O3-NEXT: Lazy Branch Probability Analysis
[...]

llvm/test/CodeGen/AMDGPU/load-hi16.ll

  ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
- ; RUN: llc -march=amdgcn -mcpu=gfx900 -amdgpu-sroa=0 -mattr=-promote-alloca -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX900 %s
- ; RUN: llc -march=amdgcn -mcpu=gfx906 -amdgpu-sroa=0 -mattr=-promote-alloca -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX906 %s
- ; RUN: llc -march=amdgcn -mcpu=fiji -amdgpu-sroa=0 -mattr=-promote-alloca -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX803 %s
- ; RUN: llc -march=amdgcn -mcpu=gfx900 -amdgpu-sroa=0 -mattr=-promote-alloca -mattr=+enable-flat-scratch -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX900-FLATSCR %s
+ ; RUN: llc -march=amdgcn -mcpu=gfx900 -mattr=-promote-alloca -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX900 %s
+ ; RUN: llc -march=amdgcn -mcpu=gfx906 -mattr=-promote-alloca -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX906 %s
+ ; RUN: llc -march=amdgcn -mcpu=fiji -mattr=-promote-alloca -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX803 %s
+ ; RUN: llc -march=amdgcn -mcpu=gfx900 -mattr=-promote-alloca -mattr=+enable-flat-scratch -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX900-FLATSCR %s

  define <2 x i16> @load_local_lo_hi_v2i16_multi_use_lo(ptr addrspace(3) noalias %in) #0 {
  ; GFX900-LABEL: load_local_lo_hi_v2i16_multi_use_lo:
  ; GFX900: ; %bb.0: ; %entry
  ; GFX900-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
  ; GFX900-NEXT: ds_read_u16 v2, v0
  ; GFX900-NEXT: s_waitcnt lgkmcnt(0)
  ; GFX900-NEXT: v_mov_b32_e32 v1, v2
[...] entry:
    %load = load i16, ptr addrspace(3) %in
    %build0 = insertelement <2 x i16> undef, i16 %reg, i32 0
    %build1 = insertelement <2 x i16> %build0, i16 %load, i32 1
    store volatile i16 %reg, ptr addrspace(3) %in
    ret <2 x i16> %build1
  }

  attributes #0 = { nounwind }

llvm/test/CodeGen/AMDGPU/load-lo16.ll

  ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
- ; RUN: llc -march=amdgcn -mcpu=gfx900 -amdgpu-sroa=0 -mattr=-promote-alloca -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX900,GFX900-MUBUF %s
- ; RUN: llc -march=amdgcn -mcpu=gfx906 -amdgpu-sroa=0 -mattr=-promote-alloca,+sram-ecc -verify-machineinstrs < %s | FileCheck --check-prefix=GFX906 %s
- ; RUN: llc -march=amdgcn -mcpu=fiji -amdgpu-sroa=0 -mattr=-promote-alloca -verify-machineinstrs < %s | FileCheck --check-prefix=GFX803 %s
- ; RUN: llc -march=amdgcn -mcpu=gfx900 -amdgpu-sroa=0 -mattr=-promote-alloca -verify-machineinstrs --mattr=+enable-flat-scratch < %s | FileCheck -check-prefixes=GFX900,GFX900-FLATSCR %s
+ ; RUN: llc -march=amdgcn -mcpu=gfx900 -mattr=-promote-alloca -verify-machineinstrs < %s | FileCheck -check-prefixes=GFX900,GFX900-MUBUF %s
+ ; RUN: llc -march=amdgcn -mcpu=gfx906 -mattr=-promote-alloca,+sram-ecc -verify-machineinstrs < %s | FileCheck --check-prefix=GFX906 %s
+ ; RUN: llc -march=amdgcn -mcpu=fiji -mattr=-promote-alloca -verify-machineinstrs < %s | FileCheck --check-prefix=GFX803 %s
+ ; RUN: llc -march=amdgcn -mcpu=gfx900 -mattr=-promote-alloca -verify-machineinstrs --mattr=+enable-flat-scratch < %s | FileCheck -check-prefixes=GFX900,GFX900-FLATSCR %s

  define <2 x i16> @load_local_lo_v2i16_undeflo(ptr addrspace(3) %in) #0 {
  ; GFX900-LABEL: load_local_lo_v2i16_undeflo:
  ; GFX900: ; %bb.0: ; %entry
  ; GFX900-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
  ; GFX900-NEXT: ds_read_u16_d16 v0, v0
  ; GFX900-NEXT: s_waitcnt lgkmcnt(0)
  ; GFX900-NEXT: s_setpc_b64 s[30:31]
[...]

llvm/test/CodeGen/AMDGPU/nested-calls.ll

- ; RUN: llc -march=amdgcn -mcpu=fiji -mattr=-flat-for-global -amdgpu-sroa=0 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GCN %s
- ; RUN: llc -march=amdgcn -mcpu=hawaii -amdgpu-sroa=0 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GCN %s
- ; RUN: llc -march=amdgcn -mcpu=gfx900 -mattr=-flat-for-global -amdgpu-sroa=0 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GCN %s
+ ; RUN: llc -march=amdgcn -mcpu=fiji -mattr=-flat-for-global -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GCN %s
+ ; RUN: llc -march=amdgcn -mcpu=hawaii -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GCN %s
+ ; RUN: llc -march=amdgcn -mcpu=gfx900 -mattr=-flat-for-global -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GCN %s

  ; Test calls when called by other callable functions rather than
  ; kernels.

  declare void @external_void_func_i32(i32) #0

  ; GCN-LABEL: {{^}}test_func_call_external_void_func_i32_imm:
  ; GCN: s_waitcnt
[...]

llvm/test/CodeGen/AMDGPU/parallelandifcollapse.ll

- ; RUN: llc -march=r600 -mcpu=redwood -mattr=-promote-alloca -amdgpu-sroa=0 -verify-machineinstrs < %s | FileCheck %s
+ ; RUN: llc -march=r600 -mcpu=redwood -mattr=-promote-alloca -verify-machineinstrs < %s | FileCheck %s
  ;
  ; CFG flattening should use parallel-and mode to generate branch conditions and
  ; then merge if-regions with the same bodies.
  ;
  ; CHECK: AND_INT
  ; CHECK-NEXT: AND_INT
  ; CHECK-NEXT: OR_INT

[...]

llvm/test/CodeGen/AMDGPU/resource-optimization-remarks.ll

Show First 20 Lines • Show All 57 Lines • ▼ Show 20 Lines
; REMARK-NEXT: - ScratchSize: '0'		; REMARK-NEXT: - ScratchSize: '0'
; REMARK-NEXT: ..		; REMARK-NEXT: ..
; REMARK-NEXT: --- !Analysis		; REMARK-NEXT: --- !Analysis
; REMARK-NEXT: Pass: kernel-resource-usage		; REMARK-NEXT: Pass: kernel-resource-usage
; REMARK-NEXT: Name: DynamicStack		; REMARK-NEXT: Name: DynamicStack
; REMARK-NEXT: DebugLoc: { File: foo.cl, Line: 27, Column: 0 }		; REMARK-NEXT: DebugLoc: { File: foo.cl, Line: 27, Column: 0 }
; REMARK-NEXT: Function: test_kernel		; REMARK-NEXT: Function: test_kernel
; REMARK-NEXT: Args:		; REMARK-NEXT: Args:
; REMARK-NEXT: - String: ' Dynamic Stack:		; REMARK-NEXT: - String: ' Dynamic Stack:
; REMARK-NEXT: - DynamicStack: 'False'		; REMARK-NEXT: - DynamicStack: 'False'
; REMARK-NEXT: ..		; REMARK-NEXT: ..
; REMARK-NEXT: --- !Analysis		; REMARK-NEXT: --- !Analysis
; REMARK-NEXT: Pass: kernel-resource-usage		; REMARK-NEXT: Pass: kernel-resource-usage
; REMARK-NEXT: Name: Occupancy		; REMARK-NEXT: Name: Occupancy
; REMARK-NEXT: DebugLoc: { File: foo.cl, Line: 27, Column: 0 }		; REMARK-NEXT: DebugLoc: { File: foo.cl, Line: 27, Column: 0 }
; REMARK-NEXT: Function: test_kernel		; REMARK-NEXT: Function: test_kernel
; REMARK-NEXT: Args:		; REMARK-NEXT: Args:
; REMARK-NEXT: - String: ' Occupancy [waves/SIMD]: '		; REMARK-NEXT: - String: ' Occupancy [waves/SIMD]: '
▲ Show 20 Lines • Show All 98 Lines • ▼ Show 20 Lines	define amdgpu_kernel void @test_indirect_call() !dbg !9 {
call void %fptr()		call void %fptr()
ret void		ret void
}		}

; STDERR: remark: foo.cl:74:0: Function Name: test_indirect_w_static_stack		; STDERR: remark: foo.cl:74:0: Function Name: test_indirect_w_static_stack
; STDERR-NEXT: remark: foo.cl:74:0: SGPRs: 39		; STDERR-NEXT: remark: foo.cl:74:0: SGPRs: 39
; STDERR-NEXT: remark: foo.cl:74:0: VGPRs: 32		; STDERR-NEXT: remark: foo.cl:74:0: VGPRs: 32
; STDERR-NEXT: remark: foo.cl:74:0: AGPRs: 10		; STDERR-NEXT: remark: foo.cl:74:0: AGPRs: 10
; STDERR-NEXT: remark: foo.cl:74:0: ScratchSize [bytes/lane]: 64		; STDERR-NEXT: remark: foo.cl:74:0: ScratchSize [bytes/lane]: 144
		Joe_Nash: This is a question for my curiosity. Presumably if the 80 bytes/lane for %alloca is now on the stack, shouldn't we expect some other value like VGPRs to go down by 80 bytes (-20 VGPRs)?
		arsenm: Mechanically that's not really how it works. In this case the stack isn't actually used for anything other than filler content (it's kind of a bug this was optimized out to begin with, this memset probably should have been volatile).
; STDERR-NEXT: remark: foo.cl:74:0: Dynamic Stack: True		; STDERR-NEXT: remark: foo.cl:74:0: Dynamic Stack: True
; STDERR-NEXT: remark: foo.cl:74:0: Occupancy [waves/SIMD]: 8		; STDERR-NEXT: remark: foo.cl:74:0: Occupancy [waves/SIMD]: 8
; STDERR-NEXT: remark: foo.cl:74:0: SGPRs Spill: 0		; STDERR-NEXT: remark: foo.cl:74:0: SGPRs Spill: 0
; STDERR-NEXT: remark: foo.cl:74:0: VGPRs Spill: 0		; STDERR-NEXT: remark: foo.cl:74:0: VGPRs Spill: 0
; STDERR-NEXT: remark: foo.cl:74:0: LDS Size [bytes/block]: 0		; STDERR-NEXT: remark: foo.cl:74:0: LDS Size [bytes/block]: 0

declare void @llvm.memset.p5.i64(ptr addrspace(5) nocapture readonly, i8, i64, i1 immarg)		declare void @llvm.memset.p5.i64(ptr addrspace(5) nocapture readonly, i8, i64, i1 immarg)

define amdgpu_kernel void @test_indirect_w_static_stack() !dbg !10 {		define amdgpu_kernel void @test_indirect_w_static_stack() !dbg !10 {
%alloca = alloca <10 x i64>, align 16, addrspace(5)		%alloca = alloca <10 x i64>, align 16, addrspace(5)
call void @llvm.memset.p5.i64(ptr addrspace(5) %alloca, i8 0, i64 40, i1 false)		call void @llvm.memset.p5.i64(ptr addrspace(5) %alloca, i8 0, i64 40, i1 false)
%fptr = load ptr, ptr addrspace(4) @gv.fptr0		%fptr = load ptr, ptr addrspace(4) @gv.fptr0
call void %fptr()		call void %fptr()
ret void		ret void
}		}
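
As a rough check of the numbers discussed in the inline comments above, the sketch below recomputes the per-lane size of `%alloca` and compares it with the change in the reported ScratchSize (64 to 144). It assumes 8 bytes per `i64` element and 4 bytes per lane for one 32-bit VGPR; the helper and constant names are illustrative only. As the reply above notes, the scratch increase does not imply a matching drop in the VGPR count.

```python
# Illustrative arithmetic only; names here are hypothetical helpers,
# not part of LLVM or of the test file.
I64_BYTES = 8             # assumed size of one i64 element
VGPR_BYTES_PER_LANE = 4   # assumed: one 32-bit VGPR holds 4 bytes per lane


def alloca_bytes(num_elements: int, element_bytes: int = I64_BYTES) -> int:
    """Per-lane size in bytes of an alloca of `num_elements` elements."""
    return num_elements * element_bytes


old_scratch = 64    # ScratchSize [bytes/lane] in the left (old) column
new_scratch = 144   # ScratchSize [bytes/lane] in the right (new) column

alloca_size = alloca_bytes(10)  # %alloca = alloca <10 x i64>
scratch_delta = new_scratch - old_scratch

# How many VGPRs the same amount of data would occupy, which is the
# "-20 VGPRs" figure raised in the question.
vgpr_equivalent = alloca_size // VGPR_BYTES_PER_LANE
```

The scratch delta equals the alloca size, which matches the reading that the alloca is simply no longer optimized away.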


llvm/test/CodeGen/AMDGPU/sibling-call.ll

	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=fiji -mattr=-flat-for-global -enable-ipra=0 -amdgpu-sroa=0 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,CIVI %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=fiji -mattr=-flat-for-global -enable-ipra=0 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,CIVI %s
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=hawaii -enable-ipra=0 -amdgpu-sroa=0 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,CIVI %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=hawaii -enable-ipra=0 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,CIVI %s
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -mattr=-flat-for-global -enable-ipra=0 -amdgpu-sroa=0 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GFX9 %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -mattr=-flat-for-global -enable-ipra=0 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GFX9 %s
	target datalayout = "A5"			target datalayout = "A5"

	; FIXME: Why is this commuted only sometimes?			; FIXME: Why is this commuted only sometimes?
	; GCN-LABEL: {{^}}i32_fastcc_i32_i32:			; GCN-LABEL: {{^}}i32_fastcc_i32_i32:
	; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; CIVI-NEXT: v_add_{{i|u}}32_e32 v0, vcc, v0, v1			; CIVI-NEXT: v_add_{{i|u}}32_e32 v0, vcc, v0, v1
	; GFX9-NEXT: v_add_u32_e32 v0, v0, v1			; GFX9-NEXT: v_add_u32_e32 v0, v0, v1
	; GCN-NEXT: s_setpc_b64			; GCN-NEXT: s_setpc_b64

llvm/test/CodeGen/AMDGPU/store-hi16.ll

	; RUN: llc -march=amdgcn -mcpu=gfx900 -amdgpu-sroa=0 -mattr=-promote-alloca -verify-machineinstrs < %s | FileCheck -allow-deprecated-dag-overlap -check-prefixes=GCN,GFX9,GFX9-MUBUF %s			; RUN: llc -march=amdgcn -mcpu=gfx900 -mattr=-promote-alloca -verify-machineinstrs < %s | FileCheck -allow-deprecated-dag-overlap -check-prefixes=GCN,GFX9,GFX9-MUBUF %s
	; RxN: llc -march=amdgcn -mcpu=gfx906 -amdgpu-sroa=0 -mattr=-promote-alloca,+sram-ecc -verify-machineinstrs < %s | FileCheck -allow-deprecated-dag-overlap -check-prefixes=GCN,GFX9 %s			; RxN: llc -march=amdgcn -mcpu=gfx906 -mattr=-promote-alloca,+sram-ecc -verify-machineinstrs < %s | FileCheck -allow-deprecated-dag-overlap -check-prefixes=GCN,GFX9 %s
	; RUN: llc -march=amdgcn -mcpu=fiji -amdgpu-sroa=0 -mattr=-promote-alloca -verify-machineinstrs < %s | FileCheck -allow-deprecated-dag-overlap -check-prefixes=GCN,GFX803,NO-D16-HI %s			; RUN: llc -march=amdgcn -mcpu=fiji -mattr=-promote-alloca -verify-machineinstrs < %s | FileCheck -allow-deprecated-dag-overlap -check-prefixes=GCN,GFX803,NO-D16-HI %s
	; RUN: llc -march=amdgcn -mcpu=gfx900 -amdgpu-sroa=0 -mattr=-promote-alloca -mattr=+enable-flat-scratch -verify-machineinstrs < %s | FileCheck -allow-deprecated-dag-overlap -check-prefixes=GCN,GFX9,GFX9-FLATSCR %s			; RUN: llc -march=amdgcn -mcpu=gfx900 -mattr=-promote-alloca -mattr=+enable-flat-scratch -verify-machineinstrs < %s | FileCheck -allow-deprecated-dag-overlap -check-prefixes=GCN,GFX9,GFX9-FLATSCR %s

	; GCN-LABEL: {{^}}store_global_hi_v2i16:			; GCN-LABEL: {{^}}store_global_hi_v2i16:
	; GCN: s_waitcnt			; GCN: s_waitcnt

	; GFX9-NEXT: global_store_short_d16_hi v[0:1], v2, off			; GFX9-NEXT: global_store_short_d16_hi v[0:1], v2, off

	; NO-D16-HI-NEXT: v_lshrrev_b32_e32 v2, 16, v2			; NO-D16-HI-NEXT: v_lshrrev_b32_e32 v2, 16, v2
	; GFX803-NEXT: flat_store_short v[0:1], v2			; GFX803-NEXT: flat_store_short v[0:1], v2

llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/amdgpu_asm.ll

	; RUN: llc -mtriple=amdgcn-amd-amdhsa < %s | FileCheck %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa < %s | FileCheck %s

	define i64 @i64_test(i64 %i) nounwind readnone {			define i64 @i64_test(i64 %i) nounwind readnone {
	%loc = alloca i64			%loc = alloca i64, addrspace(5)
	%j = load i64, i64 * %loc			%j = load i64, ptr addrspace(5) %loc
	%r = add i64 %i, %j			%r = add i64 %i, %j
	ret i64 %r			ret i64 %r
	}			}

	define i64 @i32_test(i32 %i) nounwind readnone {			define i64 @i32_test(i32 %i) nounwind readnone {
	%loc = alloca i32			%loc = alloca i32, addrspace(5)
	%j = load i32, i32 * %loc			%j = load i32, ptr addrspace(5) %loc
	%r = add i32 %i, %j			%r = add i32 %i, %j
	%ext = zext i32 %r to i64			%ext = zext i32 %r to i64
	ret i64 %ext			ret i64 %ext
	}			}

	define i64 @i16_test(i16 %i) nounwind readnone {			define i64 @i16_test(i16 %i) nounwind readnone {
	%loc = alloca i16			%loc = alloca i16, addrspace(5)
	%j = load i16, i16 * %loc			%j = load i16, ptr addrspace(5) %loc
	%r = add i16 %i, %j			%r = add i16 %i, %j
	%ext = zext i16 %r to i64			%ext = zext i16 %r to i64
	ret i64 %ext			ret i64 %ext
	}			}

	define i64 @i8_test(i8 %i) nounwind readnone {			define i64 @i8_test(i8 %i) nounwind readnone {
	%loc = alloca i8			%loc = alloca i8, addrspace(5)
	%j = load i8, i8 * %loc			%j = load i8, ptr addrspace(5) %loc
	%r = add i8 %i, %j			%r = add i8 %i, %j
	%ext = zext i8 %r to i64			%ext = zext i8 %r to i64
	ret i64 %ext			ret i64 %ext
	}			}
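
The edits to this input file follow one mechanical pattern: each `alloca` gains `addrspace(5)`, and each typed-pointer load becomes a load through `ptr addrspace(5)`. A minimal sketch of that rewrite is shown below, assuming one instruction per line and only the simple scalar forms that appear in this file. The function `to_private_addrspace` and both regular expressions are hypothetical helpers, not part of any LLVM tooling, and the actual change may well have been made by hand.

```python
import re

# Matches e.g. "%loc = alloca i64" (no alignment or address space yet).
ALLOCA_RE = re.compile(r"^(\s*%[\w.]+ = alloca \w+)\s*$")
# Matches e.g. "%j = load i64, i64 * %loc" (typed-pointer form).
LOAD_RE = re.compile(r"^(\s*%[\w.]+ = load (\w+)), \2 ?\* (%[\w.]+)\s*$")


def to_private_addrspace(line: str) -> str:
    """Rewrite one IR line to use addrspace(5); other lines pass through."""
    m = ALLOCA_RE.match(line)
    if m:
        return f"{m.group(1)}, addrspace(5)"
    m = LOAD_RE.match(line)
    if m:
        return f"{m.group(1)}, ptr addrspace(5) {m.group(3)}"
    return line


before = [
    "%loc = alloca i64",
    "%j = load i64, i64 * %loc",
    "%r = add i64 %i, %j",
]
after = [to_private_addrspace(line) for line in before]
```

Lines that do not match either pattern, such as the `add`, are returned unchanged.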

llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/amdgpu_asm.ll.expected

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=amdgcn-amd-amdhsa < %s | FileCheck %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa < %s | FileCheck %s

	define i64 @i64_test(i64 %i) nounwind readnone {			define i64 @i64_test(i64 %i) nounwind readnone {
	; CHECK-LABEL: i64_test:			; CHECK-LABEL: i64_test:
	; CHECK: ; %bb.0:			; CHECK: ; %bb.0:
	; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; CHECK-NEXT: v_mov_b32_e32 v1, 0			; CHECK-NEXT: buffer_load_dword v2, off, s[0:3], s32
				; CHECK-NEXT: buffer_load_dword v3, off, s[0:3], s32 offset:4
				; CHECK-NEXT: s_waitcnt vmcnt(1)
				; CHECK-NEXT: v_add_i32_e32 v0, vcc, v0, v2
				; CHECK-NEXT: s_waitcnt vmcnt(0)
				; CHECK-NEXT: v_addc_u32_e32 v1, vcc, v1, v3, vcc
	; CHECK-NEXT: s_setpc_b64 s[30:31]			; CHECK-NEXT: s_setpc_b64 s[30:31]
	%loc = alloca i64			%loc = alloca i64, addrspace(5)
	%j = load i64, i64 * %loc			%j = load i64, ptr addrspace(5) %loc
	%r = add i64 %i, %j			%r = add i64 %i, %j
	ret i64 %r			ret i64 %r
	}			}

	define i64 @i32_test(i32 %i) nounwind readnone {			define i64 @i32_test(i32 %i) nounwind readnone {
	; CHECK-LABEL: i32_test:			; CHECK-LABEL: i32_test:
	; CHECK: ; %bb.0:			; CHECK: ; %bb.0:
	; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; CHECK-NEXT: v_mov_b32_e32 v0, 0			; CHECK-NEXT: buffer_load_dword v1, off, s[0:3], s32
				; CHECK-NEXT: s_waitcnt vmcnt(0)
				; CHECK-NEXT: v_add_i32_e32 v0, vcc, v0, v1
	; CHECK-NEXT: v_mov_b32_e32 v1, 0			; CHECK-NEXT: v_mov_b32_e32 v1, 0
	; CHECK-NEXT: s_setpc_b64 s[30:31]			; CHECK-NEXT: s_setpc_b64 s[30:31]
	%loc = alloca i32			%loc = alloca i32, addrspace(5)
	%j = load i32, i32 * %loc			%j = load i32, ptr addrspace(5) %loc
	%r = add i32 %i, %j			%r = add i32 %i, %j
	%ext = zext i32 %r to i64			%ext = zext i32 %r to i64
	ret i64 %ext			ret i64 %ext
	}			}

	define i64 @i16_test(i16 %i) nounwind readnone {			define i64 @i16_test(i16 %i) nounwind readnone {
	; CHECK-LABEL: i16_test:			; CHECK-LABEL: i16_test:
	; CHECK: ; %bb.0:			; CHECK: ; %bb.0:
	; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; CHECK-NEXT: v_mov_b32_e32 v0, 0			; CHECK-NEXT: buffer_load_ushort v1, off, s[0:3], s32
				; CHECK-NEXT: s_waitcnt vmcnt(0)
				; CHECK-NEXT: v_add_i32_e32 v0, vcc, v0, v1
				; CHECK-NEXT: v_and_b32_e32 v0, 0xffff, v0
	; CHECK-NEXT: v_mov_b32_e32 v1, 0			; CHECK-NEXT: v_mov_b32_e32 v1, 0
	; CHECK-NEXT: s_setpc_b64 s[30:31]			; CHECK-NEXT: s_setpc_b64 s[30:31]
	%loc = alloca i16			%loc = alloca i16, addrspace(5)
	%j = load i16, i16 * %loc			%j = load i16, ptr addrspace(5) %loc
	%r = add i16 %i, %j			%r = add i16 %i, %j
	%ext = zext i16 %r to i64			%ext = zext i16 %r to i64
	ret i64 %ext			ret i64 %ext
	}			}

	define i64 @i8_test(i8 %i) nounwind readnone {			define i64 @i8_test(i8 %i) nounwind readnone {
	; CHECK-LABEL: i8_test:			; CHECK-LABEL: i8_test:
	; CHECK: ; %bb.0:			; CHECK: ; %bb.0:
	; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; CHECK-NEXT: v_mov_b32_e32 v0, 0			; CHECK-NEXT: buffer_load_ubyte v1, off, s[0:3], s32
				; CHECK-NEXT: s_waitcnt vmcnt(0)
				; CHECK-NEXT: v_add_i32_e32 v0, vcc, v0, v1
				; CHECK-NEXT: v_and_b32_e32 v0, 0xff, v0
	; CHECK-NEXT: v_mov_b32_e32 v1, 0			; CHECK-NEXT: v_mov_b32_e32 v1, 0
	; CHECK-NEXT: s_setpc_b64 s[30:31]			; CHECK-NEXT: s_setpc_b64 s[30:31]
	%loc = alloca i8			%loc = alloca i8, addrspace(5)
	%j = load i8, i8 * %loc			%j = load i8, ptr addrspace(5) %loc
	%r = add i8 %i, %j			%r = add i8 %i, %j
	%ext = zext i8 %r to i64			%ext = zext i8 %r to i64
	ret i64 %ext			ret i64 %ext
	}			}
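
The new checks on the right show two recurring shapes: the `i64` add is split into a low 32-bit add followed by an add-with-carry, and the `i16`/`i8` adds are followed by an AND with `0xffff` or `0xff` to implement the `zext`. The sketch below models both with plain integers. It is a simplified illustration of the arithmetic, not of the hardware, and the function names are made up for this example.

```python
MASK32 = 0xFFFFFFFF


def add64_via_32(a: int, b: int) -> int:
    """64-bit add built from two 32-bit adds, like v_add_i32 + v_addc_u32."""
    lo = (a & MASK32) + (b & MASK32)
    carry = lo >> 32                      # carry out of the low half (vcc)
    hi = ((a >> 32) & MASK32) + ((b >> 32) & MASK32) + carry
    return ((hi & MASK32) << 32) | (lo & MASK32)


def add_then_zext(a: int, b: int, bits: int) -> int:
    """Add in a wider register, then mask to `bits`, like v_add + v_and."""
    return (a + b) & ((1 << bits) - 1)


carry_case = add64_via_32(0xFFFFFFFF, 1)   # carry propagates into the high half
wrap_i16 = add_then_zext(0xFFFF, 1, 16)    # i16 add wraps before the zext
wrap_i8 = add_then_zext(200, 100, 8)       # i8 add wraps before the zext
```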

llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/amdgpu_generated_funcs.ll.generated.expected


	attributes #0 = { noredzone nounwind ssp uwtable "frame-pointer"="all" }			attributes #0 = { noredzone nounwind ssp uwtable "frame-pointer"="all" }
	; CHECK-LABEL: check_boundaries:			; CHECK-LABEL: check_boundaries:
	; CHECK: check_boundaries$local:			; CHECK: check_boundaries$local:
	; CHECK-NEXT: .type check_boundaries$local,@function			; CHECK-NEXT: .type check_boundaries$local,@function
	; CHECK-NEXT: .cfi_startproc			; CHECK-NEXT: .cfi_startproc
	; CHECK-NEXT: ; %bb.0:			; CHECK-NEXT: ; %bb.0:
	; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; CHECK-NEXT: s_mov_b32 s4, s33			; CHECK-NEXT: s_mov_b32 s8, s33
	; CHECK-NEXT: s_mov_b32 s33, s32			; CHECK-NEXT: s_mov_b32 s33, s32
				; CHECK-NEXT: s_addk_i32 s32, 0x600
				; CHECK-NEXT: v_mov_b32_e32 v4, 0
				; CHECK-NEXT: v_mov_b32_e32 v0, 1
				; CHECK-NEXT: v_mov_b32_e32 v1, 2
				; CHECK-NEXT: v_mov_b32_e32 v2, 3
				; CHECK-NEXT: v_mov_b32_e32 v3, 4
				; CHECK-NEXT: buffer_store_dword v4, off, s[0:3], s33
				; CHECK-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:4
				; CHECK-NEXT: buffer_store_dword v1, off, s[0:3], s33 offset:8
				; CHECK-NEXT: buffer_store_dword v2, off, s[0:3], s33 offset:12
				; CHECK-NEXT: buffer_store_dword v3, off, s[0:3], s33 offset:16
				; CHECK-NEXT: s_mov_b64 s[4:5], 0
				; CHECK-NEXT: s_and_saveexec_b64 s[6:7], s[4:5]
				; CHECK-NEXT: s_xor_b64 s[4:5], exec, s[6:7]
				; CHECK-NEXT: s_cbranch_execz .LBB0_2
				; CHECK-NEXT: ; %bb.1:
				; CHECK-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:4
				; CHECK-NEXT: buffer_store_dword v1, off, s[0:3], s33 offset:8
				; CHECK-NEXT: buffer_store_dword v2, off, s[0:3], s33 offset:12
				; CHECK-NEXT: buffer_store_dword v3, off, s[0:3], s33 offset:16
				; CHECK-NEXT: .LBB0_2: ; %Flow
				; CHECK-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
				; CHECK-NEXT: s_cbranch_execz .LBB0_4
				; CHECK-NEXT: ; %bb.3:
				; CHECK-NEXT: v_mov_b32_e32 v0, 1
				; CHECK-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:12
				; CHECK-NEXT: .LBB0_4:
				; CHECK-NEXT: s_or_b64 exec, exec, s[4:5]
	; CHECK-NEXT: v_mov_b32_e32 v0, 0			; CHECK-NEXT: v_mov_b32_e32 v0, 0
	; CHECK-NEXT: s_mov_b32 s33, s4			; CHECK-NEXT: s_addk_i32 s32, 0xfa00
				; CHECK-NEXT: s_mov_b32 s33, s8
				; CHECK-NEXT: s_waitcnt vmcnt(0)
	; CHECK-NEXT: s_setpc_b64 s[30:31]			; CHECK-NEXT: s_setpc_b64 s[30:31]
	;			;
	; CHECK-LABEL: main:			; CHECK-LABEL: main:
	; CHECK: main$local:			; CHECK: main$local:
	; CHECK-NEXT: .type main$local,@function			; CHECK-NEXT: .type main$local,@function
	; CHECK-NEXT: .cfi_startproc			; CHECK-NEXT: .cfi_startproc
	; CHECK-NEXT: ; %bb.0:			; CHECK-NEXT: ; %bb.0:
	; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; CHECK-NEXT: s_mov_b32 s6, s33			; CHECK-NEXT: s_mov_b32 s6, s33
	; CHECK-NEXT: s_mov_b32 s33, s32			; CHECK-NEXT: s_mov_b32 s33, s32
				; CHECK-NEXT: s_addk_i32 s32, 0x600
				; CHECK-NEXT: v_mov_b32_e32 v0, 0
	; CHECK-NEXT: s_getpc_b64 s[4:5]			; CHECK-NEXT: s_getpc_b64 s[4:5]
	; CHECK-NEXT: s_add_u32 s4, s4, x@rel32@lo+4			; CHECK-NEXT: s_add_u32 s4, s4, x@rel32@lo+4
	; CHECK-NEXT: s_addc_u32 s5, s5, x@rel32@hi+12			; CHECK-NEXT: s_addc_u32 s5, s5, x@rel32@hi+12
	; CHECK-NEXT: v_mov_b32_e32 v2, 1			; CHECK-NEXT: v_mov_b32_e32 v2, 1
				; CHECK-NEXT: v_mov_b32_e32 v3, 2
				; CHECK-NEXT: v_mov_b32_e32 v4, 3
				; CHECK-NEXT: v_mov_b32_e32 v5, 4
				; CHECK-NEXT: buffer_store_dword v0, off, s[0:3], s33
				; CHECK-NEXT: buffer_store_dword v2, off, s[0:3], s33 offset:4
				; CHECK-NEXT: buffer_store_dword v3, off, s[0:3], s33 offset:8
				; CHECK-NEXT: buffer_store_dword v4, off, s[0:3], s33 offset:12
				; CHECK-NEXT: buffer_store_dword v5, off, s[0:3], s33 offset:16
	; CHECK-NEXT: v_mov_b32_e32 v0, s4			; CHECK-NEXT: v_mov_b32_e32 v0, s4
	; CHECK-NEXT: v_mov_b32_e32 v1, s5			; CHECK-NEXT: v_mov_b32_e32 v1, s5
	; CHECK-NEXT: flat_store_dword v[0:1], v2			; CHECK-NEXT: flat_store_dword v[0:1], v2
	; CHECK-NEXT: ;;#ASMSTART			; CHECK-NEXT: ;;#ASMSTART
	; CHECK-NEXT: ;;#ASMEND			; CHECK-NEXT: ;;#ASMEND
				; CHECK-NEXT: buffer_store_dword v2, off, s[0:3], s33 offset:4
				; CHECK-NEXT: buffer_store_dword v3, off, s[0:3], s33 offset:8
				; CHECK-NEXT: buffer_store_dword v4, off, s[0:3], s33 offset:12
	; CHECK-NEXT: v_mov_b32_e32 v0, 0			; CHECK-NEXT: v_mov_b32_e32 v0, 0
				; CHECK-NEXT: buffer_store_dword v5, off, s[0:3], s33 offset:16
				; CHECK-NEXT: s_addk_i32 s32, 0xfa00
	; CHECK-NEXT: s_mov_b32 s33, s6			; CHECK-NEXT: s_mov_b32 s33, s6
	; CHECK-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; CHECK-NEXT: s_setpc_b64 s[30:31]			; CHECK-NEXT: s_setpc_b64 s[30:31]
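
The new prologue and epilogue adjust `s32` with `s_addk_i32 s32, 0x600` and `s_addk_i32 s32, 0xfa00`. Assuming the immediate is interpreted as a signed 16-bit value, the second is the negation of the first, so the epilogue undoes the prologue's stack-pointer bump. The sketch below checks that arithmetic; `sext16` is an illustrative helper, not an LLVM function.

```python
def sext16(value: int) -> int:
    """Interpret the low 16 bits of `value` as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


prologue_adjust = sext16(0x600)    # s_addk_i32 s32, 0x600
epilogue_adjust = sext16(0xFA00)   # s_addk_i32 s32, 0xfa00

# The two adjustments cancel, leaving s32 at its value on entry.
net_adjust = prologue_adjust + epilogue_adjust
```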

llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/amdgpu_generated_funcs.ll.nogenerated.expected

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -enable-machine-outliner -mtriple=amdgcn-adm-amdhsa < %s | FileCheck %s			; RUN: llc -enable-machine-outliner -mtriple=amdgcn-adm-amdhsa < %s | FileCheck %s

	; NOTE: Machine outliner doesn't run.			; NOTE: Machine outliner doesn't run.
	@x = dso_local global i32 0, align 4			@x = dso_local global i32 0, align 4

	define dso_local i32 @check_boundaries() #0 {			define dso_local i32 @check_boundaries() #0 {
	; CHECK-LABEL: check_boundaries:			; CHECK-LABEL: check_boundaries:
	; CHECK: check_boundaries$local:			; CHECK: check_boundaries$local:
	; CHECK-NEXT: .type check_boundaries$local,@function			; CHECK-NEXT: .type check_boundaries$local,@function
	; CHECK-NEXT: .cfi_startproc			; CHECK-NEXT: .cfi_startproc
	; CHECK-NEXT: ; %bb.0:			; CHECK-NEXT: ; %bb.0:
	; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; CHECK-NEXT: s_mov_b32 s4, s33			; CHECK-NEXT: s_mov_b32 s8, s33
	; CHECK-NEXT: s_mov_b32 s33, s32			; CHECK-NEXT: s_mov_b32 s33, s32
				; CHECK-NEXT: s_addk_i32 s32, 0x600
				; CHECK-NEXT: v_mov_b32_e32 v4, 0
				; CHECK-NEXT: v_mov_b32_e32 v0, 1
				; CHECK-NEXT: v_mov_b32_e32 v1, 2
				; CHECK-NEXT: v_mov_b32_e32 v2, 3
				; CHECK-NEXT: v_mov_b32_e32 v3, 4
				; CHECK-NEXT: buffer_store_dword v4, off, s[0:3], s33
				; CHECK-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:4
				; CHECK-NEXT: buffer_store_dword v1, off, s[0:3], s33 offset:8
				; CHECK-NEXT: buffer_store_dword v2, off, s[0:3], s33 offset:12
				; CHECK-NEXT: buffer_store_dword v3, off, s[0:3], s33 offset:16
				; CHECK-NEXT: s_mov_b64 s[4:5], 0
				; CHECK-NEXT: s_and_saveexec_b64 s[6:7], s[4:5]
				; CHECK-NEXT: s_xor_b64 s[4:5], exec, s[6:7]
				; CHECK-NEXT: s_cbranch_execz .LBB0_2
				; CHECK-NEXT: ; %bb.1:
				; CHECK-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:4
				; CHECK-NEXT: buffer_store_dword v1, off, s[0:3], s33 offset:8
				; CHECK-NEXT: buffer_store_dword v2, off, s[0:3], s33 offset:12
				; CHECK-NEXT: buffer_store_dword v3, off, s[0:3], s33 offset:16
				; CHECK-NEXT: .LBB0_2: ; %Flow
				; CHECK-NEXT: s_andn2_saveexec_b64 s[4:5], s[4:5]
				; CHECK-NEXT: s_cbranch_execz .LBB0_4
				; CHECK-NEXT: ; %bb.3:
				; CHECK-NEXT: v_mov_b32_e32 v0, 1
				; CHECK-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:12
				; CHECK-NEXT: .LBB0_4:
				; CHECK-NEXT: s_or_b64 exec, exec, s[4:5]
	; CHECK-NEXT: v_mov_b32_e32 v0, 0			; CHECK-NEXT: v_mov_b32_e32 v0, 0
	; CHECK-NEXT: s_mov_b32 s33, s4			; CHECK-NEXT: s_addk_i32 s32, 0xfa00
				; CHECK-NEXT: s_mov_b32 s33, s8
				; CHECK-NEXT: s_waitcnt vmcnt(0)
	; CHECK-NEXT: s_setpc_b64 s[30:31]			; CHECK-NEXT: s_setpc_b64 s[30:31]
	%1 = alloca i32, align 4, addrspace(5)			%1 = alloca i32, align 4, addrspace(5)
	%2 = alloca i32, align 4, addrspace(5)			%2 = alloca i32, align 4, addrspace(5)
	%3 = alloca i32, align 4, addrspace(5)			%3 = alloca i32, align 4, addrspace(5)
	%4 = alloca i32, align 4, addrspace(5)			%4 = alloca i32, align 4, addrspace(5)
	%5 = alloca i32, align 4, addrspace(5)			%5 = alloca i32, align 4, addrspace(5)
	store i32 0, i32 addrspace(5)* %1, align 4			store i32 0, i32 addrspace(5)* %1, align 4
	store i32 0, i32 addrspace(5)* %2, align 4			store i32 0, i32 addrspace(5)* %2, align 4
	[...]
	; CHECK-LABEL: main:			; CHECK-LABEL: main:
	; CHECK: main$local:			; CHECK: main$local:
	; CHECK-NEXT: .type main$local,@function			; CHECK-NEXT: .type main$local,@function
	; CHECK-NEXT: .cfi_startproc			; CHECK-NEXT: .cfi_startproc
	; CHECK-NEXT: ; %bb.0:			; CHECK-NEXT: ; %bb.0:
	; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; CHECK-NEXT: s_mov_b32 s6, s33			; CHECK-NEXT: s_mov_b32 s6, s33
	; CHECK-NEXT: s_mov_b32 s33, s32			; CHECK-NEXT: s_mov_b32 s33, s32
				; CHECK-NEXT: s_addk_i32 s32, 0x600
				; CHECK-NEXT: v_mov_b32_e32 v0, 0
	; CHECK-NEXT: s_getpc_b64 s[4:5]			; CHECK-NEXT: s_getpc_b64 s[4:5]
	; CHECK-NEXT: s_add_u32 s4, s4, x@rel32@lo+4			; CHECK-NEXT: s_add_u32 s4, s4, x@rel32@lo+4
	; CHECK-NEXT: s_addc_u32 s5, s5, x@rel32@hi+12			; CHECK-NEXT: s_addc_u32 s5, s5, x@rel32@hi+12
	; CHECK-NEXT: v_mov_b32_e32 v2, 1			; CHECK-NEXT: v_mov_b32_e32 v2, 1
				; CHECK-NEXT: v_mov_b32_e32 v3, 2
				; CHECK-NEXT: v_mov_b32_e32 v4, 3
				; CHECK-NEXT: v_mov_b32_e32 v5, 4
				; CHECK-NEXT: buffer_store_dword v0, off, s[0:3], s33
				; CHECK-NEXT: buffer_store_dword v2, off, s[0:3], s33 offset:4
				; CHECK-NEXT: buffer_store_dword v3, off, s[0:3], s33 offset:8
				; CHECK-NEXT: buffer_store_dword v4, off, s[0:3], s33 offset:12
				; CHECK-NEXT: buffer_store_dword v5, off, s[0:3], s33 offset:16
	; CHECK-NEXT: v_mov_b32_e32 v0, s4			; CHECK-NEXT: v_mov_b32_e32 v0, s4
	; CHECK-NEXT: v_mov_b32_e32 v1, s5			; CHECK-NEXT: v_mov_b32_e32 v1, s5
	; CHECK-NEXT: flat_store_dword v[0:1], v2			; CHECK-NEXT: flat_store_dword v[0:1], v2
	; CHECK-NEXT: ;;#ASMSTART			; CHECK-NEXT: ;;#ASMSTART
	; CHECK-NEXT: ;;#ASMEND			; CHECK-NEXT: ;;#ASMEND
				; CHECK-NEXT: buffer_store_dword v2, off, s[0:3], s33 offset:4
				; CHECK-NEXT: buffer_store_dword v3, off, s[0:3], s33 offset:8
				; CHECK-NEXT: buffer_store_dword v4, off, s[0:3], s33 offset:12
	; CHECK-NEXT: v_mov_b32_e32 v0, 0			; CHECK-NEXT: v_mov_b32_e32 v0, 0
				; CHECK-NEXT: buffer_store_dword v5, off, s[0:3], s33 offset:16
				; CHECK-NEXT: s_addk_i32 s32, 0xfa00
	; CHECK-NEXT: s_mov_b32 s33, s6			; CHECK-NEXT: s_mov_b32 s33, s6
	; CHECK-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; CHECK-NEXT: s_setpc_b64 s[30:31]			; CHECK-NEXT: s_setpc_b64 s[30:31]
	%1 = alloca i32, align 4, addrspace(5)			%1 = alloca i32, align 4, addrspace(5)
	%2 = alloca i32, align 4, addrspace(5)			%2 = alloca i32, align 4, addrspace(5)
	%3 = alloca i32, align 4, addrspace(5)			%3 = alloca i32, align 4, addrspace(5)
	%4 = alloca i32, align 4, addrspace(5)			%4 = alloca i32, align 4, addrspace(5)
	%5 = alloca i32, align 4, addrspace(5)			%5 = alloca i32, align 4, addrspace(5)

llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/amdgpu_isel.ll.expected

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -stop-after=finalize-isel -debug-only=isel -o /dev/null %s 2>&1 | FileCheck %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -stop-after=finalize-isel -debug-only=isel -o /dev/null %s 2>&1 | FileCheck %s

	define i64 @i64_test(i64 %i) nounwind readnone {			define i64 @i64_test(i64 %i) nounwind readnone {
	; CHECK-LABEL: i64_test:			; CHECK-LABEL: i64_test:
	; CHECK: SelectionDAG has 9 nodes:			; CHECK: SelectionDAG has 25 nodes:
	; CHECK-NEXT: t0: ch,glue = EntryToken			; CHECK-NEXT: t0: ch,glue = EntryToken
	; CHECK-NEXT: t11: ch,glue = CopyToReg t0, Register:i32 $vgpr0, IMPLICIT_DEF:i32			; CHECK-NEXT: t2: i32,ch = CopyFromReg # D:1 t0, Register:i32 %0
	; CHECK-NEXT: t17: i32 = V_MOV_B32_e32 TargetConstant:i32<0>			; CHECK-NEXT: t4: i32,ch = CopyFromReg # D:1 t0, Register:i32 %1
	; CHECK-NEXT: t13: ch,glue = CopyToReg t11, Register:i32 $vgpr1, t17, t11:1			; CHECK-NEXT: t49: i64 = REG_SEQUENCE # D:1 TargetConstant:i32<53>, t2, TargetConstant:i32<3>, t4, TargetConstant:i32<11>
	; CHECK-NEXT: t14: ch = SI_RETURN Register:i32 $vgpr0, Register:i32 $vgpr1, t13, t13:1			; CHECK-NEXT: t26: i32,ch = BUFFER_LOAD_DWORD_OFFEN<Mem:(dereferenceable load (s32) from %ir.loc, align 8, addrspace 5)> TargetFrameIndex:i32<0>, Register:v4i32 $sgpr0_sgpr1_sgpr2_sgpr3, TargetConstant:i32<0>, TargetConstant:i32<0>, TargetConstant:i32<0>, TargetConstant:i1<0>, t0
				; CHECK-NEXT: t29: i32,ch = BUFFER_LOAD_DWORD_OFFEN<Mem:(dereferenceable load (s32) from %ir.loc + 4, basealign 8, addrspace 5)> TargetFrameIndex:i32<0>, Register:v4i32 $sgpr0_sgpr1_sgpr2_sgpr3, TargetConstant:i32<0>, TargetConstant:i32<4>, TargetConstant:i32<0>, TargetConstant:i1<0>, t0
				; CHECK-NEXT: t32: v2i32 = REG_SEQUENCE # D:1 TargetConstant:i32<53>, t26, TargetConstant:i32<3>, t29, TargetConstant:i32<11>
				; CHECK-NEXT: t10: i64 = V_ADD_U64_PSEUDO # D:1 t49, t32
				; CHECK-NEXT: t23: i32 = EXTRACT_SUBREG # D:1 t10, TargetConstant:i32<3>
				; CHECK-NEXT: t16: ch,glue = CopyToReg # D:1 t0, Register:i32 $vgpr0, t23
				; CHECK-NEXT: t38: i32 = EXTRACT_SUBREG # D:1 t10, TargetConstant:i32<11>
				; CHECK-NEXT: t18: ch,glue = CopyToReg # D:1 t16, Register:i32 $vgpr1, t38, t16:1
				; CHECK-NEXT: t19: ch = SI_RETURN # D:1 Register:i32 $vgpr0, Register:i32 $vgpr1, t18, t18:1
	; CHECK-EMPTY:			; CHECK-EMPTY:
	%loc = alloca i64, addrspace(5)			%loc = alloca i64, addrspace(5)
	%j = load i64, ptr addrspace(5) %loc			%j = load i64, ptr addrspace(5) %loc
	%r = add i64 %i, %j			%r = add i64 %i, %j
	ret i64 %r			ret i64 %r
	}			}

	define i64 @i32_test(i32 %i) nounwind readnone {			define i64 @i32_test(i32 %i) nounwind readnone {
	; CHECK-LABEL: i32_test:			; CHECK-LABEL: i32_test:
	; CHECK: SelectionDAG has 8 nodes:			; CHECK: SelectionDAG has 15 nodes:
	; CHECK-NEXT: t5: i32 = V_MOV_B32_e32 TargetConstant:i32<0>
	; CHECK-NEXT: t0: ch,glue = EntryToken			; CHECK-NEXT: t0: ch,glue = EntryToken
	; CHECK-NEXT: t7: ch,glue = CopyToReg t0, Register:i32 $vgpr0, t5			; CHECK-NEXT: t2: i32,ch = CopyFromReg # D:1 t0, Register:i32 %0
	; CHECK-NEXT: t9: ch,glue = CopyToReg t7, Register:i32 $vgpr1, t5, t7:1			; CHECK-NEXT: t6: i32,ch = BUFFER_LOAD_DWORD_OFFEN<Mem:(dereferenceable load (s32) from %ir.loc, addrspace 5)> TargetFrameIndex:i32<0>, Register:v4i32 $sgpr0_sgpr1_sgpr2_sgpr3, TargetConstant:i32<0>, TargetConstant:i32<0>, TargetConstant:i32<0>, TargetConstant:i1<0>, t0
	; CHECK-NEXT: t10: ch = SI_RETURN Register:i32 $vgpr0, Register:i32 $vgpr1, t9, t9:1			; CHECK-NEXT: t7: i32,i1 = V_ADD_CO_U32_e64 # D:1 t2, t6, TargetConstant:i1<0>
				; CHECK-NEXT: t14: ch,glue = CopyToReg # D:1 t0, Register:i32 $vgpr0, t7
				; CHECK-NEXT: t22: i32 = V_MOV_B32_e32 TargetConstant:i32<0>
				; CHECK-NEXT: t16: ch,glue = CopyToReg # D:1 t14, Register:i32 $vgpr1, t22, t14:1
				; CHECK-NEXT: t17: ch = SI_RETURN # D:1 Register:i32 $vgpr0, Register:i32 $vgpr1, t16, t16:1
	; CHECK-EMPTY:			; CHECK-EMPTY:
	%loc = alloca i32, addrspace(5)			%loc = alloca i32, addrspace(5)
	%j = load i32, ptr addrspace(5) %loc			%j = load i32, ptr addrspace(5) %loc
	%r = add i32 %i, %j			%r = add i32 %i, %j
	%ext = zext i32 %r to i64			%ext = zext i32 %r to i64
	ret i64 %ext			ret i64 %ext
	}			}

	define i64 @i16_test(i16 %i) nounwind readnone {			define i64 @i16_test(i16 %i) nounwind readnone {
	; CHECK-LABEL: i16_test:			; CHECK-LABEL: i16_test:
	; CHECK: SelectionDAG has 8 nodes:			; CHECK: SelectionDAG has 18 nodes:
	; CHECK-NEXT: t5: i32 = V_MOV_B32_e32 TargetConstant:i32<0>
	; CHECK-NEXT: t0: ch,glue = EntryToken			; CHECK-NEXT: t0: ch,glue = EntryToken
	; CHECK-NEXT: t7: ch,glue = CopyToReg t0, Register:i32 $vgpr0, t5			; CHECK-NEXT: t2: i32,ch = CopyFromReg # D:1 t0, Register:i32 %0
	; CHECK-NEXT: t9: ch,glue = CopyToReg t7, Register:i32 $vgpr1, t5, t7:1			; CHECK-NEXT: t19: i32,ch = BUFFER_LOAD_USHORT_OFFEN<Mem:(dereferenceable load (s16) from %ir.loc, addrspace 5)> TargetFrameIndex:i32<0>, Register:v4i32 $sgpr0_sgpr1_sgpr2_sgpr3, TargetConstant:i32<0>, TargetConstant:i32<0>, TargetConstant:i32<0>, TargetConstant:i1<0>, t0
	; CHECK-NEXT: t10: ch = SI_RETURN Register:i32 $vgpr0, Register:i32 $vgpr1, t9, t9:1			; CHECK-NEXT: t20: i32,i1 = V_ADD_CO_U32_e64 # D:1 t2, t19, TargetConstant:i1<0>
				; CHECK-NEXT: t24: i32 = S_MOV_B32 TargetConstant:i32<65535>
				; CHECK-NEXT: t25: i32 = V_AND_B32_e64 # D:1 t20, t24
				; CHECK-NEXT: t15: ch,glue = CopyToReg # D:1 t0, Register:i32 $vgpr0, t25
				; CHECK-NEXT: t31: i32 = V_MOV_B32_e32 TargetConstant:i32<0>
				; CHECK-NEXT: t17: ch,glue = CopyToReg # D:1 t15, Register:i32 $vgpr1, t31, t15:1
				; CHECK-NEXT: t18: ch = SI_RETURN # D:1 Register:i32 $vgpr0, Register:i32 $vgpr1, t17, t17:1
	; CHECK-EMPTY:			; CHECK-EMPTY:
	%loc = alloca i16, addrspace(5)			%loc = alloca i16, addrspace(5)
	%j = load i16, ptr addrspace(5) %loc			%j = load i16, ptr addrspace(5) %loc
	%r = add i16 %i, %j			%r = add i16 %i, %j
	%ext = zext i16 %r to i64			%ext = zext i16 %r to i64
	ret i64 %ext			ret i64 %ext
	}			}

	define i64 @i8_test(i8 %i) nounwind readnone {			define i64 @i8_test(i8 %i) nounwind readnone {
	; CHECK-LABEL: i8_test:			; CHECK-LABEL: i8_test:
	; CHECK: SelectionDAG has 8 nodes:			; CHECK: SelectionDAG has 18 nodes:
	; CHECK-NEXT: t5: i32 = V_MOV_B32_e32 TargetConstant:i32<0>
	; CHECK-NEXT: t0: ch,glue = EntryToken			; CHECK-NEXT: t0: ch,glue = EntryToken
	; CHECK-NEXT: t7: ch,glue = CopyToReg t0, Register:i32 $vgpr0, t5			; CHECK-NEXT: t2: i32,ch = CopyFromReg # D:1 t0, Register:i32 %0
	; CHECK-NEXT: t9: ch,glue = CopyToReg t7, Register:i32 $vgpr1, t5, t7:1			; CHECK-NEXT: t19: i32,ch = BUFFER_LOAD_UBYTE_OFFEN<Mem:(dereferenceable load (s8) from %ir.loc, addrspace 5)> TargetFrameIndex:i32<0>, Register:v4i32 $sgpr0_sgpr1_sgpr2_sgpr3, TargetConstant:i32<0>, TargetConstant:i32<0>, TargetConstant:i32<0>, TargetConstant:i1<0>, t0
	; CHECK-NEXT: t10: ch = SI_RETURN Register:i32 $vgpr0, Register:i32 $vgpr1, t9, t9:1			; CHECK-NEXT: t20: i32,i1 = V_ADD_CO_U32_e64 # D:1 t2, t19, TargetConstant:i1<0>
				; CHECK-NEXT: t24: i32 = S_MOV_B32 TargetConstant:i32<255>
				; CHECK-NEXT: t25: i32 = V_AND_B32_e64 # D:1 t20, t24
				; CHECK-NEXT: t15: ch,glue = CopyToReg # D:1 t0, Register:i32 $vgpr0, t25
				; CHECK-NEXT: t31: i32 = V_MOV_B32_e32 TargetConstant:i32<0>
				; CHECK-NEXT: t17: ch,glue = CopyToReg # D:1 t15, Register:i32 $vgpr1, t31, t15:1
				; CHECK-NEXT: t18: ch = SI_RETURN # D:1 Register:i32 $vgpr0, Register:i32 $vgpr1, t17, t17:1
	; CHECK-EMPTY:			; CHECK-EMPTY:
	%loc = alloca i8, addrspace(5)			%loc = alloca i8, addrspace(5)
	%j = load i8, ptr addrspace(5) %loc			%j = load i8, ptr addrspace(5) %loc
	%r = add i8 %i, %j			%r = add i8 %i, %j
	%ext = zext i8 %r to i64			%ext = zext i8 %r to i64
	ret i64 %ext			ret i64 %ext
	}			}


[AMDGPU] Remove post-PromoteAlloca SROA run (Closed, Public)

Revision Contents

Diff 549255

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-sibling-call.ll

llvm/test/CodeGen/AMDGPU/captured-frame-index.ll

llvm/test/CodeGen/AMDGPU/cgp-addressing-modes.ll

llvm/test/CodeGen/AMDGPU/extload-private.ll

llvm/test/CodeGen/AMDGPU/frame-index-elimination.ll

llvm/test/CodeGen/AMDGPU/ipra.ll

llvm/test/CodeGen/AMDGPU/llc-pipeline.ll

llvm/test/CodeGen/AMDGPU/load-hi16.ll

llvm/test/CodeGen/AMDGPU/load-lo16.ll

llvm/test/CodeGen/AMDGPU/nested-calls.ll

llvm/test/CodeGen/AMDGPU/parallelandifcollapse.ll

llvm/test/CodeGen/AMDGPU/resource-optimization-remarks.ll

llvm/test/CodeGen/AMDGPU/sibling-call.ll

llvm/test/CodeGen/AMDGPU/store-hi16.ll

llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/amdgpu_asm.ll

llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/amdgpu_asm.ll.expected

llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/amdgpu_generated_funcs.ll.generated.expected

llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/amdgpu_generated_funcs.ll.nogenerated.expected

llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/amdgpu_isel.ll.expected
