This is an archive of the discontinued LLVM Phabricator instance.

Differential D89170

[AMDGPU] Use flat scratch instructions where available
ClosedPublic

Authored by rampitec on Oct 9 2020, 4:20 PM.

Details

Reviewers

arsenm
sebastian-ne
Flakebi

Commits

rG038d884a50a4: [AMDGPU] Use flat scratch instructions where available

Summary

The support is disabled by default. So far there is instruction
selection, spilling, and frame elimination. It also changes SP
from unswizzled to swizzled as used by flat scratch instructions,
so it cannot be mixed with MUBUF stack access.

At the very least, the following is missing:

- GlobalISel;
- Some optimizations in frame elimination between vector and scalar ALU;
- It should eventually allow always materializing a frame index as an SGPR, but that is not implemented and frame elimination cannot handle it yet;
- Unaligned and/or multi-dword flat scratch should work, but it is currently legalized for MUBUF;
- Operand folding cannot yet optimize FI as it does with MUBUF;
- The value of the SP/FP in the DWARF expression will need scaling to recover the unswizzled scratch address.
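
The swizzled/unswizzled distinction and the DWARF item above can be illustrated with a minimal sketch. It assumes that the unswizzled (MUBUF-style, wave-level) offset equals the swizzled (per-lane, flat scratch) offset multiplied by the wavefront size; that factor is an assumption made here for illustration, and the helper names are hypothetical and not part of LLVM.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: convert a per-lane (swizzled) scratch offset, as used
// by flat scratch instructions, into a wave-level (unswizzled) offset.
// Assumption: the two differ exactly by a factor of the wavefront size.
constexpr uint64_t toUnswizzled(uint64_t swizzledOffset, unsigned waveSize) {
  return swizzledOffset * waveSize;
}

// Hypothetical inverse: recover the per-lane offset from a wave-level one.
constexpr uint64_t toSwizzled(uint64_t unswizzledOffset, unsigned waveSize) {
  return unswizzledOffset / waveSize;
}
```

Under this assumption, a debugger that sees a swizzled SP of 16 on a wave64 target would scale it to 1024 to obtain the unswizzled scratch address, which is the kind of scaling the DWARF expression would have to encode.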

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

rampitec created this revision.Oct 9 2020, 4:20 PM

Herald added a project: Restricted Project. Oct 9 2020, 4:20 PM

Herald added subscribers: kerbowa, arphaman, hiraditya and 7 others.

rampitec requested review of this revision.Oct 9 2020, 4:20 PM

Herald added a subscriber: wdng. Oct 9 2020, 4:20 PM

I haven't done a meaningful review, but I wanted to note that this will require changes to the debug information (which isn't committed yet). I think this could be as simple as scaling the value of the SP/FP in the DWARF expression to recover the unswizzled scratch address.

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
430	s/Scracth/Scratch/

In D89170#2325299, @scott.linder wrote:

I haven't done a meaningful review, but I wanted to note that this will require changes to the debug information (which isn't committed yet). I think this could be as simple as scaling the value of the SP/FP in the DWARF expression to recover the unswizzled scratch address.

Thanks! I have fixed the typo and added DWARF update to the commit message.

In D89170#2325299, @scott.linder wrote:

I haven't done a meaningful review, but I wanted to note that this will require changes to the debug information (which isn't committed yet). I think this could be as simple as scaling the value of the SP/FP in the DWARF expression to recover the unswizzled scratch address.

Is this change changing the call convention ABI? For example, making the SP be a swizzled address as opposed to an unswizzled address? If so then AMDGPUUsage will also need updating.

In D89170#2325862, @t-tye wrote:

In D89170#2325299, @scott.linder wrote:

I haven't done a meaningful review, but I wanted to note that this will require changes to the debug information (which isn't committed yet). I think this could be as simple as scaling the value of the SP/FP in the DWARF expression to recover the unswizzled scratch address.

Is this change changing the call convention ABI? For example, making the SP be a swizzled address as opposed to an unswizzled address? If so then AMDGPUUsage will also need updating.

Yes it does. However, it is a little premature to update the documentation. This is WIP, disabled by default and more or less not working at least until spilling is implemented. When it is at least working we can consider documenting it. Documenting it earlier just gives an impression there is an option to use it.

Fixed typo in test check.

rampitec added a child revision: D89424: [AMDGPU] Spilling using flat scratch.Oct 14 2020, 1:35 PM

Testing showed a couple of problems:

Debug tablegen asserts with this.
Using null register in flat scratch does not work, but it needs a new ST addressing mode of GFX10. I will create a separate patch to support ST mode.

Fixed operand order in store pattern.

Still needs ST mode.

This will change the ABI, so I don't think it belongs as a subtarget property

In D89170#2332955, @arsenm wrote:

This will change the ABI, so I don't think it belongs as a subtarget property

The ABI will in fact depend on the subtarget. We can only use it starting from gfx9, and then even on gfx9 it might not be desirable. GFX10 is better in this respect.
Anyway, I need a subtarget to decide if we even have flat scratch instructions. So far this switch is experimental, but if you have an idea of a better placement please tell.

Use ST mode on GFX10 instead of NULL register.

rampitec added a parent revision: D89501: [AMDGPU] flat scratch ST addressing mode on gfx10.Oct 15 2020, 4:26 PM

Flakebi added a subscriber: Flakebi.Oct 16 2020, 3:57 AM

Flakebi added inline comments.

llvm/lib/Target/AMDGPU/FLATInstructions.td

873

Should this be called ScratchLoadSignedPat_D16?

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp

1445–1450

I get a failing assert here with NewOpc = 4294967295:

llvm/include/llvm/MC/MCInstrInfo.h:63: const llvm::MCInstrDesc &llvm::MCInstrInfo::get(unsigned int) const: Assertion `Opcode < NumOpcodes && "Invalid opcode!"' failed.
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
Stack dump:
0.      Program arguments: compiler/llpc/amdllpc -gfxip=10.1 -amdgpu-enable-flat-scratch /pipelines/PipelineVsFs_0x1BEFB7D1A235B4F6.pipe -verify-machineinstrs
1.      Running pass 'CallGraph Pass Manager' on module 'lgcPipeline'.
2.      Running pass 'Prologue/Epilogue Insertion & Frame Finalization' on function '@_amdgpu_ps_main'
 #0 0x00000000023f0db1 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /llvm/lib/Support/Unix/Signals.inc:563:13
 #1 0x00000000023ef060 llvm::sys::RunSignalHandlers() /llvm/lib/Support/Signals.cpp:72:18
 #2 0x00000000023f1152 SignalHandler(int) /llvm/lib/Support/Unix/Signals.inc:0:3
 #3 0x00007fadd6ebfee0 __restore_rt (/glibc-2.31/lib/libpthread.so.0+0x12ee0)
 #4 0x00007fadd6d0c08a raise (/glibc-2.31/lib/libc.so.6+0x3808a)
 #5 0x00007fadd6cf6528 abort (/glibc-2.31/lib/libc.so.6+0x22528)
 #6 0x00007fadd6cf640f _nl_load_domain.cold.0 (/glibc-2.31/lib/libc.so.6+0x2240f)
 #7 0x00007fadd6d04a02 (/glibc-2.31/lib/libc.so.6+0x30a02)
 #8 0x0000000001a03170 llvm::SIRegisterInfo::eliminateFrameIndex(llvm::MachineInstrBundleIterator<llvm::MachineInstr, false>, int, unsigned int, llvm::RegScavenger*) const /llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp:1465:11
 #9 0x000000000214e0f3 (anonymous namespace)::PEI::replaceFrameIndices(llvm::MachineBasicBlock*, llvm::MachineFunction&, int&) /llvm/lib/CodeGen/PrologEpilogInserter.cpp:0:11
#10 0x000000000214caef llvm::MachineBasicBlock::getNumber() const /llvm/include/llvm/CodeGen/MachineBasicBlock.h:904:34
#11 0x000000000214caef (anonymous namespace)::PEI::replaceFrameIndices(llvm::MachineFunction&) /llvm/lib/CodeGen/PrologEpilogInserter.cpp:1161:17
#12 0x000000000214caef (anonymous namespace)::PEI::runOnMachineFunction(llvm::MachineFunction&) /llvm/lib/CodeGen/PrologEpilogInserter.cpp:269:3
#13 0x0000000002031e7e llvm::MachineFunctionPass::runOnFunction(llvm::Function&) /llvm/lib/CodeGen/MachineFunctionPass.cpp:0:13
#14 0x0000000003136a85 llvm::FPPassManager::runOnFunction(llvm::Function&) /llvm/lib/IR/LegacyPassManager.cpp:1519:27
#15 0x0000000001c76b38 (anonymous namespace)::CGPassManager::RunPassOnSCC(llvm::Pass*, llvm::CallGraphSCC&, llvm::CallGraph&, bool&, bool&) /llvm/lib/Analysis/CallGraphSCCPass.cpp:178:25
#16 0x0000000001c76b38 (anonymous namespace)::CGPassManager::RunAllPassesOnSCC(llvm::CallGraphSCC&, llvm::CallGraph&, bool&) /llvm/lib/Analysis/CallGraphSCCPass.cpp:476:9
#17 0x0000000001c76b38 (anonymous namespace)::CGPassManager::runOnModule(llvm::Module&) /llvm/lib/Analysis/CallGraphSCCPass.cpp:541:18
#18 0x0000000003137149 (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&) /llvm/lib/IR/LegacyPassManager.cpp:0:27
#19 0x0000000003137149 llvm::legacy::PassManagerImpl::run(llvm::Module&) /llvm/lib/IR/LegacyPassManager.cpp:615:44
…
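
A note on the value in the assert: 4294967295 is what -1 becomes when converted to a 32-bit unsigned integer. That suggests (this is an inference, not something stated in the review) that an opcode lookup returned a "not found" sentinel which was then passed on as an opcode. The sketch below shows the pattern and one way to guard against it; `lookupMappedOpcode` and `getMappedOpcode` are hypothetical stand-ins, not LLVM functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a generated opcode mapping table that returns -1
// when the given opcode has no mapped counterpart.
int lookupMappedOpcode(int opcode) {
  return opcode == 7 ? 42 : -1;
}

// Guarded wrapper: reports failure instead of letting the -1 sentinel be
// silently converted to a huge unsigned opcode value.
bool getMappedOpcode(int opcode, unsigned &newOpc) {
  int mapped = lookupMappedOpcode(opcode);
  if (mapped < 0)
    return false; // no mapping; the caller must handle this case
  newOpc = static_cast<unsigned>(mapped);
  return true;
}
```

With such a guard, a missing mapping is handled at the call site rather than surfacing later as an "Invalid opcode!" assertion.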

Renamed pattern.

rampitec added inline comments.Oct 16 2020, 10:10 AM

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
1445–1450	I cannot reproduce this. Keep in mind that D89424 is not updated to use ST mode yet, so they do not work together yet.

Rebased to parent.

Correct rebase patch.

arsenm added inline comments.Oct 19 2020, 3:30 PM

llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1678	Swap these?
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
1478	What happens if this needs an SGPR spill?

rampitec updated this revision to Diff 299210.Oct 19 2020, 3:55 PM

rampitec marked an inline comment as done.

rampitec added inline comments.

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
1478	If it can scavenge it, it shall be fine as the offset shall not change. If not, I guess I would need to adjust SP and revert it. I have added a FIXME here.

Flakebi added inline comments.Oct 20 2020, 7:59 AM

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
544	Should this be `MFI->hasFlatScratchInit() || (ST.enableFlatScratch() && requiresStackPointerReference(MF))`? Otherwise, the scratch does not get initialized (I guess it’s fine to do that in a later patch).

rampitec added inline comments.Oct 20 2020, 10:51 AM

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
544	Do you see it not initialized? In the SIMachineFunctionInfo() there is this code:

    if (ST.hasFlatAddressSpace() && isEntryFunction() && isAmdHsaOrMesa) {
      // TODO: This could be refined a lot. The attribute is a poor way of
      // detecting calls or stack objects that may require it before argument
      // lowering.
      if (HasCalls || HasStackObjects)
        FlatScratchInit = true;
    }

So I assume it has to be initialized. Probably the culprit is this isAmdHsaOrMesa condition? It may be needed to say (isAmdHsaOrMesa || ST.enableFlatScratch()) instead. For some reason this code is not executed for amdpal, I do not see an obvious reason why.

rampitec added inline comments.Oct 20 2020, 11:22 AM

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
544	Actually I see it uninitialized in my own test. But the code as suggested does not always work because requiresStackPointerReference() is not necessarily true if we have just private loads, we need to make sure hasFlatScratchInit() is set.

Ensure flat scratch initialization;
Added asserts around scavenger calls until there is a better handling of failed scavenging;

rampitec removed a child revision: D89424: [AMDGPU] Spilling using flat scratch.Oct 21 2020, 3:28 PM

Integrated spilling from child revision, child is dropped;
Fixed situation when an SGPR has to be spilled while scavenging in frame elimination;

Herald added a subscriber: qcolombet. Oct 21 2020, 3:31 PM

I also came to conclusion that the only robust way to have no failed scavenging during frame lowering is to always have an sp or fp. Otherwise it can fail regardless of the spilling method. The only other way is to have an instruction with full 32 bit immediate offset. I.e. it can fail in a kernel with MUBUF as well.

In D89170#2345943, @rampitec wrote:

I also came to conclusion that the only robust way to have no failed scavenging during frame lowering is to always have an sp or fp. Otherwise it can fail regardless of the spilling method. The only other way is to have an instruction with full 32 bit immediate offset. I.e. it can fail in a kernel with MUBUF as well.

I was considering requiring an FP if the stack size was starting to hit the offset limit, but was unable to come up with a testcase where it would really break

In D89170#2347139, @arsenm wrote:

In D89170#2345943, @rampitec wrote:

I also came to conclusion that the only robust way to have no failed scavenging during frame lowering is to always have an sp or fp. Otherwise it can fail regardless of the spilling method. The only other way is to have an instruction with full 32 bit immediate offset. I.e. it can fail in a kernel with MUBUF as well.

I was considering requiring an FP if the stack size was starting to hit the offset limit, but was unable to come up with a testcase where it would really break

This sounds like a good idea. We can run into a situation when we can scavenge nothing at all, even if it is not easy to forge a testcase. That is more so with flat scratch until ST mode is available as you always need a register as a base. In fact in this scenario it may be needed even if potential offsets are small. Then we do not need buffer descriptor with flat scratch, so we are saving 4 SGPRs. It sounds fair to use one for the base pointer instead.

Fixed a need of SGPR spill during VGPR spilling on targets w/o flat scratch ST mode, reused existing code adjusting offsets.

Fixed issue with flat scratch not always being initialized. It was not initialized if we had no stack objects or calls, but later did spilling.
It is too late to insert system SGPRs at frame lowering, so initialize it always if flat scratch is used.
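
The change in the initialization decision described above can be modeled roughly as follows. This is a simplified sketch, not the code from SIMachineFunctionInfo: the struct and function names are hypothetical, the subtarget and OS checks from the original condition are omitted, and keeping the entry-function requirement in the new condition is an assumption.

```cpp
#include <cassert>

// Hypothetical summary of the facts known when system SGPRs are assigned.
struct FunctionFacts {
  bool isEntryFunction;
  bool hasCalls;
  bool hasStackObjects;
  bool flatScratchEnabled;
};

// Earlier behavior as described in the thread: initialize only when calls or
// stack objects are already visible. Spills added later are missed.
bool needsFlatScratchInitBefore(const FunctionFacts &f) {
  return f.isEntryFunction && (f.hasCalls || f.hasStackObjects);
}

// Behavior after the fix as described: when flat scratch is in use, always
// initialize, because spilling may introduce scratch accesses later and it is
// then too late to add system SGPRs.
bool needsFlatScratchInitAfter(const FunctionFacts &f) {
  return f.isEntryFunction &&
         (f.flatScratchEnabled || f.hasCalls || f.hasStackObjects);
}
```

A kernel with no calls and no stack objects that later spills is the case the two conditions disagree on: the first returns false, the second returns true whenever flat scratch is enabled.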

arsenm added inline comments.Oct 23 2020, 8:35 AM

llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1559–1560	This should really be a pattern predicate. I was recently working on fixing these explicit subtarget checks in the complex patterns but didn't finish

arsenm added inline comments.Oct 23 2020, 8:39 AM

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
4308–4311 ↗	(On Diff #300113)	Unrelated change?
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
779–783	This looks backwards with the negated conditions

Corrected IsOffsetLegal to remove negation.

Moved predicates from complex patterns into td files.

rampitec added inline comments.Oct 23 2020, 10:56 AM

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
4308–4311 ↗	(On Diff #300113)	It is related, we just never hit it before. I am probing a physical SGPR to see if it is legal. RC is SReg_32, but DRC for scratch instructions is SReg_32_XEXEC_HI and test fails.

rampitec mentioned this in D90064: [AMDGPU] Fixed isLegalRegOperand() with physregs.Oct 23 2020, 11:13 AM

rampitec mentioned this in rG2e64ad949487: [AMDGPU] Fixed isLegalRegOperand() with physregs.Oct 23 2020, 11:33 AM

Rebased.

rampitec marked an inline comment as done.Oct 23 2020, 11:37 AM

Removed unrelated subtarget change.

Looks good to me.
I tested it with the amdvlk vulkan driver (needs a pal-specific patch) and a short Vulkan CTS test ran through fine (except for pal-related failures).

This revision is now accepted and ready to land.Oct 26 2020, 8:30 AM

rampitec requested review of this revision.Oct 26 2020, 2:31 PM

This revision was not accepted when it landed; it landed in state Needs Review.Oct 26 2020, 2:41 PM

Closed by commit rG038d884a50a4: [AMDGPU] Use flat scratch instructions where available (authored by rampitec).

This revision was automatically updated to reflect the committed changes.

rampitec added a commit: rG038d884a50a4: [AMDGPU] Use flat scratch instructions where available.

It looks like this broke the windows lldb bot:

http://lab.llvm.org:8011/#/builders/83/builds/336

In D89170#2354963, @stella.stamenova wrote:

It looks like this broke the windows lldb bot:

http://lab.llvm.org:8011/#/builders/83/builds/336

Oops! Release build in fact. Will fix soon.

rampitec mentioned this in rGd176e13ca553: Fixed release build after D89170.Oct 26 2020, 4:01 PM

In D89170#2354963, @stella.stamenova wrote:

It looks like this broke the windows lldb bot:

http://lab.llvm.org:8011/#/builders/83/builds/336

Fixed in https://reviews.llvm.org/rGd176e13ca55353c7ee8d4da23be6eae9f82a64e1

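Revision Contents

Path                                                                       Size
llvm/lib/Target/AMDGPU/AMDGPU.td                                           4 lines
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp                              136 lines
llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h                                   2 lines
llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp                                 9 lines
llvm/lib/Target/AMDGPU/BUFInstructions.td                                  9 lines
llvm/lib/Target/AMDGPU/FLATInstructions.td                                 143 lines
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp                                  2 lines
llvm/lib/Target/AMDGPU/SIFrameLowering.cpp                                 128 lines
llvm/lib/Target/AMDGPU/SIISelLowering.cpp                                  3 lines
llvm/lib/Target/AMDGPU/SIInstrInfo.h                                       12 lines
llvm/lib/Target/AMDGPU/SIInstrInfo.td                                      10 lines
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp                           5 lines
llvm/lib/Target/AMDGPU/SIRegisterInfo.h                                    2 lines
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp                                  237 lines
llvm/test/CodeGen/AMDGPU/call-preserved-registers.ll                       22 lines
llvm/test/CodeGen/AMDGPU/callee-frame-setup.ll                             203 lines
llvm/test/CodeGen/AMDGPU/chain-hi-to-lo.ll                                 210 lines
llvm/test/CodeGen/AMDGPU/fast-unaligned-load-store.private.ll              88 lines
llvm/test/CodeGen/AMDGPU/flat-scratch.ll                                   1295 lines
llvm/test/CodeGen/AMDGPU/frame-index-elimination.ll                        99 lines
llvm/test/CodeGen/AMDGPU/load-hi16.ll                                      65 lines
llvm/test/CodeGen/AMDGPU/load-lo16.ll                                      445 lines
llvm/test/CodeGen/AMDGPU/local-stack-alloc-block-sp-reference.ll           225 lines
llvm/test/CodeGen/AMDGPU/memcpy-fixed-align.ll                             73 lines
llvm/test/CodeGen/AMDGPU/multi-dword-vgpr-spill.ll                         209 lines
llvm/test/CodeGen/AMDGPU/non-entry-alloca.ll                               432 lines
llvm/test/CodeGen/AMDGPU/pei-scavenge-sgpr-gfx9.mir                        42 lines
llvm/test/CodeGen/AMDGPU/pei-scavenge-vgpr-spill.mir                       12 lines
llvm/test/CodeGen/AMDGPU/scratch-simple.ll                                 173 lines
llvm/test/CodeGen/AMDGPU/sgpr-spill.mir                                    89 lines
llvm/test/CodeGen/AMDGPU/spill-scavenge-offset.ll                          22 lines
llvm/test/CodeGen/AMDGPU/stack-pointer-offset-relative-frameindex.ll       124 lines
llvm/test/CodeGen/AMDGPU/store-hi16.ll                                     41 lines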

Diff 300797

llvm/lib/Target/AMDGPU/AMDGPU.td

@@ ... @@ def HasDsSrc2Insts : Predicate<"!Subtarget->hasDsSrc2Insts()">,
   AssemblerPredicate<(all_of FeatureDsSrc2Insts)>;

 def HasOffset3fBug : Predicate<"!Subtarget->hasOffset3fBug()">,
   AssemblerPredicate<(all_of FeatureOffset3fBug)>;

 def EnableLateCFGStructurize : Predicate<
   "EnableLateStructurizeCFG">;

+def EnableFlatScratch : Predicate<"Subtarget->enableFlatScratch()">;
+
+def DisableFlatScratch : Predicate<"!Subtarget->enableFlatScratch()">;
+
 // Include AMDGPU TD files
 include "SISchedule.td"
 include "GCNProcessors.td"
 include "AMDGPUInstrInfo.td"
 include "SIRegisterInfo.td"
 include "AMDGPURegisterBanks.td"
 include "AMDGPUInstructions.td"
 include "SIInstrInfo.td"
 include "AMDGPUCallingConv.td"
 include "AMDGPUSearchableTables.td"

llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp

@@ ... @@ private:
   bool SelectMUBUFOffset(SDValue Addr, SDValue &SRsrc, SDValue &Soffset,
                          SDValue &Offset) const;

   template <bool IsSigned>
   bool SelectFlatOffset(SDNode *N, SDValue Addr, SDValue &VAddr,
                         SDValue &Offset) const;
   bool SelectGlobalSAddr(SDNode *N, SDValue Addr, SDValue &SAddr,
                          SDValue &VOffset, SDValue &Offset) const;
+  bool SelectScratchSAddr(SDNode *N, SDValue Addr, SDValue &SAddr,
+                          SDValue &Offset) const;

   bool SelectSMRDOffset(SDValue ByteOffsetNode, SDValue &Offset,
                         bool &Imm) const;
   SDValue Expand32BitAddress(SDValue Addr) const;
   bool SelectSMRD(SDValue Addr, SDValue &SBase, SDValue &Offset,
                   bool &Imm) const;
   bool SelectSMRDImm(SDValue Addr, SDValue &SBase, SDValue &Offset) const;
   bool SelectSMRDImm32(SDValue Addr, SDValue &SBase, SDValue &Offset) const;
@@ ... @@ bool AMDGPUDAGToDAGISel::SelectMUBUFScratchOffen(SDNode *Parent,
   return true;
 }

 bool AMDGPUDAGToDAGISel::SelectMUBUFScratchOffset(SDNode *Parent,
                                                   SDValue Addr,
                                                   SDValue &SRsrc,
                                                   SDValue &SOffset,
                                                   SDValue &Offset) const {
   ConstantSDNode *CAddr = dyn_cast<ConstantSDNode>(Addr);
   if (!CAddr || !SIInstrInfo::isLegalMUBUFImmOffset(CAddr->getZExtValue()))
[arsenm, inline comment (done): This should really be a pattern predicate. I was recently working on fixing these explicit subtarget checks in the complex patterns but didn't finish]
     return false;

   SDLoc DL(Addr);
   MachineFunction &MF = CurDAG->getMachineFunction();
   const SIMachineFunctionInfo *Info = MF.getInfo<SIMachineFunctionInfo>();

   SRsrc = CurDAG->getRegister(Info->getScratchRSrcReg(), MVT::v4i32);

@@ ... @@

 template <bool IsSigned>
 bool AMDGPUDAGToDAGISel::SelectFlatOffset(SDNode *N,
                                           SDValue Addr,
                                           SDValue &VAddr,
                                           SDValue &Offset) const {
   int64_t OffsetVal = 0;

+  unsigned AS = findMemSDNode(N)->getAddressSpace();
+
[arsenm, inline comment (done): Swap these?]
   if (Subtarget->hasFlatInstOffsets() &&
       (!Subtarget->hasFlatSegmentOffsetBug() ||
-       findMemSDNode(N)->getAddressSpace() != AMDGPUAS::FLAT_ADDRESS)) {
+       AS != AMDGPUAS::FLAT_ADDRESS)) {
     SDValue N0, N1;
     if (CurDAG->isBaseWithConstantOffset(Addr)) {
       N0 = Addr.getOperand(0);
       N1 = Addr.getOperand(1);
     } else if (getBaseWithOffsetUsingSplitOR(*CurDAG, Addr, N0, N1)) {
       assert(N0 && N1 && isa<ConstantSDNode>(N1));
     }
     if (N0 && N1) {
       uint64_t COffsetVal = cast<ConstantSDNode>(N1)->getSExtValue();

       const SIInstrInfo *TII = Subtarget->getInstrInfo();
-      unsigned AS = findMemSDNode(N)->getAddressSpace();
       if (TII->isLegalFLATOffset(COffsetVal, AS, IsSigned)) {
         Addr = N0;
         OffsetVal = COffsetVal;
       } else {
         // If the offset doesn't fit, put the low bits into the offset field and
         // add the rest.
         //
         // For a FLAT instruction the hardware decides whether to access
@@ ... @@ if (N0 && N1) {
           ImmField = COffsetVal & maskTrailingOnes<uint64_t>(NumBits);
           RemainderOffset = COffsetVal - ImmField;
         }
         assert(TII->isLegalFLATOffset(ImmField, AS, IsSigned));
         assert(RemainderOffset + ImmField == COffsetVal);

         OffsetVal = ImmField;

+        SDValue AddOffsetLo =
+            getMaterializedScalarImm32(Lo_32(RemainderOffset), DL);
+        SDValue Clamp = CurDAG->getTargetConstant(0, DL, MVT::i1);
+
+        if (Addr.getValueType().getSizeInBits() == 32) {
+          SmallVector<SDValue, 3> Opnds;
+          Opnds.push_back(N0);
+          Opnds.push_back(AddOffsetLo);
+          unsigned AddOp = AMDGPU::V_ADD_CO_U32_e32;
+          if (Subtarget->hasAddNoCarry()) {
+            AddOp = AMDGPU::V_ADD_U32_e64;
+            Opnds.push_back(Clamp);
+          }
+          Addr = SDValue(CurDAG->getMachineNode(AddOp, DL, MVT::i32, Opnds), 0);
+        } else {
           // TODO: Should this try to use a scalar add pseudo if the base address
           // is uniform and saddr is usable?
           SDValue Sub0 = CurDAG->getTargetConstant(AMDGPU::sub0, DL, MVT::i32);
           SDValue Sub1 = CurDAG->getTargetConstant(AMDGPU::sub1, DL, MVT::i32);

-        SDNode *N0Lo = CurDAG->getMachineNode(TargetOpcode::EXTRACT_SUBREG, DL,
-                                              MVT::i32, N0, Sub0);
-        SDNode *N0Hi = CurDAG->getMachineNode(TargetOpcode::EXTRACT_SUBREG, DL,
-                                              MVT::i32, N0, Sub1);
+          SDNode *N0Lo = CurDAG->getMachineNode(TargetOpcode::EXTRACT_SUBREG,
+                                                DL, MVT::i32, N0, Sub0);
+          SDNode *N0Hi = CurDAG->getMachineNode(TargetOpcode::EXTRACT_SUBREG,
+                                                DL, MVT::i32, N0, Sub1);

-        SDValue AddOffsetLo =
-            getMaterializedScalarImm32(Lo_32(RemainderOffset), DL);
           SDValue AddOffsetHi =
               getMaterializedScalarImm32(Hi_32(RemainderOffset), DL);

           SDVTList VTs = CurDAG->getVTList(MVT::i32, MVT::i1);
-        SDValue Clamp = CurDAG->getTargetConstant(0, DL, MVT::i1);

           SDNode *Add =
               CurDAG->getMachineNode(AMDGPU::V_ADD_CO_U32_e64, DL, VTs,
                                      {AddOffsetLo, SDValue(N0Lo, 0), Clamp});

           SDNode *Addc = CurDAG->getMachineNode(
               AMDGPU::V_ADDC_U32_e64, DL, VTs,
               {AddOffsetHi, SDValue(N0Hi, 0), SDValue(Add, 1), Clamp});

           SDValue RegSequenceArgs[] = {
               CurDAG->getTargetConstant(AMDGPU::VReg_64RegClassID, DL, MVT::i32),
               SDValue(Add, 0), Sub0, SDValue(Addc, 0), Sub1};

           Addr = SDValue(CurDAG->getMachineNode(AMDGPU::REG_SEQUENCE, DL,
                                                 MVT::i64, RegSequenceArgs),
                          0);
+        }
       }
     }
   }

   VAddr = Addr;
   Offset = CurDAG->getTargetConstant(OffsetVal, SDLoc(), MVT::i16);
   return true;
 }

 // If this matches zero_extend i32:x, return x
 static SDValue matchZExtFromI32(SDValue Op) {
@@ ... @@ bool AMDGPUDAGToDAGISel::SelectGlobalSAddr(SDNode *N,

   if (!SAddr)
     return false;

   Offset = CurDAG->getTargetConstant(ImmOffset, SDLoc(), MVT::i16);
   return true;
 }

+// Match (32-bit SGPR base) + sext(imm offset)
+bool AMDGPUDAGToDAGISel::SelectScratchSAddr(SDNode *N,
+                                            SDValue Addr,
+                                            SDValue &SAddr,
+                                            SDValue &Offset) const {
+  if (Addr->isDivergent())
+    return false;
+
+  SAddr = Addr;
+  int64_t COffsetVal = 0;
+
+  if (CurDAG->isBaseWithConstantOffset(Addr)) {
+    COffsetVal = cast<ConstantSDNode>(Addr.getOperand(1))->getSExtValue();
+    SAddr = Addr.getOperand(0);
+  }
+
+  if (auto FI = dyn_cast<FrameIndexSDNode>(SAddr)) {
+    SAddr = CurDAG->getTargetFrameIndex(FI->getIndex(), FI->getValueType(0));
+  } else if (SAddr.getOpcode() == ISD::ADD &&
+             isa<FrameIndexSDNode>(SAddr.getOperand(0))) {
+    // Materialize this into a scalar move for scalar address to avoid
+    // readfirstlane.
+    auto FI = cast<FrameIndexSDNode>(SAddr.getOperand(0));
+    SDValue TFI = CurDAG->getTargetFrameIndex(FI->getIndex(),
+                                              FI->getValueType(0));
+    SAddr = SDValue(CurDAG->getMachineNode(AMDGPU::S_ADD_U32, SDLoc(SAddr),
+                                           MVT::i32, TFI, SAddr.getOperand(1)),
+                    0);
+  }
+
+  const SIInstrInfo *TII = Subtarget->getInstrInfo();
+
+  if (!TII->isLegalFLATOffset(COffsetVal, AMDGPUAS::PRIVATE_ADDRESS, true)) {
+    int64_t RemainderOffset = COffsetVal;
+    int64_t ImmField = 0;
+    const unsigned NumBits = TII->getNumFlatOffsetBits(true);
+    // Use signed division by a power of two to truncate towards 0.
+    int64_t D = 1LL << (NumBits - 1);
+    RemainderOffset = (COffsetVal / D) * D;
+    ImmField = COffsetVal - RemainderOffset;
+
+    assert(TII->isLegalFLATOffset(ImmField, AMDGPUAS::PRIVATE_ADDRESS, true));
+    assert(RemainderOffset + ImmField == COffsetVal);
+
+    COffsetVal = ImmField;
+
+    SDLoc DL(N);
+    SDValue AddOffset =
+        getMaterializedScalarImm32(Lo_32(RemainderOffset), DL);
+    SAddr = SDValue(CurDAG->getMachineNode(AMDGPU::S_ADD_U32, DL, MVT::i32,
+                                           SAddr, AddOffset), 0);
+  }
+
+  Offset = CurDAG->getTargetConstant(COffsetVal, SDLoc(), MVT::i16);
+
+  return true;
+}
+
 bool AMDGPUDAGToDAGISel::SelectSMRDOffset(SDValue ByteOffsetNode,
                                           SDValue &Offset, bool &Imm) const {
   ConstantSDNode *C = dyn_cast<ConstantSDNode>(ByteOffsetNode);
   if (!C) {
     if (ByteOffsetNode.getValueType().isScalarInteger() &&
         ByteOffsetNode.getValueType().getSizeInBits() == 32) {
       Offset = ByteOffsetNode;
       Imm = false;
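
The offset split in SelectScratchSAddr above can be tried out in isolation. The following is a standalone sketch of the same arithmetic, not the LLVM code: `SplitOffset` and `splitSignedOffset` are hypothetical names, the field width is passed in as a plain parameter, and unlike the patch the sketch always splits rather than only when the offset is illegal.

```cpp
#include <cassert>
#include <cstdint>

// Result of splitting a constant offset into two parts.
struct SplitOffset {
  int64_t immField;        // part that goes into the instruction's offset field
  int64_t remainderOffset; // part that must be added to the base address
};

// Split `offset` for a signed immediate field that is `numBits` wide,
// including the sign bit. Signed division truncates toward zero, so the
// immediate part keeps the sign of the offset and its magnitude stays
// below 2^(numBits - 1).
SplitOffset splitSignedOffset(int64_t offset, unsigned numBits) {
  const int64_t d = int64_t{1} << (numBits - 1);
  const int64_t remainder = (offset / d) * d;
  return {offset - remainder, remainder};
}
```

For an illustrative 13-bit field (the width is an assumption chosen for the example), an offset of 5000 splits into a remainder of 4096 and an immediate of 904, and -5000 splits into -4096 and -904. The two parts always sum back to the original offset, which is what the second assert in the patch checks.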

llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h

@@ ... @@ public:
   // static wrappers
   static bool hasHalfRate64Ops(const TargetSubtargetInfo &STI);

   // XXX - Why is this here if it isn't in the default pass set?
   bool enableEarlyIfConversion() const override {
     return true;
   }

+  bool enableFlatScratch() const;
+
   void overrideSchedPolicy(MachineSchedPolicy &Policy,
                            unsigned NumRegionInstrs) const override;

   unsigned getMaxNumUserSGPRs() const {
     return 16;
   }

   bool hasSMemRealTime() const {

llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp

Show First 20 Lines • Show All 44 Lines • ▼ Show 20 Lines	static cl::opt<bool> DisablePowerSched(
cl::desc("Disable scheduling to minimize mAI power bursts"),		cl::desc("Disable scheduling to minimize mAI power bursts"),
cl::init(false));		cl::init(false));

static cl::opt<bool> EnableVGPRIndexMode(		static cl::opt<bool> EnableVGPRIndexMode(
"amdgpu-vgpr-index-mode",		"amdgpu-vgpr-index-mode",
cl::desc("Use GPR indexing mode instead of movrel for vector indexing"),		cl::desc("Use GPR indexing mode instead of movrel for vector indexing"),
cl::init(false));		cl::init(false));

		static cl::opt<bool> EnableFlatScratch(
		"amdgpu-enable-flat-scratch",
		cl::desc("Use flat scratch instructions"),
		cl::init(false));

GCNSubtarget::~GCNSubtarget() = default;		GCNSubtarget::~GCNSubtarget() = default;

R600Subtarget &		R600Subtarget &
R600Subtarget::initializeSubtargetDependencies(const Triple &TT,		R600Subtarget::initializeSubtargetDependencies(const Triple &TT,
StringRef GPU, StringRef FS) {		StringRef GPU, StringRef FS) {
SmallString<256> FullFS("+promote-alloca,");		SmallString<256> FullFS("+promote-alloca,");
FullFS += FS;		FullFS += FS;
ParseSubtargetFeatures(GPU, /*TuneCPU*/ GPU, FullFS);		ParseSubtargetFeatures(GPU, /*TuneCPU*/ GPU, FullFS);
[220 lines not shown]	GCNSubtarget::GCNSubtarget(const Triple &TT, StringRef GPU, StringRef FS,
CallLoweringInfo.reset(new AMDGPUCallLowering(*getTargetLowering()));		CallLoweringInfo.reset(new AMDGPUCallLowering(*getTargetLowering()));
InlineAsmLoweringInfo.reset(new InlineAsmLowering(getTargetLowering()));		InlineAsmLoweringInfo.reset(new InlineAsmLowering(getTargetLowering()));
Legalizer.reset(new AMDGPULegalizerInfo(*this, TM));		Legalizer.reset(new AMDGPULegalizerInfo(*this, TM));
RegBankInfo.reset(new AMDGPURegisterBankInfo(*this));		RegBankInfo.reset(new AMDGPURegisterBankInfo(*this));
InstSelector.reset(new AMDGPUInstructionSelector(		InstSelector.reset(new AMDGPUInstructionSelector(
this, static_cast<AMDGPURegisterBankInfo *>(RegBankInfo.get()), TM));		this, static_cast<AMDGPURegisterBankInfo *>(RegBankInfo.get()), TM));
}		}

		bool GCNSubtarget::enableFlatScratch() const {
		return EnableFlatScratch && hasFlatScratchInsts();
		}

unsigned GCNSubtarget::getConstantBusLimit(unsigned Opcode) const {		unsigned GCNSubtarget::getConstantBusLimit(unsigned Opcode) const {
if (getGeneration() < GFX10)		if (getGeneration() < GFX10)
return 1;		return 1;

switch (Opcode) {		switch (Opcode) {
case AMDGPU::V_LSHLREV_B64:		case AMDGPU::V_LSHLREV_B64:
case AMDGPU::V_LSHLREV_B64_gfx10:		case AMDGPU::V_LSHLREV_B64_gfx10:
case AMDGPU::V_LSHL_B64:		case AMDGPU::V_LSHL_B64:
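The gating added in AMDGPUSubtarget.cpp can be read as "the command-line option and the hardware feature must both be present". A minimal standalone sketch of that rule follows. It does not use LLVM; `ScratchAccess` and `selectScratchAccess` are hypothetical names introduced only for illustration, and the two `bool` parameters stand in for the `amdgpu-enable-flat-scratch` option and `hasFlatScratchInsts()`.

```cpp
// Hypothetical illustration only: not part of the patch and not LLVM API.
enum class ScratchAccess { MUBUF, FlatScratch };

// Mirrors the body of GCNSubtarget::enableFlatScratch() in the diff:
// the option (default false) AND the hardware capability.
constexpr bool enableFlatScratch(bool OptEnabled, bool HasFlatScratchInsts) {
  return OptEnabled && HasFlatScratchInsts;
}

// Which family of instructions private (stack) accesses would use.
constexpr ScratchAccess selectScratchAccess(bool OptEnabled,
                                            bool HasFlatScratchInsts) {
  return enableFlatScratch(OptEnabled, HasFlatScratchInsts)
             ? ScratchAccess::FlatScratch
             : ScratchAccess::MUBUF;
}

// With the option left at its default, MUBUF is kept even on capable hardware.
static_assert(selectScratchAccess(false, true) == ScratchAccess::MUBUF,
              "default keeps MUBUF");
// Setting the option on hardware without the instructions has no effect.
static_assert(selectScratchAccess(true, false) == ScratchAccess::MUBUF,
              "option alone is not enough");
```

Because the summary states that the swizzled SP cannot be mixed with MUBUF stack access, the sketch treats the choice as a single all-or-nothing decision rather than a per-access one.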

llvm/lib/Target/AMDGPU/BUFInstructions.td

[1,586 lines not shown]	multiclass MUBUFScratchLoadPat_D16 <MUBUF_Pseudo InstrOffen,
>;		>;

def : GCNPat <		def : GCNPat <
(ld_frag (MUBUFScratchOffset v4i32:$srsrc, i32:$soffset, u16imm:$offset), vt:$in),		(ld_frag (MUBUFScratchOffset v4i32:$srsrc, i32:$soffset, u16imm:$offset), vt:$in),
(InstrOffset $srsrc, $soffset, $offset, 0, 0, 0, 0, 0, $in)		(InstrOffset $srsrc, $soffset, $offset, 0, 0, 0, 0, 0, $in)
>;		>;
}		}

		let OtherPredicates = [DisableFlatScratch] in {
defm : MUBUFScratchLoadPat <BUFFER_LOAD_SBYTE_OFFEN, BUFFER_LOAD_SBYTE_OFFSET, i32, sextloadi8_private>;		defm : MUBUFScratchLoadPat <BUFFER_LOAD_SBYTE_OFFEN, BUFFER_LOAD_SBYTE_OFFSET, i32, sextloadi8_private>;
defm : MUBUFScratchLoadPat <BUFFER_LOAD_UBYTE_OFFEN, BUFFER_LOAD_UBYTE_OFFSET, i32, extloadi8_private>;		defm : MUBUFScratchLoadPat <BUFFER_LOAD_UBYTE_OFFEN, BUFFER_LOAD_UBYTE_OFFSET, i32, extloadi8_private>;
defm : MUBUFScratchLoadPat <BUFFER_LOAD_UBYTE_OFFEN, BUFFER_LOAD_UBYTE_OFFSET, i32, zextloadi8_private>;		defm : MUBUFScratchLoadPat <BUFFER_LOAD_UBYTE_OFFEN, BUFFER_LOAD_UBYTE_OFFSET, i32, zextloadi8_private>;
defm : MUBUFScratchLoadPat <BUFFER_LOAD_SBYTE_OFFEN, BUFFER_LOAD_SBYTE_OFFSET, i16, sextloadi8_private>;		defm : MUBUFScratchLoadPat <BUFFER_LOAD_SBYTE_OFFEN, BUFFER_LOAD_SBYTE_OFFSET, i16, sextloadi8_private>;
defm : MUBUFScratchLoadPat <BUFFER_LOAD_UBYTE_OFFEN, BUFFER_LOAD_UBYTE_OFFSET, i16, extloadi8_private>;		defm : MUBUFScratchLoadPat <BUFFER_LOAD_UBYTE_OFFEN, BUFFER_LOAD_UBYTE_OFFSET, i16, extloadi8_private>;
defm : MUBUFScratchLoadPat <BUFFER_LOAD_UBYTE_OFFEN, BUFFER_LOAD_UBYTE_OFFSET, i16, zextloadi8_private>;		defm : MUBUFScratchLoadPat <BUFFER_LOAD_UBYTE_OFFEN, BUFFER_LOAD_UBYTE_OFFSET, i16, zextloadi8_private>;
defm : MUBUFScratchLoadPat <BUFFER_LOAD_SSHORT_OFFEN, BUFFER_LOAD_SSHORT_OFFSET, i32, sextloadi16_private>;		defm : MUBUFScratchLoadPat <BUFFER_LOAD_SSHORT_OFFEN, BUFFER_LOAD_SSHORT_OFFSET, i32, sextloadi16_private>;
defm : MUBUFScratchLoadPat <BUFFER_LOAD_USHORT_OFFEN, BUFFER_LOAD_USHORT_OFFSET, i32, extloadi16_private>;		defm : MUBUFScratchLoadPat <BUFFER_LOAD_USHORT_OFFEN, BUFFER_LOAD_USHORT_OFFSET, i32, extloadi16_private>;
defm : MUBUFScratchLoadPat <BUFFER_LOAD_USHORT_OFFEN, BUFFER_LOAD_USHORT_OFFSET, i32, zextloadi16_private>;		defm : MUBUFScratchLoadPat <BUFFER_LOAD_USHORT_OFFEN, BUFFER_LOAD_USHORT_OFFSET, i32, zextloadi16_private>;
defm : MUBUFScratchLoadPat <BUFFER_LOAD_USHORT_OFFEN, BUFFER_LOAD_USHORT_OFFSET, i16, load_private>;		defm : MUBUFScratchLoadPat <BUFFER_LOAD_USHORT_OFFEN, BUFFER_LOAD_USHORT_OFFSET, i16, load_private>;

foreach vt = Reg32Types.types in {		foreach vt = Reg32Types.types in {
defm : MUBUFScratchLoadPat <BUFFER_LOAD_DWORD_OFFEN, BUFFER_LOAD_DWORD_OFFSET, vt, load_private>;		defm : MUBUFScratchLoadPat <BUFFER_LOAD_DWORD_OFFEN, BUFFER_LOAD_DWORD_OFFSET, vt, load_private>;
}		}
defm : MUBUFScratchLoadPat <BUFFER_LOAD_DWORDX2_OFFEN, BUFFER_LOAD_DWORDX2_OFFSET, v2i32, load_private>;		defm : MUBUFScratchLoadPat <BUFFER_LOAD_DWORDX2_OFFEN, BUFFER_LOAD_DWORDX2_OFFSET, v2i32, load_private>;
defm : MUBUFScratchLoadPat <BUFFER_LOAD_DWORDX3_OFFEN, BUFFER_LOAD_DWORDX3_OFFSET, v3i32, load_private>;		defm : MUBUFScratchLoadPat <BUFFER_LOAD_DWORDX3_OFFEN, BUFFER_LOAD_DWORDX3_OFFSET, v3i32, load_private>;
defm : MUBUFScratchLoadPat <BUFFER_LOAD_DWORDX4_OFFEN, BUFFER_LOAD_DWORDX4_OFFSET, v4i32, load_private>;		defm : MUBUFScratchLoadPat <BUFFER_LOAD_DWORDX4_OFFEN, BUFFER_LOAD_DWORDX4_OFFSET, v4i32, load_private>;

let OtherPredicates = [D16PreservesUnusedBits] in {		let OtherPredicates = [D16PreservesUnusedBits, DisableFlatScratch] in {
defm : MUBUFScratchLoadPat_D16<BUFFER_LOAD_SHORT_D16_HI_OFFEN, BUFFER_LOAD_SHORT_D16_HI_OFFSET, v2i16, load_d16_hi_private>;		defm : MUBUFScratchLoadPat_D16<BUFFER_LOAD_SHORT_D16_HI_OFFEN, BUFFER_LOAD_SHORT_D16_HI_OFFSET, v2i16, load_d16_hi_private>;
defm : MUBUFScratchLoadPat_D16<BUFFER_LOAD_UBYTE_D16_HI_OFFEN, BUFFER_LOAD_UBYTE_D16_HI_OFFSET, v2i16, az_extloadi8_d16_hi_private>;		defm : MUBUFScratchLoadPat_D16<BUFFER_LOAD_UBYTE_D16_HI_OFFEN, BUFFER_LOAD_UBYTE_D16_HI_OFFSET, v2i16, az_extloadi8_d16_hi_private>;
defm : MUBUFScratchLoadPat_D16<BUFFER_LOAD_SBYTE_D16_HI_OFFEN, BUFFER_LOAD_SBYTE_D16_HI_OFFSET, v2i16, sextloadi8_d16_hi_private>;		defm : MUBUFScratchLoadPat_D16<BUFFER_LOAD_SBYTE_D16_HI_OFFEN, BUFFER_LOAD_SBYTE_D16_HI_OFFSET, v2i16, sextloadi8_d16_hi_private>;
defm : MUBUFScratchLoadPat_D16<BUFFER_LOAD_SHORT_D16_HI_OFFEN, BUFFER_LOAD_SHORT_D16_HI_OFFSET, v2f16, load_d16_hi_private>;		defm : MUBUFScratchLoadPat_D16<BUFFER_LOAD_SHORT_D16_HI_OFFEN, BUFFER_LOAD_SHORT_D16_HI_OFFSET, v2f16, load_d16_hi_private>;
defm : MUBUFScratchLoadPat_D16<BUFFER_LOAD_UBYTE_D16_HI_OFFEN, BUFFER_LOAD_UBYTE_D16_HI_OFFSET, v2f16, az_extloadi8_d16_hi_private>;		defm : MUBUFScratchLoadPat_D16<BUFFER_LOAD_UBYTE_D16_HI_OFFEN, BUFFER_LOAD_UBYTE_D16_HI_OFFSET, v2f16, az_extloadi8_d16_hi_private>;
defm : MUBUFScratchLoadPat_D16<BUFFER_LOAD_SBYTE_D16_HI_OFFEN, BUFFER_LOAD_SBYTE_D16_HI_OFFSET, v2f16, sextloadi8_d16_hi_private>;		defm : MUBUFScratchLoadPat_D16<BUFFER_LOAD_SBYTE_D16_HI_OFFEN, BUFFER_LOAD_SBYTE_D16_HI_OFFSET, v2f16, sextloadi8_d16_hi_private>;

defm : MUBUFScratchLoadPat_D16<BUFFER_LOAD_SHORT_D16_OFFEN, BUFFER_LOAD_SHORT_D16_OFFSET, v2i16, load_d16_lo_private>;		defm : MUBUFScratchLoadPat_D16<BUFFER_LOAD_SHORT_D16_OFFEN, BUFFER_LOAD_SHORT_D16_OFFSET, v2i16, load_d16_lo_private>;
defm : MUBUFScratchLoadPat_D16<BUFFER_LOAD_UBYTE_D16_OFFEN, BUFFER_LOAD_UBYTE_D16_OFFSET, v2i16, az_extloadi8_d16_lo_private>;		defm : MUBUFScratchLoadPat_D16<BUFFER_LOAD_UBYTE_D16_OFFEN, BUFFER_LOAD_UBYTE_D16_OFFSET, v2i16, az_extloadi8_d16_lo_private>;
defm : MUBUFScratchLoadPat_D16<BUFFER_LOAD_SBYTE_D16_OFFEN, BUFFER_LOAD_SBYTE_D16_OFFSET, v2i16, sextloadi8_d16_lo_private>;		defm : MUBUFScratchLoadPat_D16<BUFFER_LOAD_SBYTE_D16_OFFEN, BUFFER_LOAD_SBYTE_D16_OFFSET, v2i16, sextloadi8_d16_lo_private>;
defm : MUBUFScratchLoadPat_D16<BUFFER_LOAD_SHORT_D16_OFFEN, BUFFER_LOAD_SHORT_D16_OFFSET, v2f16, load_d16_lo_private>;		defm : MUBUFScratchLoadPat_D16<BUFFER_LOAD_SHORT_D16_OFFEN, BUFFER_LOAD_SHORT_D16_OFFSET, v2f16, load_d16_lo_private>;
defm : MUBUFScratchLoadPat_D16<BUFFER_LOAD_UBYTE_D16_OFFEN, BUFFER_LOAD_UBYTE_D16_OFFSET, v2f16, az_extloadi8_d16_lo_private>;		defm : MUBUFScratchLoadPat_D16<BUFFER_LOAD_UBYTE_D16_OFFEN, BUFFER_LOAD_UBYTE_D16_OFFSET, v2f16, az_extloadi8_d16_lo_private>;
defm : MUBUFScratchLoadPat_D16<BUFFER_LOAD_SBYTE_D16_OFFEN, BUFFER_LOAD_SBYTE_D16_OFFSET, v2f16, sextloadi8_d16_lo_private>;		defm : MUBUFScratchLoadPat_D16<BUFFER_LOAD_SBYTE_D16_OFFEN, BUFFER_LOAD_SBYTE_D16_OFFSET, v2f16, sextloadi8_d16_lo_private>;
}		}

		} // End OtherPredicates = [DisableFlatScratch]

multiclass MUBUFStore_Atomic_Pattern <MUBUF_Pseudo Instr_ADDR64, MUBUF_Pseudo Instr_OFFSET,		multiclass MUBUFStore_Atomic_Pattern <MUBUF_Pseudo Instr_ADDR64, MUBUF_Pseudo Instr_OFFSET,
ValueType vt, PatFrag atomic_st> {		ValueType vt, PatFrag atomic_st> {
// Store follows atomic op convention so address is first		// Store follows atomic op convention so address is first
def : GCNPat <		def : GCNPat <
(atomic_st (MUBUFAddr64 v4i32:$srsrc, i64:$vaddr, i32:$soffset,		(atomic_st (MUBUFAddr64 v4i32:$srsrc, i64:$vaddr, i32:$soffset,
i16:$offset, i1:$slc), vt:$val),		i16:$offset, i1:$slc), vt:$val),
(Instr_ADDR64 $val, $vaddr, $srsrc, $soffset, $offset, 0, $slc, 0, 0, 0)		(Instr_ADDR64 $val, $vaddr, $srsrc, $soffset, $offset, 0, $slc, 0, 0, 0)
>;		>;
[34 lines not shown]	multiclass MUBUFScratchStorePat <MUBUF_Pseudo InstrOffen,

def : GCNPat <		def : GCNPat <
(st vt:$value, (MUBUFScratchOffset v4i32:$srsrc, i32:$soffset,		(st vt:$value, (MUBUFScratchOffset v4i32:$srsrc, i32:$soffset,
u16imm:$offset)),		u16imm:$offset)),
(InstrOffset rc:$value, $srsrc, $soffset, $offset, 0, 0, 0, 0, 0)		(InstrOffset rc:$value, $srsrc, $soffset, $offset, 0, 0, 0, 0, 0)
>;		>;
}		}

		let OtherPredicates = [DisableFlatScratch] in {
defm : MUBUFScratchStorePat <BUFFER_STORE_BYTE_OFFEN, BUFFER_STORE_BYTE_OFFSET, i32, truncstorei8_private>;		defm : MUBUFScratchStorePat <BUFFER_STORE_BYTE_OFFEN, BUFFER_STORE_BYTE_OFFSET, i32, truncstorei8_private>;
defm : MUBUFScratchStorePat <BUFFER_STORE_SHORT_OFFEN, BUFFER_STORE_SHORT_OFFSET, i32, truncstorei16_private>;		defm : MUBUFScratchStorePat <BUFFER_STORE_SHORT_OFFEN, BUFFER_STORE_SHORT_OFFSET, i32, truncstorei16_private>;
defm : MUBUFScratchStorePat <BUFFER_STORE_BYTE_OFFEN, BUFFER_STORE_BYTE_OFFSET, i16, truncstorei8_private>;		defm : MUBUFScratchStorePat <BUFFER_STORE_BYTE_OFFEN, BUFFER_STORE_BYTE_OFFSET, i16, truncstorei8_private>;
defm : MUBUFScratchStorePat <BUFFER_STORE_SHORT_OFFEN, BUFFER_STORE_SHORT_OFFSET, i16, store_private>;		defm : MUBUFScratchStorePat <BUFFER_STORE_SHORT_OFFEN, BUFFER_STORE_SHORT_OFFSET, i16, store_private>;

foreach vt = Reg32Types.types in {		foreach vt = Reg32Types.types in {
defm : MUBUFScratchStorePat <BUFFER_STORE_DWORD_OFFEN, BUFFER_STORE_DWORD_OFFSET, vt, store_private>;		defm : MUBUFScratchStorePat <BUFFER_STORE_DWORD_OFFEN, BUFFER_STORE_DWORD_OFFSET, vt, store_private>;
}		}

defm : MUBUFScratchStorePat <BUFFER_STORE_DWORDX2_OFFEN, BUFFER_STORE_DWORDX2_OFFSET, v2i32, store_private, VReg_64>;		defm : MUBUFScratchStorePat <BUFFER_STORE_DWORDX2_OFFEN, BUFFER_STORE_DWORDX2_OFFSET, v2i32, store_private, VReg_64>;
defm : MUBUFScratchStorePat <BUFFER_STORE_DWORDX3_OFFEN, BUFFER_STORE_DWORDX3_OFFSET, v3i32, store_private, VReg_96>;		defm : MUBUFScratchStorePat <BUFFER_STORE_DWORDX3_OFFEN, BUFFER_STORE_DWORDX3_OFFSET, v3i32, store_private, VReg_96>;
defm : MUBUFScratchStorePat <BUFFER_STORE_DWORDX4_OFFEN, BUFFER_STORE_DWORDX4_OFFSET, v4i32, store_private, VReg_128>;		defm : MUBUFScratchStorePat <BUFFER_STORE_DWORDX4_OFFEN, BUFFER_STORE_DWORDX4_OFFSET, v4i32, store_private, VReg_128>;


let OtherPredicates = [D16PreservesUnusedBits] in {		let OtherPredicates = [D16PreservesUnusedBits, DisableFlatScratch] in {
// Hiding the extract high pattern in the PatFrag seems to not		// Hiding the extract high pattern in the PatFrag seems to not
// automatically increase the complexity.		// automatically increase the complexity.
let AddedComplexity = 1 in {		let AddedComplexity = 1 in {
defm : MUBUFScratchStorePat <BUFFER_STORE_SHORT_D16_HI_OFFEN, BUFFER_STORE_SHORT_D16_HI_OFFSET, i32, store_hi16_private>;		defm : MUBUFScratchStorePat <BUFFER_STORE_SHORT_D16_HI_OFFEN, BUFFER_STORE_SHORT_D16_HI_OFFSET, i32, store_hi16_private>;
defm : MUBUFScratchStorePat <BUFFER_STORE_BYTE_D16_HI_OFFEN, BUFFER_STORE_BYTE_D16_HI_OFFSET, i32, truncstorei8_hi16_private>;		defm : MUBUFScratchStorePat <BUFFER_STORE_BYTE_D16_HI_OFFEN, BUFFER_STORE_BYTE_D16_HI_OFFSET, i32, truncstorei8_hi16_private>;
}		}
}		}
		} // End OtherPredicates = [DisableFlatScratch]

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// MTBUF Patterns		// MTBUF Patterns
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// tbuffer_load/store_format patterns		// tbuffer_load/store_format patterns
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

llvm/lib/Target/AMDGPU/FLATInstructions.td

//===-- FLATInstructions.td - FLAT Instruction Definitions ----------------===//		//===-- FLATInstructions.td - FLAT Instruction Definitions ----------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

def FLATOffset : ComplexPattern<i64, 2, "SelectFlatOffset<false>", [], [SDNPWantRoot], -10>;		def FLATOffset : ComplexPattern<i64, 2, "SelectFlatOffset<false>", [], [SDNPWantRoot], -10>;
def FLATOffsetSigned : ComplexPattern<i64, 2, "SelectFlatOffset<true>", [], [SDNPWantRoot], -10>;		def FLATOffsetSigned : ComplexPattern<i64, 2, "SelectFlatOffset<true>", [], [SDNPWantRoot], -10>;
		def ScratchOffset : ComplexPattern<i32, 2, "SelectFlatOffset<true>", [], [SDNPWantRoot], -10>;

def GlobalSAddr : ComplexPattern<i64, 3, "SelectGlobalSAddr", [], [SDNPWantRoot], -10>;		def GlobalSAddr : ComplexPattern<i64, 3, "SelectGlobalSAddr", [], [SDNPWantRoot], -10>;
		def ScratchSAddr : ComplexPattern<i32, 2, "SelectScratchSAddr", [], [SDNPWantRoot], -10>;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// FLAT classes		// FLAT classes
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

class FLAT_Pseudo<string opName, dag outs, dag ins,		class FLAT_Pseudo<string opName, dag outs, dag ins,
string asmOps, list<dag> pattern=[]> :		string asmOps, list<dag> pattern=[]> :
InstSI<outs, ins, "", pattern>,		InstSI<outs, ins, "", pattern>,
[207 lines not shown]	class FLAT_Global_Store_AddTid_Pseudo <string opName, RegisterClass vdataClass,
let mayStore = 1;		let mayStore = 1;
let has_vdst = 0;		let has_vdst = 0;
let has_vaddr = 0;		let has_vaddr = 0;
let has_saddr = 1;		let has_saddr = 1;
let enabled_saddr = 1;		let enabled_saddr = 1;
let maybeAtomic = 1;		let maybeAtomic = 1;
}		}

		class FlatScratchInst <string sv_op, string mode> {
		string SVOp = sv_op;
		string Mode = mode;
		}

class FLAT_Scratch_Load_Pseudo <string opName, RegisterClass regClass,		class FLAT_Scratch_Load_Pseudo <string opName, RegisterClass regClass,
bit HasTiedOutput = 0,		bit HasTiedOutput = 0,
bit EnableSaddr = 0,		bit EnableSaddr = 0,
bit EnableVaddr = !not(EnableSaddr)>		bit EnableVaddr = !not(EnableSaddr)>
: FLAT_Pseudo<		: FLAT_Pseudo<
opName,		opName,
(outs regClass:$vdst),		(outs regClass:$vdst),
!con(		!con(
[34 lines not shown]	class FLAT_Scratch_Store_Pseudo <string opName, RegisterClass vdataClass, bit EnableSaddr = 0,
let enabled_saddr = EnableSaddr;		let enabled_saddr = EnableSaddr;
let has_vaddr = EnableVaddr;		let has_vaddr = EnableVaddr;
let PseudoInstr = opName#!if(EnableSaddr, "_SADDR", !if(EnableVaddr, "", "_ST"));		let PseudoInstr = opName#!if(EnableSaddr, "_SADDR", !if(EnableVaddr, "", "_ST"));
let maybeAtomic = 1;		let maybeAtomic = 1;
}		}

multiclass FLAT_Scratch_Load_Pseudo<string opName, RegisterClass regClass, bit HasTiedOutput = 0> {		multiclass FLAT_Scratch_Load_Pseudo<string opName, RegisterClass regClass, bit HasTiedOutput = 0> {
let is_flat_scratch = 1 in {		let is_flat_scratch = 1 in {
def "" : FLAT_Scratch_Load_Pseudo<opName, regClass, HasTiedOutput>;		def "" : FLAT_Scratch_Load_Pseudo<opName, regClass, HasTiedOutput>,
def _SADDR : FLAT_Scratch_Load_Pseudo<opName, regClass, HasTiedOutput, 1>;		FlatScratchInst<opName, "SV">;
		def _SADDR : FLAT_Scratch_Load_Pseudo<opName, regClass, HasTiedOutput, 1>,
		FlatScratchInst<opName, "SS">;

let SubtargetPredicate = HasFlatScratchSTMode in		let SubtargetPredicate = HasFlatScratchSTMode in
def _ST : FLAT_Scratch_Load_Pseudo<opName, regClass, HasTiedOutput, 0, 0>;		def _ST : FLAT_Scratch_Load_Pseudo<opName, regClass, HasTiedOutput, 0, 0>,
		FlatScratchInst<opName, "ST">;
}		}
}		}

multiclass FLAT_Scratch_Store_Pseudo<string opName, RegisterClass regClass> {		multiclass FLAT_Scratch_Store_Pseudo<string opName, RegisterClass regClass> {
let is_flat_scratch = 1 in {		let is_flat_scratch = 1 in {
def "" : FLAT_Scratch_Store_Pseudo<opName, regClass>;		def "" : FLAT_Scratch_Store_Pseudo<opName, regClass>,
def _SADDR : FLAT_Scratch_Store_Pseudo<opName, regClass, 1>;		FlatScratchInst<opName, "SV">;
		def _SADDR : FLAT_Scratch_Store_Pseudo<opName, regClass, 1>,
		FlatScratchInst<opName, "SS">;

let SubtargetPredicate = HasFlatScratchSTMode in		let SubtargetPredicate = HasFlatScratchSTMode in
def _ST : FLAT_Scratch_Store_Pseudo<opName, regClass, 0, 0>;		def _ST : FLAT_Scratch_Store_Pseudo<opName, regClass, 0, 0>,
		FlatScratchInst<opName, "ST">;
}		}
}		}
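For readers tracing how the three scratch variants are named and tagged, here is a small standalone C++ sketch of the `!if` chain used for `PseudoInstr` and of the `Mode` string each `def` passes to `FlatScratchInst`. The function names `variantSuffix` and `modeTag` are hypothetical. This excerpt does not show what the `SVOp`/`Mode` fields are consumed by, so the sketch only reproduces the mapping visible in the multiclasses.

```cpp
#include <string>

// Suffix appended to the base opcode name, following
// opName#!if(EnableSaddr, "_SADDR", !if(EnableVaddr, "", "_ST")).
std::string variantSuffix(bool EnableSaddr, bool EnableVaddr) {
  if (EnableSaddr)
    return "_SADDR";
  return EnableVaddr ? "" : "_ST";
}

// Mode string recorded via FlatScratchInst<opName, mode> for the same variant,
// as written in the multiclasses: "" -> "SV", _SADDR -> "SS", _ST -> "ST".
std::string modeTag(bool EnableSaddr, bool EnableVaddr) {
  if (EnableSaddr)
    return "SS";
  return EnableVaddr ? "SV" : "ST";
}

// Convenience helper (hypothetical): full pseudo name for a base opcode.
std::string pseudoName(const std::string &OpName, bool EnableSaddr,
                       bool EnableVaddr) {
  return OpName + variantSuffix(EnableSaddr, EnableVaddr);
}
```

For example, `pseudoName("SCRATCH_LOAD_DWORD", true, false)` yields `SCRATCH_LOAD_DWORD_SADDR`, which is the name the `_SADDR` patterns reach through `!cast<FLAT_Pseudo>(!cast<string>(inst)#"_SADDR")`.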

class FLAT_AtomicNoRet_Pseudo<string opName, dag outs, dag ins,		class FLAT_AtomicNoRet_Pseudo<string opName, dag outs, dag ins,
string asm, list<dag> pattern = []> :		string asm, list<dag> pattern = []> :
FLAT_Pseudo<opName, outs, ins, asm, pattern> {		FLAT_Pseudo<opName, outs, ins, asm, pattern> {
let mayLoad = 1;		let mayLoad = 1;
let mayStore = 1;		let mayStore = 1;
[538 lines not shown]
>;		>;

class FlatSignedAtomicPat <FLAT_Pseudo inst, SDPatternOperator node, ValueType vt,		class FlatSignedAtomicPat <FLAT_Pseudo inst, SDPatternOperator node, ValueType vt,
ValueType data_vt = vt> : GCNPat <		ValueType data_vt = vt> : GCNPat <
(vt (node (FLATOffsetSigned i64:$vaddr, i16:$offset), data_vt:$data)),		(vt (node (FLATOffsetSigned i64:$vaddr, i16:$offset), data_vt:$data)),
(inst $vaddr, $data, $offset)		(inst $vaddr, $data, $offset)
>;		>;

		class ScratchLoadSignedPat <FLAT_Pseudo inst, SDPatternOperator node, ValueType vt> : GCNPat <
		(vt (node (ScratchOffset (i32 VGPR_32:$vaddr), i16:$offset))),
		(inst $vaddr, $offset)
		>;

		class ScratchLoadSignedPat_D16 <FLAT_Pseudo inst, SDPatternOperator node, ValueType vt> : GCNPat <
		[Inline comment, marked Done] Flakebi: Should this be called `ScratchLoadSignedPat_D16`?
		(node (ScratchOffset (i32 VGPR_32:$vaddr), i16:$offset), vt:$in),
		(inst $vaddr, $offset, 0, 0, 0, $in)
		>;

		class ScratchStoreSignedPat <FLAT_Pseudo inst, SDPatternOperator node, ValueType vt> : GCNPat <
		(node vt:$data, (ScratchOffset (i32 VGPR_32:$vaddr), i16:$offset)),
		(inst getVregSrcForVT<vt>.ret:$data, $vaddr, $offset)
		>;

		class ScratchLoadSaddrPat <FLAT_Pseudo inst, SDPatternOperator node, ValueType vt> : GCNPat <
		(vt (node (ScratchSAddr (i32 SGPR_32:$saddr), i16:$offset))),
		(inst $saddr, $offset)
		>;

		class ScratchLoadSaddrPat_D16 <FLAT_Pseudo inst, SDPatternOperator node, ValueType vt> : GCNPat <
		(vt (node (ScratchSAddr (i32 SGPR_32:$saddr), i16:$offset), vt:$in)),
		(inst $saddr, $offset, 0, 0, 0, $in)
		>;

		class ScratchStoreSaddrPat <FLAT_Pseudo inst, SDPatternOperator node,
		ValueType vt> : GCNPat <
		(node vt:$data, (ScratchSAddr (i32 SGPR_32:$saddr), i16:$offset)),
		(inst getVregSrcForVT<vt>.ret:$data, $saddr, $offset)
		>;

let OtherPredicates = [HasFlatAddressSpace] in {		let OtherPredicates = [HasFlatAddressSpace] in {

def : FlatLoadPat <FLAT_LOAD_UBYTE, extloadi8_flat, i32>;		def : FlatLoadPat <FLAT_LOAD_UBYTE, extloadi8_flat, i32>;
def : FlatLoadPat <FLAT_LOAD_UBYTE, zextloadi8_flat, i32>;		def : FlatLoadPat <FLAT_LOAD_UBYTE, zextloadi8_flat, i32>;
def : FlatLoadPat <FLAT_LOAD_SBYTE, sextloadi8_flat, i32>;		def : FlatLoadPat <FLAT_LOAD_SBYTE, sextloadi8_flat, i32>;
def : FlatLoadPat <FLAT_LOAD_UBYTE, extloadi8_flat, i16>;		def : FlatLoadPat <FLAT_LOAD_UBYTE, extloadi8_flat, i16>;
def : FlatLoadPat <FLAT_LOAD_UBYTE, zextloadi8_flat, i16>;		def : FlatLoadPat <FLAT_LOAD_UBYTE, zextloadi8_flat, i16>;
def : FlatLoadPat <FLAT_LOAD_SBYTE, sextloadi8_flat, i16>;		def : FlatLoadPat <FLAT_LOAD_SBYTE, sextloadi8_flat, i16>;
[141 lines not shown]	def : FlatSignedAtomicPatNoRtn <inst, node, vt> {
let AddedComplexity = 10;		let AddedComplexity = 10;
}		}

def : GlobalAtomicNoRtnSaddrPat<!cast<FLAT_Pseudo>(!cast<string>(inst)#"_SADDR"), node, vt> {		def : GlobalAtomicNoRtnSaddrPat<!cast<FLAT_Pseudo>(!cast<string>(inst)#"_SADDR"), node, vt> {
let AddedComplexity = 11;		let AddedComplexity = 11;
}		}
}		}

		multiclass ScratchFLATLoadPats<FLAT_Pseudo inst, SDPatternOperator node, ValueType vt> {
		def : ScratchLoadSignedPat <inst, node, vt> {
		let AddedComplexity = 25;
		}

		def : ScratchLoadSaddrPat<!cast<FLAT_Pseudo>(!cast<string>(inst)#"_SADDR"), node, vt> {
		let AddedComplexity = 26;
		}
		}

		multiclass ScratchFLATStorePats<FLAT_Pseudo inst, SDPatternOperator node,
		ValueType vt> {
		def : ScratchStoreSignedPat <inst, node, vt> {
		let AddedComplexity = 25;
		}

		def : ScratchStoreSaddrPat<!cast<FLAT_Pseudo>(!cast<string>(inst)#"_SADDR"), node, vt> {
		let AddedComplexity = 26;
		}
		}

		multiclass ScratchFLATLoadPats_D16<FLAT_Pseudo inst, SDPatternOperator node, ValueType vt> {
		def : ScratchLoadSignedPat_D16 <inst, node, vt> {
		let AddedComplexity = 25;
		}

		def : ScratchLoadSaddrPat_D16<!cast<FLAT_Pseudo>(!cast<string>(inst)#"_SADDR"), node, vt> {
		let AddedComplexity = 26;
		}
		}

let OtherPredicates = [HasFlatGlobalInsts] in {		let OtherPredicates = [HasFlatGlobalInsts] in {

defm : GlobalFLATLoadPats <GLOBAL_LOAD_UBYTE, extloadi8_global, i32>;		defm : GlobalFLATLoadPats <GLOBAL_LOAD_UBYTE, extloadi8_global, i32>;
defm : GlobalFLATLoadPats <GLOBAL_LOAD_UBYTE, zextloadi8_global, i32>;		defm : GlobalFLATLoadPats <GLOBAL_LOAD_UBYTE, zextloadi8_global, i32>;
defm : GlobalFLATLoadPats <GLOBAL_LOAD_SBYTE, sextloadi8_global, i32>;		defm : GlobalFLATLoadPats <GLOBAL_LOAD_SBYTE, sextloadi8_global, i32>;
defm : GlobalFLATLoadPats <GLOBAL_LOAD_UBYTE, extloadi8_global, i16>;		defm : GlobalFLATLoadPats <GLOBAL_LOAD_UBYTE, extloadi8_global, i16>;
defm : GlobalFLATLoadPats <GLOBAL_LOAD_UBYTE, zextloadi8_global, i16>;		defm : GlobalFLATLoadPats <GLOBAL_LOAD_UBYTE, zextloadi8_global, i16>;
defm : GlobalFLATLoadPats <GLOBAL_LOAD_SBYTE, sextloadi8_global, i16>;		defm : GlobalFLATLoadPats <GLOBAL_LOAD_SBYTE, sextloadi8_global, i16>;
[84 lines not shown]

let OtherPredicates = [HasAtomicFaddInsts] in {		let OtherPredicates = [HasAtomicFaddInsts] in {
defm : GlobalFLATNoRtnAtomicPats <GLOBAL_ATOMIC_ADD_F32, atomic_load_fadd_global_noret_32, f32>;		defm : GlobalFLATNoRtnAtomicPats <GLOBAL_ATOMIC_ADD_F32, atomic_load_fadd_global_noret_32, f32>;
defm : GlobalFLATNoRtnAtomicPats <GLOBAL_ATOMIC_PK_ADD_F16, atomic_load_fadd_v2f16_global_noret_32, v2f16>;		defm : GlobalFLATNoRtnAtomicPats <GLOBAL_ATOMIC_PK_ADD_F16, atomic_load_fadd_v2f16_global_noret_32, v2f16>;
}		}

} // End OtherPredicates = [HasFlatGlobalInsts], AddedComplexity = 10		} // End OtherPredicates = [HasFlatGlobalInsts], AddedComplexity = 10

		let OtherPredicates = [HasFlatScratchInsts, EnableFlatScratch] in {

		defm : ScratchFLATLoadPats <SCRATCH_LOAD_UBYTE, extloadi8_private, i32>;
		defm : ScratchFLATLoadPats <SCRATCH_LOAD_UBYTE, zextloadi8_private, i32>;
		defm : ScratchFLATLoadPats <SCRATCH_LOAD_SBYTE, sextloadi8_private, i32>;
		defm : ScratchFLATLoadPats <SCRATCH_LOAD_UBYTE, extloadi8_private, i16>;
		defm : ScratchFLATLoadPats <SCRATCH_LOAD_UBYTE, zextloadi8_private, i16>;
		defm : ScratchFLATLoadPats <SCRATCH_LOAD_SBYTE, sextloadi8_private, i16>;
		defm : ScratchFLATLoadPats <SCRATCH_LOAD_USHORT, extloadi16_private, i32>;
		defm : ScratchFLATLoadPats <SCRATCH_LOAD_USHORT, zextloadi16_private, i32>;
		defm : ScratchFLATLoadPats <SCRATCH_LOAD_SSHORT, sextloadi16_private, i32>;
		defm : ScratchFLATLoadPats <SCRATCH_LOAD_USHORT, load_private, i16>;

		foreach vt = Reg32Types.types in {
		defm : ScratchFLATLoadPats <SCRATCH_LOAD_DWORD, load_private, vt>;
		defm : ScratchFLATStorePats <SCRATCH_STORE_DWORD, store_private, vt>;
		}

		foreach vt = VReg_64.RegTypes in {
		defm : ScratchFLATLoadPats <SCRATCH_LOAD_DWORDX2, load_private, vt>;
		defm : ScratchFLATStorePats <SCRATCH_STORE_DWORDX2, store_private, vt>;
		}

		defm : ScratchFLATLoadPats <SCRATCH_LOAD_DWORDX3, load_private, v3i32>;

		foreach vt = VReg_128.RegTypes in {
		defm : ScratchFLATLoadPats <SCRATCH_LOAD_DWORDX4, load_private, vt>;
		defm : ScratchFLATStorePats <SCRATCH_STORE_DWORDX4, store_private, vt>;
		}

		defm : ScratchFLATStorePats <SCRATCH_STORE_BYTE, truncstorei8_private, i32>;
		defm : ScratchFLATStorePats <SCRATCH_STORE_BYTE, truncstorei8_private, i16>;
		defm : ScratchFLATStorePats <SCRATCH_STORE_SHORT, truncstorei16_private, i32>;
		defm : ScratchFLATStorePats <SCRATCH_STORE_SHORT, store_private, i16>;
		defm : ScratchFLATStorePats <SCRATCH_STORE_DWORDX3, store_private, v3i32>;

		let OtherPredicates = [D16PreservesUnusedBits, HasFlatScratchInsts, EnableFlatScratch] in {
		defm : ScratchFLATStorePats <SCRATCH_STORE_SHORT_D16_HI, truncstorei16_hi16_private, i32>;
		defm : ScratchFLATStorePats <SCRATCH_STORE_BYTE_D16_HI, truncstorei8_hi16_private, i32>;

		defm : ScratchFLATLoadPats_D16 <SCRATCH_LOAD_UBYTE_D16_HI, az_extloadi8_d16_hi_private, v2i16>;
		defm : ScratchFLATLoadPats_D16 <SCRATCH_LOAD_UBYTE_D16_HI, az_extloadi8_d16_hi_private, v2f16>;
		defm : ScratchFLATLoadPats_D16 <SCRATCH_LOAD_SBYTE_D16_HI, sextloadi8_d16_hi_private, v2i16>;
		defm : ScratchFLATLoadPats_D16 <SCRATCH_LOAD_SBYTE_D16_HI, sextloadi8_d16_hi_private, v2f16>;
		defm : ScratchFLATLoadPats_D16 <SCRATCH_LOAD_SHORT_D16_HI, load_d16_hi_private, v2i16>;
		defm : ScratchFLATLoadPats_D16 <SCRATCH_LOAD_SHORT_D16_HI, load_d16_hi_private, v2f16>;

		defm : ScratchFLATLoadPats_D16 <SCRATCH_LOAD_UBYTE_D16, az_extloadi8_d16_lo_private, v2i16>;
		defm : ScratchFLATLoadPats_D16 <SCRATCH_LOAD_UBYTE_D16, az_extloadi8_d16_lo_private, v2f16>;
		defm : ScratchFLATLoadPats_D16 <SCRATCH_LOAD_SBYTE_D16, sextloadi8_d16_lo_private, v2i16>;
		defm : ScratchFLATLoadPats_D16 <SCRATCH_LOAD_SBYTE_D16, sextloadi8_d16_lo_private, v2f16>;
		defm : ScratchFLATLoadPats_D16 <SCRATCH_LOAD_SHORT_D16, load_d16_lo_private, v2i16>;
		defm : ScratchFLATLoadPats_D16 <SCRATCH_LOAD_SHORT_D16, load_d16_lo_private, v2f16>;
		}

		} // End OtherPredicates = [HasFlatScratchInsts,EnableFlatScratch]

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Target		// Target
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// CI		// CI
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
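The `ScratchFLAT*Pats` multiclasses in FLATInstructions.td give the `_SADDR` pattern `AddedComplexity = 26` and the VGPR-address pattern `AddedComplexity = 25`, which reads as "prefer the scalar-address form whenever its complex pattern matches". The standalone C++ sketch below models only that preference. `Candidate`, `pickPattern` and `pickScratchLoad` are hypothetical names, and the real instruction selector orders patterns using more than this one number, so treat it as an illustration of the intent rather than of the selector's algorithm.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical model of one selectable pattern.
struct Candidate {
  std::string Name;      // pseudo instruction the pattern would emit
  int AddedComplexity;   // value from the 'let AddedComplexity = N' in the diff
  bool Matches;          // whether its ComplexPattern matched the address
};

// Try candidates from highest to lowest complexity; return the first match,
// or an empty string if nothing matches.
std::string pickPattern(std::vector<Candidate> Cands) {
  std::stable_sort(Cands.begin(), Cands.end(),
                   [](const Candidate &A, const Candidate &B) {
                     return A.AddedComplexity > B.AddedComplexity;
                   });
  for (const Candidate &C : Cands)
    if (C.Matches)
      return C.Name;
  return "";
}

// The two patterns ScratchFLATLoadPats creates for SCRATCH_LOAD_DWORD.
std::string pickScratchLoad(bool SAddrMatches, bool VAddrMatches) {
  return pickPattern({{"SCRATCH_LOAD_DWORD_SADDR", 26, SAddrMatches},
                      {"SCRATCH_LOAD_DWORD", 25, VAddrMatches}});
}
```

When both address forms match, the sketch picks the `_SADDR` variant. When only the VGPR form matches, it falls back to the plain variant.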

llvm/lib/Target/AMDGPU/SIFoldOperands.cpp

	[167 lines not shown]

	// TODO: Add heuristic that the frame index might not fit in the addressing mode			// TODO: Add heuristic that the frame index might not fit in the addressing mode
	// immediate offset to avoid materializing in loops.			// immediate offset to avoid materializing in loops.
	static bool frameIndexMayFold(const SIInstrInfo *TII,			static bool frameIndexMayFold(const SIInstrInfo *TII,
	const MachineInstr &UseMI,			const MachineInstr &UseMI,
	int OpNo,			int OpNo,
	const MachineOperand &OpToFold) {			const MachineOperand &OpToFold) {
	return OpToFold.isFI() &&			return OpToFold.isFI() &&
	(TII->isMUBUF(UseMI) || TII->isFLATScratch(UseMI)) &&			TII->isMUBUF(UseMI) &&
	OpNo == AMDGPU::getNamedOperandIdx(UseMI.getOpcode(), AMDGPU::OpName::vaddr);			OpNo == AMDGPU::getNamedOperandIdx(UseMI.getOpcode(), AMDGPU::OpName::vaddr);
	}			}

	FunctionPass *llvm::createSIFoldOperandsPass() {			FunctionPass *llvm::createSIFoldOperandsPass() {
	return new SIFoldOperands();			return new SIFoldOperands();
	}			}

	static bool updateOperand(FoldCandidate &Fold,			static bool updateOperand(FoldCandidate &Fold,

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp

[128 lines not shown]	if (!TempSGPR) {
LLVM_DEBUG(dbgs() << "Saving " << (IsFP ? "FP" : "BP") << " with copy to "		LLVM_DEBUG(dbgs() << "Saving " << (IsFP ? "FP" : "BP") << " with copy to "
<< printReg(TempSGPR, TRI) << '\n');		<< printReg(TempSGPR, TRI) << '\n');
}		}
}		}

// We need to specially emit stack operations here because a different frame		// We need to specially emit stack operations here because a different frame
// register is used than in the rest of the function, as getFrameRegister would		// register is used than in the rest of the function, as getFrameRegister would
// use.		// use.
static void buildPrologSpill(LivePhysRegs &LiveRegs, MachineBasicBlock &MBB,		static void buildPrologSpill(const GCNSubtarget &ST, LivePhysRegs &LiveRegs,
		MachineBasicBlock &MBB,
MachineBasicBlock::iterator I,		MachineBasicBlock::iterator I,
const SIInstrInfo *TII, Register SpillReg,		const SIInstrInfo *TII, Register SpillReg,
Register ScratchRsrcReg, Register SPReg, int FI) {		Register ScratchRsrcReg, Register SPReg, int FI) {
MachineFunction *MF = MBB.getParent();		MachineFunction *MF = MBB.getParent();
MachineFrameInfo &MFI = MF->getFrameInfo();		MachineFrameInfo &MFI = MF->getFrameInfo();

int64_t Offset = MFI.getObjectOffset(FI);		int64_t Offset = MFI.getObjectOffset(FI);

MachineMemOperand *MMO = MF->getMachineMemOperand(		MachineMemOperand *MMO = MF->getMachineMemOperand(
MachinePointerInfo::getFixedStack(*MF, FI), MachineMemOperand::MOStore, 4,		MachinePointerInfo::getFixedStack(*MF, FI), MachineMemOperand::MOStore, 4,
MFI.getObjectAlign(FI));		MFI.getObjectAlign(FI));

if (SIInstrInfo::isLegalMUBUFImmOffset(Offset)) {		if (ST.enableFlatScratch()) {
		if (TII->isLegalFLATOffset(Offset, AMDGPUAS::PRIVATE_ADDRESS, true)) {
		BuildMI(MBB, I, DebugLoc(), TII->get(AMDGPU::SCRATCH_STORE_DWORD_SADDR))
		.addReg(SpillReg, RegState::Kill)
		.addReg(SPReg)
		.addImm(Offset)
		.addImm(0) // glc
		.addImm(0) // slc
		.addImm(0) // dlc
		.addMemOperand(MMO);
		return;
		}
		} else if (SIInstrInfo::isLegalMUBUFImmOffset(Offset)) {
BuildMI(MBB, I, DebugLoc(), TII->get(AMDGPU::BUFFER_STORE_DWORD_OFFSET))		BuildMI(MBB, I, DebugLoc(), TII->get(AMDGPU::BUFFER_STORE_DWORD_OFFSET))
.addReg(SpillReg, RegState::Kill)		.addReg(SpillReg, RegState::Kill)
.addReg(ScratchRsrcReg)		.addReg(ScratchRsrcReg)
.addReg(SPReg)		.addReg(SPReg)
.addImm(Offset)		.addImm(Offset)
.addImm(0) // glc		.addImm(0) // glc
.addImm(0) // slc		.addImm(0) // slc
.addImm(0) // tfe		.addImm(0) // tfe
.addImm(0) // dlc		.addImm(0) // dlc
.addImm(0) // swz		.addImm(0) // swz
.addMemOperand(MMO);		.addMemOperand(MMO);
return;		return;
}		}

// Don't clobber the TmpVGPR if we also need a scratch reg for the stack		// Don't clobber the TmpVGPR if we also need a scratch reg for the stack
// offset in the spill.		// offset in the spill.
LiveRegs.addReg(SpillReg);		LiveRegs.addReg(SpillReg);

		if (ST.enableFlatScratch()) {
		MCPhysReg OffsetReg = findScratchNonCalleeSaveRegister(
		MF->getRegInfo(), LiveRegs, AMDGPU::SReg_32_XM0RegClass);

		BuildMI(MBB, I, DebugLoc(), TII->get(AMDGPU::S_ADD_U32), OffsetReg)
		.addReg(SPReg)
		.addImm(Offset);

		BuildMI(MBB, I, DebugLoc(), TII->get(AMDGPU::SCRATCH_STORE_DWORD_SADDR))
		.addReg(SpillReg, RegState::Kill)
		.addReg(OffsetReg, RegState::Kill)
		.addImm(0)
		.addImm(0) // glc
		.addImm(0) // slc
		.addImm(0) // dlc
		.addMemOperand(MMO);
		} else {
MCPhysReg OffsetReg = findScratchNonCalleeSaveRegister(		MCPhysReg OffsetReg = findScratchNonCalleeSaveRegister(
MF->getRegInfo(), LiveRegs, AMDGPU::VGPR_32RegClass);		MF->getRegInfo(), LiveRegs, AMDGPU::VGPR_32RegClass);

BuildMI(MBB, I, DebugLoc(), TII->get(AMDGPU::V_MOV_B32_e32), OffsetReg)		BuildMI(MBB, I, DebugLoc(), TII->get(AMDGPU::V_MOV_B32_e32), OffsetReg)
.addImm(Offset);		.addImm(Offset);

BuildMI(MBB, I, DebugLoc(), TII->get(AMDGPU::BUFFER_STORE_DWORD_OFFEN))		BuildMI(MBB, I, DebugLoc(), TII->get(AMDGPU::BUFFER_STORE_DWORD_OFFEN))
.addReg(SpillReg, RegState::Kill)		.addReg(SpillReg, RegState::Kill)
.addReg(OffsetReg, RegState::Kill)		.addReg(OffsetReg, RegState::Kill)
.addReg(ScratchRsrcReg)		.addReg(ScratchRsrcReg)
.addReg(SPReg)		.addReg(SPReg)
.addImm(0)		.addImm(0)
.addImm(0) // glc		.addImm(0) // glc
.addImm(0) // slc		.addImm(0) // slc
.addImm(0) // tfe		.addImm(0) // tfe
.addImm(0) // dlc		.addImm(0) // dlc
.addImm(0) // swz		.addImm(0) // swz
.addMemOperand(MMO);		.addMemOperand(MMO);
		}

LiveRegs.removeReg(SpillReg);		LiveRegs.removeReg(SpillReg);
}		}

static void buildEpilogReload(LivePhysRegs &LiveRegs, MachineBasicBlock &MBB,		static void buildEpilogReload(const GCNSubtarget &ST, LivePhysRegs &LiveRegs,
		MachineBasicBlock &MBB,
MachineBasicBlock::iterator I,		MachineBasicBlock::iterator I,
const SIInstrInfo *TII, Register SpillReg,		const SIInstrInfo *TII, Register SpillReg,
Register ScratchRsrcReg, Register SPReg, int FI) {		Register ScratchRsrcReg, Register SPReg, int FI) {
MachineFunction *MF = MBB.getParent();		MachineFunction *MF = MBB.getParent();
MachineFrameInfo &MFI = MF->getFrameInfo();		MachineFrameInfo &MFI = MF->getFrameInfo();
int64_t Offset = MFI.getObjectOffset(FI);		int64_t Offset = MFI.getObjectOffset(FI);

MachineMemOperand *MMO = MF->getMachineMemOperand(		MachineMemOperand *MMO = MF->getMachineMemOperand(
MachinePointerInfo::getFixedStack(*MF, FI), MachineMemOperand::MOLoad, 4,		MachinePointerInfo::getFixedStack(*MF, FI), MachineMemOperand::MOLoad, 4,
MFI.getObjectAlign(FI));		MFI.getObjectAlign(FI));

		if (ST.enableFlatScratch()) {
		if (TII->isLegalFLATOffset(Offset, AMDGPUAS::PRIVATE_ADDRESS, true)) {
		BuildMI(MBB, I, DebugLoc(),
		TII->get(AMDGPU::SCRATCH_LOAD_DWORD_SADDR), SpillReg)
		.addReg(SPReg)
		.addImm(Offset)
		.addImm(0) // glc
		.addImm(0) // slc
		.addImm(0) // dlc
		.addMemOperand(MMO);
		return;
		}
		MCPhysReg OffsetReg = findScratchNonCalleeSaveRegister(
		MF->getRegInfo(), LiveRegs, AMDGPU::SReg_32_XM0RegClass);

		BuildMI(MBB, I, DebugLoc(), TII->get(AMDGPU::S_ADD_U32), OffsetReg)
		.addReg(SPReg)
		.addImm(Offset);
		BuildMI(MBB, I, DebugLoc(), TII->get(AMDGPU::SCRATCH_LOAD_DWORD_SADDR),
		SpillReg)
		.addReg(OffsetReg, RegState::Kill)
		.addImm(0)
		.addImm(0) // glc
		.addImm(0) // slc
		.addImm(0) // dlc
		.addMemOperand(MMO);
		return;
		}

if (SIInstrInfo::isLegalMUBUFImmOffset(Offset)) {		if (SIInstrInfo::isLegalMUBUFImmOffset(Offset)) {
BuildMI(MBB, I, DebugLoc(),		BuildMI(MBB, I, DebugLoc(),
TII->get(AMDGPU::BUFFER_LOAD_DWORD_OFFSET), SpillReg)		TII->get(AMDGPU::BUFFER_LOAD_DWORD_OFFSET), SpillReg)
.addReg(ScratchRsrcReg)		.addReg(ScratchRsrcReg)
.addReg(SPReg)		.addReg(SPReg)
.addImm(Offset)		.addImm(Offset)
.addImm(0) // glc		.addImm(0) // glc
.addImm(0) // slc		.addImm(0) // slc
Show All 40 Lines	void SIFrameLowering::emitEntryFunctionFlatScratchInit(
// this from the input registers.		// this from the input registers.
//		//
// TODO: We only need to know if we access scratch space through a flat		// TODO: We only need to know if we access scratch space through a flat
// pointer. Because we only detect if flat instructions are used at all,		// pointer. Because we only detect if flat instructions are used at all,
// this will be used more often than necessary on VI.		// this will be used more often than necessary on VI.

Register FlatScratchInitReg =		Register FlatScratchInitReg =
MFI->getPreloadedReg(AMDGPUFunctionArgInfo::FLAT_SCRATCH_INIT);		MFI->getPreloadedReg(AMDGPUFunctionArgInfo::FLAT_SCRATCH_INIT);
		assert(FlatScratchInitReg);

MachineRegisterInfo &MRI = MF.getRegInfo();		MachineRegisterInfo &MRI = MF.getRegInfo();
MRI.addLiveIn(FlatScratchInitReg);		MRI.addLiveIn(FlatScratchInitReg);
MBB.addLiveIn(FlatScratchInitReg);		MBB.addLiveIn(FlatScratchInitReg);

Register FlatScrInitLo = TRI->getSubReg(FlatScratchInitReg, AMDGPU::sub0);		Register FlatScrInitLo = TRI->getSubReg(FlatScratchInitReg, AMDGPU::sub0);
Register FlatScrInitHi = TRI->getSubReg(FlatScratchInitReg, AMDGPU::sub1);		Register FlatScrInitHi = TRI->getSubReg(FlatScratchInitReg, AMDGPU::sub1);

▲ Show 20 Lines • Show All 93 Lines • ▼ Show 20 Lines	if (!MRI.isPhysRegUsed(Reg) && MRI.isAllocatable(Reg) &&
MFI->setScratchRSrcReg(Reg);		MFI->setScratchRSrcReg(Reg);
return Reg;		return Reg;
}		}
}		}

return ScratchRsrcReg;		return ScratchRsrcReg;
}		}

		static unsigned getScratchScaleFactor(const GCNSubtarget &ST) {
		scott.linder: s/Scracth/Scratch/
		return ST.enableFlatScratch() ? 1 : ST.getWavefrontSize();
		}

void SIFrameLowering::emitEntryFunctionPrologue(MachineFunction &MF,		void SIFrameLowering::emitEntryFunctionPrologue(MachineFunction &MF,
MachineBasicBlock &MBB) const {		MachineBasicBlock &MBB) const {
assert(&MF.front() == &MBB && "Shrink-wrapping not yet supported");		assert(&MF.front() == &MBB && "Shrink-wrapping not yet supported");

// FIXME: If we only have SGPR spills, we won't actually be using scratch		// FIXME: If we only have SGPR spills, we won't actually be using scratch
// memory since these spill to VGPRs. We should be cleaning up these unused		// memory since these spill to VGPRs. We should be cleaning up these unused
// SGPR spill frame indices somewhere.		// SGPR spill frame indices somewhere.

▲ Show 20 Lines • Show All 80 Lines • ▼ Show 20 Lines	if (TRI->isSubRegisterEq(ScratchRsrcReg, PreloadedScratchWaveOffsetReg)) {
ScratchWaveOffsetReg = PreloadedScratchWaveOffsetReg;		ScratchWaveOffsetReg = PreloadedScratchWaveOffsetReg;
}		}
assert(ScratchWaveOffsetReg);		assert(ScratchWaveOffsetReg);

if (requiresStackPointerReference(MF)) {		if (requiresStackPointerReference(MF)) {
Register SPReg = MFI->getStackPtrOffsetReg();		Register SPReg = MFI->getStackPtrOffsetReg();
assert(SPReg != AMDGPU::SP_REG);		assert(SPReg != AMDGPU::SP_REG);
BuildMI(MBB, I, DL, TII->get(AMDGPU::S_MOV_B32), SPReg)		BuildMI(MBB, I, DL, TII->get(AMDGPU::S_MOV_B32), SPReg)
.addImm(MF.getFrameInfo().getStackSize() * ST.getWavefrontSize());		.addImm(MF.getFrameInfo().getStackSize() * getScratchScaleFactor(ST));
}		}

if (hasFP(MF)) {		if (hasFP(MF)) {
Register FPReg = MFI->getFrameOffsetReg();		Register FPReg = MFI->getFrameOffsetReg();
assert(FPReg != AMDGPU::FP_REG);		assert(FPReg != AMDGPU::FP_REG);
BuildMI(MBB, I, DL, TII->get(AMDGPU::S_MOV_B32), FPReg).addImm(0);		BuildMI(MBB, I, DL, TII->get(AMDGPU::S_MOV_B32), FPReg).addImm(0);
}		}

if (MFI->hasFlatScratchInit() || ScratchRsrcReg) {		if (MFI->hasFlatScratchInit() || ScratchRsrcReg) {
MRI.addLiveIn(PreloadedScratchWaveOffsetReg);		MRI.addLiveIn(PreloadedScratchWaveOffsetReg);
MBB.addLiveIn(PreloadedScratchWaveOffsetReg);		MBB.addLiveIn(PreloadedScratchWaveOffsetReg);
}		}

if (MFI->hasFlatScratchInit()) {		if (MFI->hasFlatScratchInit()) {
		Flakebi: Should this be `MFI->hasFlatScratchInit() || (ST.enableFlatScratch() && requiresStackPointerReference(MF))`? Otherwise, the scratch does not get initialized (I guess it's fine to do that in a later patch).
		rampitec (author): Do you see it not initialized? In the SIMachineFunctionInfo() there is this code:
		    if (ST.hasFlatAddressSpace() && isEntryFunction() && isAmdHsaOrMesa) {
		      // TODO: This could be refined a lot. The attribute is a poor way of
		      // detecting calls or stack objects that may require it before argument
		      // lowering.
		      if (HasCalls || HasStackObjects)
		        FlatScratchInit = true;
		    }
		So I assume it has to be initialized. Probably the culprit is this isAmdHsaOrMesa condition? It may be necessary to say `(isAmdHsaOrMesa || ST.enableFlatScratch())` instead. For some reason this code is not executed for amdpal, and I do not see an obvious reason why.
		rampitec (author): Actually I see it uninitialized in my own test. But the code as suggested does not always work, because requiresStackPointerReference() is not necessarily true if we have just private loads; we need to make sure hasFlatScratchInit() is set.
emitEntryFunctionFlatScratchInit(MF, MBB, I, DL, ScratchWaveOffsetReg);		emitEntryFunctionFlatScratchInit(MF, MBB, I, DL, ScratchWaveOffsetReg);
}		}

if (ScratchRsrcReg) {		if (ScratchRsrcReg) {
emitEntryFunctionScratchRsrcRegSetup(MF, MBB, I, DL,		emitEntryFunctionScratchRsrcRegSetup(MF, MBB, I, DL,
PreloadedScratchRsrcReg,		PreloadedScratchRsrcReg,
ScratchRsrcReg, ScratchWaveOffsetReg);		ScratchRsrcReg, ScratchWaveOffsetReg);
}		}
▲ Show 20 Lines • Show All 288 Lines • ▼ Show 20 Lines	void SIFrameLowering::emitPrologue(MachineFunction &MF,
for (const SIMachineFunctionInfo::SGPRSpillVGPRCSR &Reg		for (const SIMachineFunctionInfo::SGPRSpillVGPRCSR &Reg
: FuncInfo->getSGPRSpillVGPRs()) {		: FuncInfo->getSGPRSpillVGPRs()) {
if (!Reg.FI.hasValue())		if (!Reg.FI.hasValue())
continue;		continue;

if (!ScratchExecCopy)		if (!ScratchExecCopy)
ScratchExecCopy = buildScratchExecCopy(LiveRegs, MF, MBB, MBBI, true);		ScratchExecCopy = buildScratchExecCopy(LiveRegs, MF, MBB, MBBI, true);

buildPrologSpill(LiveRegs, MBB, MBBI, TII, Reg.VGPR,		buildPrologSpill(ST, LiveRegs, MBB, MBBI, TII, Reg.VGPR,
FuncInfo->getScratchRSrcReg(),		FuncInfo->getScratchRSrcReg(),
StackPtrReg,		StackPtrReg,
Reg.FI.getValue());		Reg.FI.getValue());
}		}

if (HasFPSaveIndex && SpillFPToMemory) {		if (HasFPSaveIndex && SpillFPToMemory) {
assert(!MFI.isDeadObjectIndex(FuncInfo->FramePointerSaveIndex.getValue()));		assert(!MFI.isDeadObjectIndex(FuncInfo->FramePointerSaveIndex.getValue()));

if (!ScratchExecCopy)		if (!ScratchExecCopy)
ScratchExecCopy = buildScratchExecCopy(LiveRegs, MF, MBB, MBBI, true);		ScratchExecCopy = buildScratchExecCopy(LiveRegs, MF, MBB, MBBI, true);

MCPhysReg TmpVGPR = findScratchNonCalleeSaveRegister(		MCPhysReg TmpVGPR = findScratchNonCalleeSaveRegister(
MRI, LiveRegs, AMDGPU::VGPR_32RegClass);		MRI, LiveRegs, AMDGPU::VGPR_32RegClass);

BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::V_MOV_B32_e32), TmpVGPR)		BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::V_MOV_B32_e32), TmpVGPR)
.addReg(FramePtrReg);		.addReg(FramePtrReg);

buildPrologSpill(LiveRegs, MBB, MBBI, TII, TmpVGPR,		buildPrologSpill(ST, LiveRegs, MBB, MBBI, TII, TmpVGPR,
FuncInfo->getScratchRSrcReg(), StackPtrReg,		FuncInfo->getScratchRSrcReg(), StackPtrReg,
FuncInfo->FramePointerSaveIndex.getValue());		FuncInfo->FramePointerSaveIndex.getValue());
}		}

if (HasBPSaveIndex && SpillBPToMemory) {		if (HasBPSaveIndex && SpillBPToMemory) {
assert(!MFI.isDeadObjectIndex(*FuncInfo->BasePointerSaveIndex));		assert(!MFI.isDeadObjectIndex(*FuncInfo->BasePointerSaveIndex));

if (!ScratchExecCopy)		if (!ScratchExecCopy)
ScratchExecCopy = buildScratchExecCopy(LiveRegs, MF, MBB, MBBI, true);		ScratchExecCopy = buildScratchExecCopy(LiveRegs, MF, MBB, MBBI, true);

MCPhysReg TmpVGPR = findScratchNonCalleeSaveRegister(		MCPhysReg TmpVGPR = findScratchNonCalleeSaveRegister(
MRI, LiveRegs, AMDGPU::VGPR_32RegClass);		MRI, LiveRegs, AMDGPU::VGPR_32RegClass);

BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::V_MOV_B32_e32), TmpVGPR)		BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::V_MOV_B32_e32), TmpVGPR)
.addReg(BasePtrReg);		.addReg(BasePtrReg);

buildPrologSpill(LiveRegs, MBB, MBBI, TII, TmpVGPR,		buildPrologSpill(ST, LiveRegs, MBB, MBBI, TII, TmpVGPR,
FuncInfo->getScratchRSrcReg(), StackPtrReg,		FuncInfo->getScratchRSrcReg(), StackPtrReg,
*FuncInfo->BasePointerSaveIndex);		*FuncInfo->BasePointerSaveIndex);
}		}

if (ScratchExecCopy) {		if (ScratchExecCopy) {
// FIXME: Split block and make terminator.		// FIXME: Split block and make terminator.
unsigned ExecMov = ST.isWave32() ? AMDGPU::S_MOV_B32 : AMDGPU::S_MOV_B64;		unsigned ExecMov = ST.isWave32() ? AMDGPU::S_MOV_B32 : AMDGPU::S_MOV_B64;
MCRegister Exec = ST.isWave32() ? AMDGPU::EXEC_LO : AMDGPU::EXEC;		MCRegister Exec = ST.isWave32() ? AMDGPU::EXEC_LO : AMDGPU::EXEC;
▲ Show 20 Lines • Show All 56 Lines • ▼ Show 20 Lines	Register ScratchSPReg = findScratchNonCalleeSaveRegister(
MRI, LiveRegs, AMDGPU::SReg_32_XM0RegClass);		MRI, LiveRegs, AMDGPU::SReg_32_XM0RegClass);
assert(ScratchSPReg && ScratchSPReg != FuncInfo->SGPRForFPSaveRestoreCopy &&		assert(ScratchSPReg && ScratchSPReg != FuncInfo->SGPRForFPSaveRestoreCopy &&
ScratchSPReg != FuncInfo->SGPRForBPSaveRestoreCopy);		ScratchSPReg != FuncInfo->SGPRForBPSaveRestoreCopy);

// s_add_u32 tmp_reg, s32, NumBytes		// s_add_u32 tmp_reg, s32, NumBytes
// s_and_b32 s32, tmp_reg, 0b111...0000		// s_and_b32 s32, tmp_reg, 0b111...0000
BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::S_ADD_U32), ScratchSPReg)		BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::S_ADD_U32), ScratchSPReg)
.addReg(StackPtrReg)		.addReg(StackPtrReg)
.addImm((Alignment - 1) * ST.getWavefrontSize())		.addImm((Alignment - 1) * getScratchScaleFactor(ST))
.setMIFlag(MachineInstr::FrameSetup);		.setMIFlag(MachineInstr::FrameSetup);
BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::S_AND_B32), FramePtrReg)		BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::S_AND_B32), FramePtrReg)
.addReg(ScratchSPReg, RegState::Kill)		.addReg(ScratchSPReg, RegState::Kill)
.addImm(-Alignment * ST.getWavefrontSize())		.addImm(-Alignment * getScratchScaleFactor(ST))
.setMIFlag(MachineInstr::FrameSetup);		.setMIFlag(MachineInstr::FrameSetup);
FuncInfo->setIsStackRealigned(true);		FuncInfo->setIsStackRealigned(true);
} else if ((HasFP = hasFP(MF))) {		} else if ((HasFP = hasFP(MF))) {
BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::COPY), FramePtrReg)		BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::COPY), FramePtrReg)
.addReg(StackPtrReg)		.addReg(StackPtrReg)
.setMIFlag(MachineInstr::FrameSetup);		.setMIFlag(MachineInstr::FrameSetup);
}		}

// If we need a base pointer, set it up here. It's whatever the value of		// If we need a base pointer, set it up here. It's whatever the value of
// the stack pointer is at this point. Any variable size objects will be		// the stack pointer is at this point. Any variable size objects will be
// allocated after this, so we can still use the base pointer to reference		// allocated after this, so we can still use the base pointer to reference
// the incoming arguments.		// the incoming arguments.
if ((HasBP = TRI.hasBasePointer(MF))) {		if ((HasBP = TRI.hasBasePointer(MF))) {
BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::COPY), BasePtrReg)		BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::COPY), BasePtrReg)
.addReg(StackPtrReg)		.addReg(StackPtrReg)
.setMIFlag(MachineInstr::FrameSetup);		.setMIFlag(MachineInstr::FrameSetup);
}		}

if (HasFP && RoundedSize != 0) {		if (HasFP && RoundedSize != 0) {
BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::S_ADD_U32), StackPtrReg)		BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::S_ADD_U32), StackPtrReg)
.addReg(StackPtrReg)		.addReg(StackPtrReg)
.addImm(RoundedSize * ST.getWavefrontSize())		.addImm(RoundedSize * getScratchScaleFactor(ST))
.setMIFlag(MachineInstr::FrameSetup);		.setMIFlag(MachineInstr::FrameSetup);
}		}

assert((!HasFP || (FuncInfo->SGPRForFPSaveRestoreCopy ||		assert((!HasFP || (FuncInfo->SGPRForFPSaveRestoreCopy ||
FuncInfo->FramePointerSaveIndex)) &&		FuncInfo->FramePointerSaveIndex)) &&
"Needed to save FP but didn't save it anywhere");		"Needed to save FP but didn't save it anywhere");

assert((HasFP || (!FuncInfo->SGPRForFPSaveRestoreCopy &&		assert((HasFP || (!FuncInfo->SGPRForFPSaveRestoreCopy &&
▲ Show 20 Lines • Show All 45 Lines • ▼ Show 20 Lines	void SIFrameLowering::emitEpilogue(MachineFunction &MF,
if (HasBPSaveIndex) {		if (HasBPSaveIndex) {
SpillBPToMemory = MFI.getStackID(*FuncInfo->BasePointerSaveIndex) !=		SpillBPToMemory = MFI.getStackID(*FuncInfo->BasePointerSaveIndex) !=
TargetStackID::SGPRSpill;		TargetStackID::SGPRSpill;
}		}

if (RoundedSize != 0 && hasFP(MF)) {		if (RoundedSize != 0 && hasFP(MF)) {
BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::S_SUB_U32), StackPtrReg)		BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::S_SUB_U32), StackPtrReg)
.addReg(StackPtrReg)		.addReg(StackPtrReg)
.addImm(RoundedSize * ST.getWavefrontSize())		.addImm(RoundedSize * getScratchScaleFactor(ST))
.setMIFlag(MachineInstr::FrameDestroy);		.setMIFlag(MachineInstr::FrameDestroy);
}		}

if (FuncInfo->SGPRForFPSaveRestoreCopy) {		if (FuncInfo->SGPRForFPSaveRestoreCopy) {
BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::COPY), FramePtrReg)		BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::COPY), FramePtrReg)
.addReg(FuncInfo->SGPRForFPSaveRestoreCopy)		.addReg(FuncInfo->SGPRForFPSaveRestoreCopy)
.setMIFlag(MachineInstr::FrameSetup);		.setMIFlag(MachineInstr::FrameSetup);
}		}
Show All 9 Lines	if (HasFPSaveIndex) {
const int FI = FuncInfo->FramePointerSaveIndex.getValue();		const int FI = FuncInfo->FramePointerSaveIndex.getValue();
assert(!MFI.isDeadObjectIndex(FI));		assert(!MFI.isDeadObjectIndex(FI));
if (SpillFPToMemory) {		if (SpillFPToMemory) {
if (!ScratchExecCopy)		if (!ScratchExecCopy)
ScratchExecCopy = buildScratchExecCopy(LiveRegs, MF, MBB, MBBI, false);		ScratchExecCopy = buildScratchExecCopy(LiveRegs, MF, MBB, MBBI, false);

MCPhysReg TempVGPR = findScratchNonCalleeSaveRegister(		MCPhysReg TempVGPR = findScratchNonCalleeSaveRegister(
MRI, LiveRegs, AMDGPU::VGPR_32RegClass);		MRI, LiveRegs, AMDGPU::VGPR_32RegClass);
buildEpilogReload(LiveRegs, MBB, MBBI, TII, TempVGPR,		buildEpilogReload(ST, LiveRegs, MBB, MBBI, TII, TempVGPR,
FuncInfo->getScratchRSrcReg(), StackPtrReg, FI);		FuncInfo->getScratchRSrcReg(), StackPtrReg, FI);
BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::V_READFIRSTLANE_B32), FramePtrReg)		BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::V_READFIRSTLANE_B32), FramePtrReg)
.addReg(TempVGPR, RegState::Kill);		.addReg(TempVGPR, RegState::Kill);
} else {		} else {
// Reload from VGPR spill.		// Reload from VGPR spill.
assert(MFI.getStackID(FI) == TargetStackID::SGPRSpill);		assert(MFI.getStackID(FI) == TargetStackID::SGPRSpill);
ArrayRef<SIMachineFunctionInfo::SpilledReg> Spill =		ArrayRef<SIMachineFunctionInfo::SpilledReg> Spill =
FuncInfo->getSGPRToVGPRSpills(FI);		FuncInfo->getSGPRToVGPRSpills(FI);
Show All 9 Lines	if (HasBPSaveIndex) {
const int BasePtrFI = *FuncInfo->BasePointerSaveIndex;		const int BasePtrFI = *FuncInfo->BasePointerSaveIndex;
assert(!MFI.isDeadObjectIndex(BasePtrFI));		assert(!MFI.isDeadObjectIndex(BasePtrFI));
if (SpillBPToMemory) {		if (SpillBPToMemory) {
if (!ScratchExecCopy)		if (!ScratchExecCopy)
ScratchExecCopy = buildScratchExecCopy(LiveRegs, MF, MBB, MBBI, false);		ScratchExecCopy = buildScratchExecCopy(LiveRegs, MF, MBB, MBBI, false);

MCPhysReg TempVGPR = findScratchNonCalleeSaveRegister(		MCPhysReg TempVGPR = findScratchNonCalleeSaveRegister(
MRI, LiveRegs, AMDGPU::VGPR_32RegClass);		MRI, LiveRegs, AMDGPU::VGPR_32RegClass);
buildEpilogReload(LiveRegs, MBB, MBBI, TII, TempVGPR,		buildEpilogReload(ST, LiveRegs, MBB, MBBI, TII, TempVGPR,
FuncInfo->getScratchRSrcReg(), StackPtrReg, BasePtrFI);		FuncInfo->getScratchRSrcReg(), StackPtrReg, BasePtrFI);
BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::V_READFIRSTLANE_B32), BasePtrReg)		BuildMI(MBB, MBBI, DL, TII->get(AMDGPU::V_READFIRSTLANE_B32), BasePtrReg)
.addReg(TempVGPR, RegState::Kill);		.addReg(TempVGPR, RegState::Kill);
} else {		} else {
// Reload from VGPR spill.		// Reload from VGPR spill.
assert(MFI.getStackID(BasePtrFI) == TargetStackID::SGPRSpill);		assert(MFI.getStackID(BasePtrFI) == TargetStackID::SGPRSpill);
ArrayRef<SIMachineFunctionInfo::SpilledReg> Spill =		ArrayRef<SIMachineFunctionInfo::SpilledReg> Spill =
FuncInfo->getSGPRToVGPRSpills(BasePtrFI);		FuncInfo->getSGPRToVGPRSpills(BasePtrFI);
assert(Spill.size() == 1);		assert(Spill.size() == 1);
BuildMI(MBB, MBBI, DL, TII->getMCOpcodeFromPseudo(AMDGPU::V_READLANE_B32),		BuildMI(MBB, MBBI, DL, TII->getMCOpcodeFromPseudo(AMDGPU::V_READLANE_B32),
BasePtrReg)		BasePtrReg)
.addReg(Spill[0].VGPR)		.addReg(Spill[0].VGPR)
.addImm(Spill[0].Lane);		.addImm(Spill[0].Lane);
}		}
}		}

for (const SIMachineFunctionInfo::SGPRSpillVGPRCSR &Reg :		for (const SIMachineFunctionInfo::SGPRSpillVGPRCSR &Reg :
FuncInfo->getSGPRSpillVGPRs()) {		FuncInfo->getSGPRSpillVGPRs()) {
if (!Reg.FI.hasValue())		if (!Reg.FI.hasValue())
continue;		continue;

if (!ScratchExecCopy)		if (!ScratchExecCopy)
ScratchExecCopy = buildScratchExecCopy(LiveRegs, MF, MBB, MBBI, false);		ScratchExecCopy = buildScratchExecCopy(LiveRegs, MF, MBB, MBBI, false);

buildEpilogReload(LiveRegs, MBB, MBBI, TII, Reg.VGPR,		buildEpilogReload(ST, LiveRegs, MBB, MBBI, TII, Reg.VGPR,
FuncInfo->getScratchRSrcReg(), StackPtrReg,		FuncInfo->getScratchRSrcReg(), StackPtrReg,
Reg.FI.getValue());		Reg.FI.getValue());
}		}

if (ScratchExecCopy) {		if (ScratchExecCopy) {
// FIXME: Split block and make terminator.		// FIXME: Split block and make terminator.
unsigned ExecMov = ST.isWave32() ? AMDGPU::S_MOV_B32 : AMDGPU::S_MOV_B64;		unsigned ExecMov = ST.isWave32() ? AMDGPU::S_MOV_B32 : AMDGPU::S_MOV_B64;
MCRegister Exec = ST.isWave32() ? AMDGPU::EXEC_LO : AMDGPU::EXEC;		MCRegister Exec = ST.isWave32() ? AMDGPU::EXEC_LO : AMDGPU::EXEC;
▲ Show 20 Lines • Show All 194 Lines • ▼ Show 20 Lines	if (!hasReservedCallFrame(MF)) {
Amount = alignTo(Amount, getStackAlign());		Amount = alignTo(Amount, getStackAlign());
assert(isUInt<32>(Amount) && "exceeded stack address space size");		assert(isUInt<32>(Amount) && "exceeded stack address space size");
const SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();		const SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();
Register SPReg = MFI->getStackPtrOffsetReg();		Register SPReg = MFI->getStackPtrOffsetReg();

unsigned Op = IsDestroy ? AMDGPU::S_SUB_U32 : AMDGPU::S_ADD_U32;		unsigned Op = IsDestroy ? AMDGPU::S_SUB_U32 : AMDGPU::S_ADD_U32;
BuildMI(MBB, I, DL, TII->get(Op), SPReg)		BuildMI(MBB, I, DL, TII->get(Op), SPReg)
.addReg(SPReg)		.addReg(SPReg)
.addImm(Amount * ST.getWavefrontSize());		.addImm(Amount * getScratchScaleFactor(ST));
} else if (CalleePopAmount != 0) {		} else if (CalleePopAmount != 0) {
llvm_unreachable("is this used?");		llvm_unreachable("is this used?");
}		}

return MBB.erase(I);		return MBB.erase(I);
}		}

/// Returns true if the frame will require a reference to the stack pointer.		/// Returns true if the frame will require a reference to the stack pointer.
▲ Show 20 Lines • Show All 56 Lines • Show Last 20 Lines
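
As a rough illustration of the scaling change in SIFrameLowering.cpp above, the sketch below models how the SP/FP immediates come out once `getScratchScaleFactor(ST)` replaces `ST.getWavefrontSize()`. `SubtargetModel` and the helper names are hypothetical stand-ins and not LLVM API; only the arithmetic is meant to mirror the diff, and a power-of-two alignment is assumed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the two GCNSubtarget queries used in the patch.
struct SubtargetModel {
  bool FlatScratch;       // models ST.enableFlatScratch()
  unsigned WavefrontSize; // models ST.getWavefrontSize()
};

// Same rule as getScratchScaleFactor in the diff: flat scratch uses a
// swizzled (per-lane) SP, so no wave-size scaling is applied.
unsigned scratchScaleFactor(const SubtargetModel &ST) {
  return ST.FlatScratch ? 1 : ST.WavefrontSize;
}

// Immediate moved into SP in the entry function prologue.
uint64_t entryStackPointerValue(const SubtargetModel &ST, uint64_t StackSize) {
  return StackSize * scratchScaleFactor(ST);
}

// Models the s_add_u32 / s_and_b32 pair used for stack realignment.
// Alignment is assumed to be a power of two.
uint32_t realignedFramePointer(const SubtargetModel &ST, uint32_t SP,
                               uint32_t Alignment) {
  uint32_t Scale = scratchScaleFactor(ST);
  uint32_t Tmp = SP + (Alignment - 1) * Scale;
  return Tmp & (0u - Alignment * Scale);
}
```

Under these assumptions a 16-byte frame on a wave64 MUBUF target gives an SP immediate of 1024, while the same frame with flat scratch enabled gives 16.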

llvm/lib/Target/AMDGPU/SIISelLowering.cpp


Show First 20 Lines • Show All 2,292 Lines • ▼ Show 20 Lines	if (CallConv == CallingConv::AMDGPU_PS) {
((PsInputBits & 0xF) == 0 &&		((PsInputBits & 0xF) == 0 &&
(PsInputBits >> 11 & 1)))		(PsInputBits >> 11 & 1)))
Info->markPSInputEnabled(		Info->markPSInputEnabled(
countTrailingZeros(Info->getPSInputAddr(), ZB_Undefined));		countTrailingZeros(Info->getPSInputAddr(), ZB_Undefined));
}		}
}		}

assert(!Info->hasDispatchPtr() &&		assert(!Info->hasDispatchPtr() &&
!Info->hasKernargSegmentPtr() && !Info->hasFlatScratchInit() &&		!Info->hasKernargSegmentPtr() &&
		(!Info->hasFlatScratchInit() || Subtarget->enableFlatScratch()) &&
!Info->hasWorkGroupIDX() && !Info->hasWorkGroupIDY() &&		!Info->hasWorkGroupIDX() && !Info->hasWorkGroupIDY() &&
!Info->hasWorkGroupIDZ() && !Info->hasWorkGroupInfo() &&		!Info->hasWorkGroupIDZ() && !Info->hasWorkGroupInfo() &&
!Info->hasWorkItemIDX() && !Info->hasWorkItemIDY() &&		!Info->hasWorkItemIDX() && !Info->hasWorkItemIDY() &&
!Info->hasWorkItemIDZ());		!Info->hasWorkItemIDZ());
} else if (IsKernel) {		} else if (IsKernel) {
assert(Info->hasWorkGroupIDX() && Info->hasWorkItemIDX());		assert(Info->hasWorkGroupIDX() && Info->hasWorkItemIDX());
} else {		} else {
Splits.append(Ins.begin(), Ins.end());		Splits.append(Ins.begin(), Ins.end());
▲ Show 20 Lines • Show All 9,653 Lines • Show Last 20 Lines

llvm/lib/Target/AMDGPU/SIInstrInfo.h

Show First 20 Lines • Show All 501 Lines • ▼ Show 20 Lines	public:

// Is a FLAT encoded instruction which accesses a specific segment,		// Is a FLAT encoded instruction which accesses a specific segment,
// i.e. global_* or scratch_*.		// i.e. global_* or scratch_*.
static bool isSegmentSpecificFLAT(const MachineInstr &MI) {		static bool isSegmentSpecificFLAT(const MachineInstr &MI) {
auto Flags = MI.getDesc().TSFlags;		auto Flags = MI.getDesc().TSFlags;
return (Flags & SIInstrFlags::FLAT) && !(Flags & SIInstrFlags::LGKM_CNT);		return (Flags & SIInstrFlags::FLAT) && !(Flags & SIInstrFlags::LGKM_CNT);
}		}

		bool isSegmentSpecificFLAT(uint16_t Opcode) const {
		auto Flags = get(Opcode).TSFlags;
		return (Flags & SIInstrFlags::FLAT) && !(Flags & SIInstrFlags::LGKM_CNT);
		}

// FIXME: Make this more precise		// FIXME: Make this more precise
static bool isFLATScratch(const MachineInstr &MI) {		static bool isFLATScratch(const MachineInstr &MI) {
return isSegmentSpecificFLAT(MI);		return isSegmentSpecificFLAT(MI);
}		}

		bool isFLATScratch(uint16_t Opcode) const {
		return isSegmentSpecificFLAT(Opcode);
		}

// Any FLAT encoded instruction, including global_* and scratch_*.		// Any FLAT encoded instruction, including global_* and scratch_*.
bool isFLAT(uint16_t Opcode) const {		bool isFLAT(uint16_t Opcode) const {
return get(Opcode).TSFlags & SIInstrFlags::FLAT;		return get(Opcode).TSFlags & SIInstrFlags::FLAT;
}		}

static bool isEXP(const MachineInstr &MI) {		static bool isEXP(const MachineInstr &MI) {
return MI.getDesc().TSFlags & SIInstrFlags::EXP;		return MI.getDesc().TSFlags & SIInstrFlags::EXP;
}		}
▲ Show 20 Lines • Show All 619 Lines • ▼ Show 20 Lines	namespace AMDGPU {
int getSOPKOp(uint16_t Opcode);		int getSOPKOp(uint16_t Opcode);

LLVM_READONLY		LLVM_READONLY
int getGlobalSaddrOp(uint16_t Opcode);		int getGlobalSaddrOp(uint16_t Opcode);

LLVM_READONLY		LLVM_READONLY
int getVCMPXNoSDstOp(uint16_t Opcode);		int getVCMPXNoSDstOp(uint16_t Opcode);

		LLVM_READONLY
		int getFlatScratchInstSTfromSS(uint16_t Opcode);

const uint64_t RSRC_DATA_FORMAT = 0xf00000000000LL;		const uint64_t RSRC_DATA_FORMAT = 0xf00000000000LL;
const uint64_t RSRC_ELEMENT_SIZE_SHIFT = (32 + 19);		const uint64_t RSRC_ELEMENT_SIZE_SHIFT = (32 + 19);
const uint64_t RSRC_INDEX_STRIDE_SHIFT = (32 + 21);		const uint64_t RSRC_INDEX_STRIDE_SHIFT = (32 + 21);
const uint64_t RSRC_TID_ENABLE = UINT64_C(1) << (32 + 23);		const uint64_t RSRC_TID_ENABLE = UINT64_C(1) << (32 + 23);

} // end namespace AMDGPU		} // end namespace AMDGPU

namespace SI {		namespace SI {
Show All 21 Lines
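
The predicates added to SIInstrInfo.h above can be modelled in isolation. The sketch below is only an illustration: the bit positions in `FlagsModel` are made up (the real values live in `SIInstrFlags`), and the `*Model` function names are hypothetical. It shows why the `FIXME` is there: as written, `isFLATScratch` accepts any segment-specific FLAT instruction, so a global_* opcode would pass as well.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit assignments, for illustration only.
namespace FlagsModel {
constexpr uint64_t FLAT = uint64_t(1) << 0;
constexpr uint64_t LGKM_CNT = uint64_t(1) << 1;
} // namespace FlagsModel

// Mirrors the predicate in the diff: FLAT-encoded and not counted on LGKM.
bool isSegmentSpecificFLATModel(uint64_t TSFlags) {
  return (TSFlags & FlagsModel::FLAT) && !(TSFlags & FlagsModel::LGKM_CNT);
}

// Same imprecision as the patch: this does not yet tell scratch_* apart
// from global_*.
bool isFLATScratchModel(uint64_t TSFlags) {
  return isSegmentSpecificFLATModel(TSFlags);
}
```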

llvm/lib/Target/AMDGPU/SIInstrInfo.td

	Show First 20 Lines • Show All 2,558 Lines • ▼ Show 20 Lines
	def getSOPPWithRelaxation : InstrMapping {			def getSOPPWithRelaxation : InstrMapping {
	let FilterClass = "SOPPRelaxTable";			let FilterClass = "SOPPRelaxTable";
	let RowFields = ["KeyName"];			let RowFields = ["KeyName"];
	let ColFields = ["IsRelaxed"];			let ColFields = ["IsRelaxed"];
	let KeyCol = ["0"];			let KeyCol = ["0"];
	let ValueCols = [["1"]];			let ValueCols = [["1"]];
	}			}

				// Maps flat scratch opcodes by addressing modes
				def getFlatScratchInstSTfromSS : InstrMapping {
				let FilterClass = "FlatScratchInst";
				let RowFields = ["SVOp"];
				let ColFields = ["Mode"];
				let KeyCol = ["SS"];
				let ValueCols = [["ST"]];
				}


	include "SIInstructions.td"			include "SIInstructions.td"

	include "DSInstructions.td"			include "DSInstructions.td"
	include "MIMGInstructions.td"			include "MIMGInstructions.td"

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp

Show First 20 Lines • Show All 161 Lines • ▼ Show 20 Lines	if (isAmdHsaOrMesa) {
}		}
} else if (ST.isMesaGfxShader(F)) {		} else if (ST.isMesaGfxShader(F)) {
ImplicitBufferPtr = true;		ImplicitBufferPtr = true;
}		}

if (UseFixedABI || F.hasFnAttribute("amdgpu-kernarg-segment-ptr"))		if (UseFixedABI || F.hasFnAttribute("amdgpu-kernarg-segment-ptr"))
KernargSegmentPtr = true;		KernargSegmentPtr = true;

if (ST.hasFlatAddressSpace() && isEntryFunction() && isAmdHsaOrMesa) {		if (ST.hasFlatAddressSpace() && isEntryFunction() &&
		(isAmdHsaOrMesa || ST.enableFlatScratch())) {
// TODO: This could be refined a lot. The attribute is a poor way of		// TODO: This could be refined a lot. The attribute is a poor way of
// detecting calls or stack objects that may require it before argument		// detecting calls or stack objects that may require it before argument
// lowering.		// lowering.
if (HasCalls || HasStackObjects)		if (HasCalls || HasStackObjects || ST.enableFlatScratch())
FlatScratchInit = true;		FlatScratchInit = true;
}		}

Attribute A = F.getFnAttribute("amdgpu-git-ptr-high");		Attribute A = F.getFnAttribute("amdgpu-git-ptr-high");
StringRef S = A.getValueAsString();		StringRef S = A.getValueAsString();
if (!S.empty())		if (!S.empty())
S.consumeInteger(0, GITPtrHigh);		S.consumeInteger(0, GITPtrHigh);

▲ Show 20 Lines • Show All 411 Lines • Show Last 20 Lines
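
The effect of the SIMachineFunctionInfo.cpp change, which follows from the inline discussion about amdpal kernels, can be compared side by side. This is a sketch under assumptions: `FnModel` and both function names are hypothetical, and it models only the single `if` shown in the hunk, not any other place where `FlatScratchInit` might be set.

```cpp
#include <cassert>

// Hypothetical bundle of the conditions read by the hunk above.
struct FnModel {
  bool HasFlatAddressSpace;
  bool IsEntryFunction;
  bool IsAmdHsaOrMesa;
  bool EnableFlatScratch;
  bool HasCalls;
  bool HasStackObjects;
};

// Condition before the patch (left column of the diff).
bool flatScratchInitBefore(const FnModel &F) {
  if (F.HasFlatAddressSpace && F.IsEntryFunction && F.IsAmdHsaOrMesa)
    return F.HasCalls || F.HasStackObjects;
  return false;
}

// Condition after the patch (right column of the diff).
bool flatScratchInitAfter(const FnModel &F) {
  if (F.HasFlatAddressSpace && F.IsEntryFunction &&
      (F.IsAmdHsaOrMesa || F.EnableFlatScratch))
    return F.HasCalls || F.HasStackObjects || F.EnableFlatScratch;
  return false;
}
```

For an entry function on a non-HSA, non-Mesa OS with flat scratch enabled and only private loads (no calls, no stack objects), the old condition yields false and the new one yields true, which matches the case raised in the review.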

llvm/lib/Target/AMDGPU/SIRegisterInfo.h

Show First 20 Lines • Show All 83 Lines • ▼ Show 20 Lines	public:
bool canRealignStack(const MachineFunction &MF) const override;		bool canRealignStack(const MachineFunction &MF) const override;
bool requiresRegisterScavenging(const MachineFunction &Fn) const override;		bool requiresRegisterScavenging(const MachineFunction &Fn) const override;

bool requiresFrameIndexScavenging(const MachineFunction &MF) const override;		bool requiresFrameIndexScavenging(const MachineFunction &MF) const override;
bool requiresFrameIndexReplacementScavenging(		bool requiresFrameIndexReplacementScavenging(
const MachineFunction &MF) const override;		const MachineFunction &MF) const override;
bool requiresVirtualBaseRegisters(const MachineFunction &Fn) const override;		bool requiresVirtualBaseRegisters(const MachineFunction &Fn) const override;

int64_t getMUBUFInstrOffset(const MachineInstr *MI) const;		int64_t getScratchInstrOffset(const MachineInstr *MI) const;

int64_t getFrameIndexInstrOffset(const MachineInstr *MI,		int64_t getFrameIndexInstrOffset(const MachineInstr *MI,
int Idx) const override;		int Idx) const override;

bool needsFrameBaseReg(MachineInstr *MI, int64_t Offset) const override;		bool needsFrameBaseReg(MachineInstr *MI, int64_t Offset) const override;

void materializeFrameBaseRegister(MachineBasicBlock *MBB, Register BaseReg,		void materializeFrameBaseRegister(MachineBasicBlock *MBB, Register BaseReg,
int FrameIdx,		int FrameIdx,

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp

}		}

bool SIRegisterInfo::requiresVirtualBaseRegisters(		bool SIRegisterInfo::requiresVirtualBaseRegisters(
const MachineFunction &) const {		const MachineFunction &) const {
// There are no special dedicated stack or frame pointers.		// There are no special dedicated stack or frame pointers.
return true;		return true;
}		}

int64_t SIRegisterInfo::getMUBUFInstrOffset(const MachineInstr *MI) const {		int64_t SIRegisterInfo::getScratchInstrOffset(const MachineInstr *MI) const {
assert(SIInstrInfo::isMUBUF(*MI));		assert(SIInstrInfo::isMUBUF(*MI) || SIInstrInfo::isFLATScratch(*MI));

int OffIdx = AMDGPU::getNamedOperandIdx(MI->getOpcode(),		int OffIdx = AMDGPU::getNamedOperandIdx(MI->getOpcode(),
AMDGPU::OpName::offset);		AMDGPU::OpName::offset);
return MI->getOperand(OffIdx).getImm();		return MI->getOperand(OffIdx).getImm();
}		}

int64_t SIRegisterInfo::getFrameIndexInstrOffset(const MachineInstr *MI,		int64_t SIRegisterInfo::getFrameIndexInstrOffset(const MachineInstr *MI,
int Idx) const {		int Idx) const {
if (!SIInstrInfo::isMUBUF(*MI))		if (!SIInstrInfo::isMUBUF(*MI) && !SIInstrInfo::isFLATScratch(*MI))
return 0;		return 0;

assert(Idx == AMDGPU::getNamedOperandIdx(MI->getOpcode(),		assert((Idx == AMDGPU::getNamedOperandIdx(MI->getOpcode(),
AMDGPU::OpName::vaddr) &&		AMDGPU::OpName::vaddr) ||
		(Idx == AMDGPU::getNamedOperandIdx(MI->getOpcode(),
		AMDGPU::OpName::saddr))) &&
"Should never see frame index on non-address operand");		"Should never see frame index on non-address operand");

return getMUBUFInstrOffset(MI);		return getScratchInstrOffset(MI);
}		}

bool SIRegisterInfo::needsFrameBaseReg(MachineInstr *MI, int64_t Offset) const {		bool SIRegisterInfo::needsFrameBaseReg(MachineInstr *MI, int64_t Offset) const {
if (!MI->mayLoadOrStore())		if (!MI->mayLoadOrStore())
return false;		return false;

int64_t FullOffset = Offset + getMUBUFInstrOffset(MI);		int64_t FullOffset = Offset + getScratchInstrOffset(MI);

		if (SIInstrInfo::isMUBUF(*MI))
return !SIInstrInfo::isLegalMUBUFImmOffset(FullOffset);		return !SIInstrInfo::isLegalMUBUFImmOffset(FullOffset);

		const SIInstrInfo *TII = ST.getInstrInfo();
		return !TII->isLegalFLATOffset(FullOffset, AMDGPUAS::PRIVATE_ADDRESS, true);
}		}
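One reading of `needsFrameBaseReg` is that a base register is required exactly when the combined offset no longer fits the instruction's immediate field. The sketch below assumes that reading for both encodings. The names are hypothetical, and the field widths are parameters rather than values taken from this diff.

```cpp
#include <cassert>
#include <cstdint>

// True if X fits in an N-bit unsigned immediate field.
constexpr bool fitsUnsigned(int64_t X, unsigned N) {
  return X >= 0 && X < (int64_t(1) << N);
}

// True if X fits in an N-bit signed (two's complement) immediate field.
constexpr bool fitsSigned(int64_t X, unsigned N) {
  return X >= -(int64_t(1) << (N - 1)) && X < (int64_t(1) << (N - 1));
}

enum class ScratchKind { MUBUF, FlatScratch };

// Assumed immediate widths; real values depend on the subtarget.
struct OffsetFields {
  unsigned MUBUFUnsignedBits;
  unsigned FlatSignedBits;
};

// A frame base register is needed when the frame offset plus the offset
// already encoded in the instruction is not a legal immediate.
constexpr bool needsFrameBaseRegSketch(ScratchKind Kind, int64_t FrameOffset,
                                       int64_t InstrOffset, OffsetFields F) {
  return Kind == ScratchKind::MUBUF
             ? !fitsUnsigned(FrameOffset + InstrOffset, F.MUBUFUnsignedBits)
             : !fitsSigned(FrameOffset + InstrOffset, F.FlatSignedBits);
}
```

With illustrative widths of 12 unsigned bits for MUBUF and 13 signed bits for flat scratch, a combined offset of 4095 is legal for both, 4096 is legal for neither, and a negative offset is only ever legal in the signed flat scratch field.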

void SIRegisterInfo::materializeFrameBaseRegister(MachineBasicBlock *MBB,		void SIRegisterInfo::materializeFrameBaseRegister(MachineBasicBlock *MBB,
Register BaseReg,		Register BaseReg,
int FrameIdx,		int FrameIdx,
int64_t Offset) const {		int64_t Offset) const {
MachineBasicBlock::iterator Ins = MBB->begin();		MachineBasicBlock::iterator Ins = MBB->begin();
DebugLoc DL; // Defaults to "unknown"		DebugLoc DL; // Defaults to "unknown"

if (Ins != MBB->end())		if (Ins != MBB->end())
DL = Ins->getDebugLoc();		DL = Ins->getDebugLoc();

MachineFunction *MF = MBB->getParent();		MachineFunction *MF = MBB->getParent();
const SIInstrInfo *TII = ST.getInstrInfo();		const SIInstrInfo *TII = ST.getInstrInfo();
		unsigned MovOpc = ST.enableFlatScratch() ? AMDGPU::S_MOV_B32
		: AMDGPU::V_MOV_B32_e32;

if (Offset == 0) {		if (Offset == 0) {
BuildMI(*MBB, Ins, DL, TII->get(AMDGPU::V_MOV_B32_e32), BaseReg)		BuildMI(*MBB, Ins, DL, TII->get(MovOpc), BaseReg)
.addFrameIndex(FrameIdx);		.addFrameIndex(FrameIdx);
return;		return;
}		}

MachineRegisterInfo &MRI = MF->getRegInfo();		MachineRegisterInfo &MRI = MF->getRegInfo();
Register OffsetReg = MRI.createVirtualRegister(&AMDGPU::SReg_32_XM0RegClass);		Register OffsetReg = MRI.createVirtualRegister(&AMDGPU::SReg_32_XM0RegClass);

Register FIReg = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);		Register FIReg = MRI.createVirtualRegister(
		ST.enableFlatScratch() ? &AMDGPU::SReg_32_XM0RegClass
		: &AMDGPU::VGPR_32RegClass);

BuildMI(*MBB, Ins, DL, TII->get(AMDGPU::S_MOV_B32), OffsetReg)		BuildMI(*MBB, Ins, DL, TII->get(AMDGPU::S_MOV_B32), OffsetReg)
.addImm(Offset);		.addImm(Offset);
BuildMI(*MBB, Ins, DL, TII->get(AMDGPU::V_MOV_B32_e32), FIReg)		BuildMI(*MBB, Ins, DL, TII->get(MovOpc), FIReg)
.addFrameIndex(FrameIdx);		.addFrameIndex(FrameIdx);

		if (ST.enableFlatScratch() ) {
		BuildMI(*MBB, Ins, DL, TII->get(AMDGPU::S_ADD_U32), BaseReg)
		.addReg(OffsetReg, RegState::Kill)
		.addReg(FIReg);
		return;
		}

TII->getAddNoCarry(*MBB, Ins, DL, BaseReg)		TII->getAddNoCarry(*MBB, Ins, DL, BaseReg)
.addReg(OffsetReg, RegState::Kill)		.addReg(OffsetReg, RegState::Kill)
.addReg(FIReg)		.addReg(FIReg)
.addImm(0); // clamp bit		.addImm(0); // clamp bit
}		}
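The instruction sequence chosen by `materializeFrameBaseRegister` can be summarized as a list of opcode names. This is a sketch only: `frameBaseSequence` is a hypothetical helper, and the placeholder string `<getAddNoCarry>` stands for whatever opcode `TII->getAddNoCarry` selects, which this diff does not show.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Returns the opcode names, in emission order, that the function above builds
// for a given mode and offset. Operands are omitted.
inline std::vector<std::string> frameBaseSequence(bool EnableFlatScratch,
                                                  int64_t Offset) {
  // Flat scratch keeps the frame index in an SGPR; MUBUF uses a VGPR.
  const std::string MovOpc =
      EnableFlatScratch ? "S_MOV_B32" : "V_MOV_B32_e32";

  // With a zero offset the frame index is moved straight into BaseReg.
  if (Offset == 0)
    return {MovOpc};

  // Otherwise: materialize the offset, materialize the frame index, add them.
  return {"S_MOV_B32", MovOpc,
          EnableFlatScratch ? "S_ADD_U32" : "<getAddNoCarry>"};
}
```

In flat scratch mode every instruction in the sequence is scalar, which matches the switch of `FIReg` from `VGPR_32RegClass` to `SReg_32_XM0RegClass` in the diff.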

void SIRegisterInfo::resolveFrameIndex(MachineInstr &MI, Register BaseReg,		void SIRegisterInfo::resolveFrameIndex(MachineInstr &MI, Register BaseReg,
int64_t Offset) const {		int64_t Offset) const {
const SIInstrInfo *TII = ST.getInstrInfo();		const SIInstrInfo *TII = ST.getInstrInfo();
		bool IsFlat = TII->isFLATScratch(MI);

#ifndef NDEBUG		#ifndef NDEBUG
// FIXME: Is it possible to be storing a frame index to itself?		// FIXME: Is it possible to be storing a frame index to itself?
bool SeenFI = false;		bool SeenFI = false;
for (const MachineOperand &MO: MI.operands()) {		for (const MachineOperand &MO: MI.operands()) {
if (MO.isFI()) {		if (MO.isFI()) {
if (SeenFI)		if (SeenFI)
llvm_unreachable("should not see multiple frame indices");		llvm_unreachable("should not see multiple frame indices");

SeenFI = true;		SeenFI = true;
}		}
}		}
#endif		#endif

MachineOperand *FIOp = TII->getNamedOperand(MI, AMDGPU::OpName::vaddr);		MachineOperand *FIOp =
		TII->getNamedOperand(MI, IsFlat ? AMDGPU::OpName::saddr
		: AMDGPU::OpName::vaddr);
#ifndef NDEBUG		#ifndef NDEBUG
MachineBasicBlock *MBB = MI.getParent();		MachineBasicBlock *MBB = MI.getParent();
MachineFunction *MF = MBB->getParent();		MachineFunction *MF = MBB->getParent();
assert(FIOp && FIOp->isFI() && "frame index must be address operand");		assert(FIOp && FIOp->isFI() && "frame index must be address operand");
assert(TII->isMUBUF(MI));		assert(TII->isMUBUF(MI) || TII->isFLATScratch(MI));

		MachineOperand *OffsetOp = TII->getNamedOperand(MI, AMDGPU::OpName::offset);
		int64_t NewOffset = OffsetOp->getImm() + Offset;

		if (IsFlat) {
		assert(TII->isLegalFLATOffset(NewOffset, AMDGPUAS::PRIVATE_ADDRESS, true) &&
		"offset should be legal");
		FIOp->ChangeToRegister(BaseReg, false);
		OffsetOp->setImm(NewOffset);
		return;
		}

MachineOperand *SOffset = TII->getNamedOperand(MI, AMDGPU::OpName::soffset);		MachineOperand *SOffset = TII->getNamedOperand(MI, AMDGPU::OpName::soffset);
assert((SOffset->isReg() &&		assert((SOffset->isReg() &&
SOffset->getReg() ==		SOffset->getReg() ==
MF->getInfo<SIMachineFunctionInfo>()->getStackPtrOffsetReg()) ||		MF->getInfo<SIMachineFunctionInfo>()->getStackPtrOffsetReg()) ||
(SOffset->isImm() && SOffset->getImm() == 0));		(SOffset->isImm() && SOffset->getImm() == 0));
#endif		#endif

MachineOperand *OffsetOp = TII->getNamedOperand(MI, AMDGPU::OpName::offset);
int64_t NewOffset = OffsetOp->getImm() + Offset;
assert(SIInstrInfo::isLegalMUBUFImmOffset(NewOffset) &&		assert(SIInstrInfo::isLegalMUBUFImmOffset(NewOffset) &&
"offset should be legal");		"offset should be legal");

FIOp->ChangeToRegister(BaseReg, false);		FIOp->ChangeToRegister(BaseReg, false);
OffsetOp->setImm(NewOffset);		OffsetOp->setImm(NewOffset);
}		}

bool SIRegisterInfo::isFrameOffsetLegal(const MachineInstr *MI,		bool SIRegisterInfo::isFrameOffsetLegal(const MachineInstr *MI,
Register BaseReg,		Register BaseReg,
int64_t Offset) const {		int64_t Offset) const {
if (!SIInstrInfo::isMUBUF(*MI))		if (!SIInstrInfo::isMUBUF(*MI) && !SIInstrInfo::isFLATScratch(*MI))
return false;		return false;

int64_t NewOffset = Offset + getMUBUFInstrOffset(MI);		int64_t NewOffset = Offset + getScratchInstrOffset(MI);

		if (SIInstrInfo::isMUBUF(*MI))
return SIInstrInfo::isLegalMUBUFImmOffset(NewOffset);		return SIInstrInfo::isLegalMUBUFImmOffset(NewOffset);

		const SIInstrInfo *TII = ST.getInstrInfo();
		return TII->isLegalFLATOffset(NewOffset, AMDGPUAS::PRIVATE_ADDRESS, true);
}		}

const TargetRegisterClass *SIRegisterInfo::getPointerRegClass(		const TargetRegisterClass *SIRegisterInfo::getPointerRegClass(
const MachineFunction &MF, unsigned Kind) const {		const MachineFunction &MF, unsigned Kind) const {
// This is inaccurate. It depends on the instruction and address space. The		// This is inaccurate. It depends on the instruction and address space. The
// only place where we should hit this is for dealing with frame indexes /		// only place where we should hit this is for dealing with frame indexes /
// private accesses, so this is correct in that case.		// private accesses, so this is correct in that case.
return &AMDGPU::VGPR_32RegClass;		return &AMDGPU::VGPR_32RegClass;
▲ Show 20 Lines • Show All 205 Lines • ▼ Show 20 Lines	void SIRegisterInfo::buildSpillLoadStore(MachineBasicBlock::iterator MI,
MachineMemOperand *MMO,		MachineMemOperand *MMO,
RegScavenger *RS) const {		RegScavenger *RS) const {
MachineBasicBlock *MBB = MI->getParent();		MachineBasicBlock *MBB = MI->getParent();
MachineFunction *MF = MI->getParent()->getParent();		MachineFunction *MF = MI->getParent()->getParent();
const SIInstrInfo *TII = ST.getInstrInfo();		const SIInstrInfo *TII = ST.getInstrInfo();
const MachineFrameInfo &MFI = MF->getFrameInfo();		const MachineFrameInfo &MFI = MF->getFrameInfo();
const SIMachineFunctionInfo *FuncInfo = MF->getInfo<SIMachineFunctionInfo>();		const SIMachineFunctionInfo *FuncInfo = MF->getInfo<SIMachineFunctionInfo>();

const MCInstrDesc &Desc = TII->get(LoadStoreOp);		const MCInstrDesc *Desc = &TII->get(LoadStoreOp);
const DebugLoc &DL = MI->getDebugLoc();		const DebugLoc &DL = MI->getDebugLoc();
bool IsStore = Desc.mayStore();		bool IsStore = Desc->mayStore();
		bool IsFlat = TII->isFLATScratch(LoadStoreOp);

bool Scavenged = false;		bool Scavenged = false;
MCRegister SOffset = ScratchOffsetReg;		MCRegister SOffset = ScratchOffsetReg;

const unsigned EltSize = 4;		const unsigned EltSize = 4;
const TargetRegisterClass *RC = getRegClassForReg(MF->getRegInfo(), ValueReg);		const TargetRegisterClass *RC = getRegClassForReg(MF->getRegInfo(), ValueReg);
unsigned NumSubRegs = AMDGPU::getRegBitWidth(RC->getID()) / (EltSize * CHAR_BIT);		unsigned NumSubRegs = AMDGPU::getRegBitWidth(RC->getID()) / (EltSize * CHAR_BIT);
unsigned Size = NumSubRegs * EltSize;		unsigned Size = NumSubRegs * EltSize;
int64_t Offset = InstOffset + MFI.getObjectOffset(Index);		int64_t Offset = InstOffset + MFI.getObjectOffset(Index);
		int64_t MaxOffset = Offset + Size - EltSize;
int64_t ScratchOffsetRegDelta = 0;		int64_t ScratchOffsetRegDelta = 0;

Align Alignment = MFI.getObjectAlign(Index);		Align Alignment = MFI.getObjectAlign(Index);
const MachinePointerInfo &BasePtrInfo = MMO->getPointerInfo();		const MachinePointerInfo &BasePtrInfo = MMO->getPointerInfo();

assert((Offset % EltSize) == 0 && "unexpected VGPR spill offset");		assert((Offset % EltSize) == 0 && "unexpected VGPR spill offset");

if (!SIInstrInfo::isLegalMUBUFImmOffset(Offset + Size - EltSize)) {		bool IsOffsetLegal = IsFlat
		? TII->isLegalFLATOffset(MaxOffset, AMDGPUAS::PRIVATE_ADDRESS, true)
		: SIInstrInfo::isLegalMUBUFImmOffset(MaxOffset);
		if (!IsOffsetLegal || (IsFlat && !SOffset && !ST.hasFlatScratchSTMode())) {
SOffset = MCRegister();		SOffset = MCRegister();
		arsenm: This looks backwards with the negated conditions

// We currently only support spilling VGPRs to EltSize boundaries, meaning		// We currently only support spilling VGPRs to EltSize boundaries, meaning
// we can simplify the adjustment of Offset here to just scale with		// we can simplify the adjustment of Offset here to just scale with
// WavefrontSize.		// WavefrontSize.
		if (!IsFlat)
Offset *= ST.getWavefrontSize();		Offset *= ST.getWavefrontSize();

// We don't have access to the register scavenger if this function is called		// We don't have access to the register scavenger if this function is called
// during PEI::scavengeFrameVirtualRegs().		// during PEI::scavengeFrameVirtualRegs().
if (RS)		if (RS)
SOffset = RS->scavengeRegister(&AMDGPU::SGPR_32RegClass, MI, 0, false);		SOffset = RS->scavengeRegister(&AMDGPU::SGPR_32RegClass, MI, 0, false);

if (!SOffset) {		if (!SOffset) {
// There are no free SGPRs, and since we are in the process of spilling		// There are no free SGPRs, and since we are in the process of spilling
Show All 21 Lines	if (ScratchOffsetReg == AMDGPU::NoRegister) {
BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_ADD_U32), SOffset)		BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_ADD_U32), SOffset)
.addReg(ScratchOffsetReg)		.addReg(ScratchOffsetReg)
.addImm(Offset);		.addImm(Offset);
}		}

Offset = 0;		Offset = 0;
}		}

		if (IsFlat && SOffset == AMDGPU::NoRegister) {
		assert(AMDGPU::getNamedOperandIdx(LoadStoreOp, AMDGPU::OpName::vaddr) < 0
		&& "Unexpected vaddr for flat scratch with a FI operand");

		assert(ST.hasFlatScratchSTMode());
		LoadStoreOp = AMDGPU::getFlatScratchInstSTfromSS(LoadStoreOp);
		Desc = &TII->get(LoadStoreOp);
		}

Register TmpReg;		Register TmpReg;

		// FIXME: Flat scratch does not have to be limited to a dword per store.
for (unsigned i = 0, e = NumSubRegs; i != e; ++i, Offset += EltSize) {		for (unsigned i = 0, e = NumSubRegs; i != e; ++i, Offset += EltSize) {
Register SubReg = NumSubRegs == 1		Register SubReg = NumSubRegs == 1
? Register(ValueReg)		? Register(ValueReg)
: getSubReg(ValueReg, getSubRegFromChannel(i));		: getSubReg(ValueReg, getSubRegFromChannel(i));

unsigned SOffsetRegState = 0;		unsigned SOffsetRegState = 0;
unsigned SrcDstRegState = getDefRegState(!IsStore);		unsigned SrcDstRegState = getDefRegState(!IsStore);
if (i + 1 == e) {		if (i + 1 == e) {
Show All 28 Lines	if (!MIB.getInstr()) {
SubReg = TmpReg;		SubReg = TmpReg;
}		}

MachinePointerInfo PInfo = BasePtrInfo.getWithOffset(EltSize * i);		MachinePointerInfo PInfo = BasePtrInfo.getWithOffset(EltSize * i);
MachineMemOperand *NewMMO =		MachineMemOperand *NewMMO =
MF->getMachineMemOperand(PInfo, MMO->getFlags(), EltSize,		MF->getMachineMemOperand(PInfo, MMO->getFlags(), EltSize,
commonAlignment(Alignment, EltSize * i));		commonAlignment(Alignment, EltSize * i));

MIB = BuildMI(*MBB, MI, DL, Desc)		MIB = BuildMI(*MBB, MI, DL, *Desc)
.addReg(SubReg,		.addReg(SubReg,
getDefRegState(!IsStore) | getKillRegState(IsKill))		getDefRegState(!IsStore) | getKillRegState(IsKill));
.addReg(ScratchRsrcReg);		if (!IsFlat)
		MIB.addReg(ScratchRsrcReg);

if (SOffset == AMDGPU::NoRegister) {		if (SOffset == AMDGPU::NoRegister) {
		if (!IsFlat)
MIB.addImm(0);		MIB.addImm(0);
} else {		} else {
MIB.addReg(SOffset, SOffsetRegState);		MIB.addReg(SOffset, SOffsetRegState);
}		}
MIB.addImm(Offset)		MIB.addImm(Offset)
.addImm(0) // glc		.addImm(0) // glc
.addImm(0) // slc		.addImm(0) // slc
.addImm(0) // tfe		.addImm(0); // tfe for MUBUF or dlc for FLAT
.addImm(0) // dlc		if (!IsFlat)
.addImm(0) // swz		MIB.addImm(0) // dlc
.addMemOperand(NewMMO);		.addImm(0); // swz
		MIB.addMemOperand(NewMMO);

if (!IsAGPR && NeedSuperRegDef)		if (!IsAGPR && NeedSuperRegDef)
MIB.addReg(ValueReg, RegState::ImplicitDefine);		MIB.addReg(ValueReg, RegState::ImplicitDefine);

if (!IsStore && TmpReg != AMDGPU::NoRegister)		if (!IsStore && TmpReg != AMDGPU::NoRegister)
MIB = BuildMI(*MBB, MI, DL, TII->get(AMDGPU::V_ACCVGPR_WRITE_B32),		MIB = BuildMI(*MBB, MI, DL, TII->get(AMDGPU::V_ACCVGPR_WRITE_B32),
FinalReg)		FinalReg)
.addReg(TmpReg, RegState::Kill);		.addReg(TmpReg, RegState::Kill);
▲ Show 20 Lines • Show All 84 Lines • ▼ Show 20 Lines	void SIRegisterInfo::buildSGPRSpillLoadStore(MachineBasicBlock::iterator MI,
Align Alignment = FrameInfo.getObjectAlign(Index);		Align Alignment = FrameInfo.getObjectAlign(Index);
MachinePointerInfo PtrInfo =		MachinePointerInfo PtrInfo =
MachinePointerInfo::getFixedStack(*MF, Index);		MachinePointerInfo::getFixedStack(*MF, Index);
MachineMemOperand *MMO = MF->getMachineMemOperand(		MachineMemOperand *MMO = MF->getMachineMemOperand(
PtrInfo, IsLoad ? MachineMemOperand::MOLoad : MachineMemOperand::MOStore,		PtrInfo, IsLoad ? MachineMemOperand::MOLoad : MachineMemOperand::MOStore,
EltSize, Alignment);		EltSize, Alignment);

if (IsLoad) {		if (IsLoad) {
buildSpillLoadStore(MI, AMDGPU::BUFFER_LOAD_DWORD_OFFSET,		unsigned Opc = ST.enableFlatScratch() ? AMDGPU::SCRATCH_LOAD_DWORD_SADDR
		: AMDGPU::BUFFER_LOAD_DWORD_OFFSET;
		buildSpillLoadStore(MI, Opc,
Index,		Index,
VGPR, false,		VGPR, false,
MFI->getScratchRSrcReg(), FrameReg,		MFI->getScratchRSrcReg(), FrameReg,
Offset * EltSize, MMO,		Offset * EltSize, MMO,
RS);		RS);
} else {		} else {
buildSpillLoadStore(MI, AMDGPU::BUFFER_STORE_DWORD_OFFSET, Index, VGPR,		unsigned Opc = ST.enableFlatScratch() ? AMDGPU::SCRATCH_STORE_DWORD_SADDR
		: AMDGPU::BUFFER_STORE_DWORD_OFFSET;
		buildSpillLoadStore(MI, Opc, Index, VGPR,
IsKill, MFI->getScratchRSrcReg(), FrameReg,		IsKill, MFI->getScratchRSrcReg(), FrameReg,
Offset * EltSize, MMO, RS);		Offset * EltSize, MMO, RS);
// This only ever adds one VGPR spill		// This only ever adds one VGPR spill
MFI->addToSpilledVGPRs(1);		MFI->addToSpilledVGPRs(1);
}		}

// Restore EXEC		// Restore EXEC
BuildMI(*MBB, MI, DL, TII->get(ExecMovOpc), ExecReg)		BuildMI(*MBB, MI, DL, TII->get(ExecMovOpc), ExecReg)
▲ Show 20 Lines • Show All 323 Lines • ▼ Show 20 Lines	switch (MI->getOpcode()) {
case AMDGPU::SI_SPILL_A96_SAVE:		case AMDGPU::SI_SPILL_A96_SAVE:
case AMDGPU::SI_SPILL_A64_SAVE:		case AMDGPU::SI_SPILL_A64_SAVE:
case AMDGPU::SI_SPILL_A32_SAVE: {		case AMDGPU::SI_SPILL_A32_SAVE: {
const MachineOperand *VData = TII->getNamedOperand(*MI,		const MachineOperand *VData = TII->getNamedOperand(*MI,
AMDGPU::OpName::vdata);		AMDGPU::OpName::vdata);
assert(TII->getNamedOperand(*MI, AMDGPU::OpName::soffset)->getReg() ==		assert(TII->getNamedOperand(*MI, AMDGPU::OpName::soffset)->getReg() ==
MFI->getStackPtrOffsetReg());		MFI->getStackPtrOffsetReg());

buildSpillLoadStore(MI, AMDGPU::BUFFER_STORE_DWORD_OFFSET,		unsigned Opc = ST.enableFlatScratch() ? AMDGPU::SCRATCH_STORE_DWORD_SADDR
		: AMDGPU::BUFFER_STORE_DWORD_OFFSET;
		buildSpillLoadStore(MI, Opc,
Index,		Index,
VData->getReg(), VData->isKill(),		VData->getReg(), VData->isKill(),
TII->getNamedOperand(*MI, AMDGPU::OpName::srsrc)->getReg(),		TII->getNamedOperand(*MI, AMDGPU::OpName::srsrc)->getReg(),
FrameReg,		FrameReg,
TII->getNamedOperand(*MI, AMDGPU::OpName::offset)->getImm(),		TII->getNamedOperand(*MI, AMDGPU::OpName::offset)->getImm(),
*MI->memoperands_begin(),		*MI->memoperands_begin(),
RS);		RS);
MFI->addToSpilledVGPRs(getNumSubRegsForSpillOp(MI->getOpcode()));		MFI->addToSpilledVGPRs(getNumSubRegsForSpillOp(MI->getOpcode()));
Show All 17 Lines	switch (MI->getOpcode()) {
case AMDGPU::SI_SPILL_A256_RESTORE:		case AMDGPU::SI_SPILL_A256_RESTORE:
case AMDGPU::SI_SPILL_A512_RESTORE:		case AMDGPU::SI_SPILL_A512_RESTORE:
case AMDGPU::SI_SPILL_A1024_RESTORE: {		case AMDGPU::SI_SPILL_A1024_RESTORE: {
const MachineOperand *VData = TII->getNamedOperand(*MI,		const MachineOperand *VData = TII->getNamedOperand(*MI,
AMDGPU::OpName::vdata);		AMDGPU::OpName::vdata);
assert(TII->getNamedOperand(*MI, AMDGPU::OpName::soffset)->getReg() ==		assert(TII->getNamedOperand(*MI, AMDGPU::OpName::soffset)->getReg() ==
MFI->getStackPtrOffsetReg());		MFI->getStackPtrOffsetReg());

buildSpillLoadStore(MI, AMDGPU::BUFFER_LOAD_DWORD_OFFSET,		unsigned Opc = ST.enableFlatScratch() ? AMDGPU::SCRATCH_LOAD_DWORD_SADDR
		: AMDGPU::BUFFER_LOAD_DWORD_OFFSET;
		buildSpillLoadStore(MI, Opc,
Index,		Index,
VData->getReg(), VData->isKill(),		VData->getReg(), VData->isKill(),
TII->getNamedOperand(*MI, AMDGPU::OpName::srsrc)->getReg(),		TII->getNamedOperand(*MI, AMDGPU::OpName::srsrc)->getReg(),
FrameReg,		FrameReg,
TII->getNamedOperand(*MI, AMDGPU::OpName::offset)->getImm(),		TII->getNamedOperand(*MI, AMDGPU::OpName::offset)->getImm(),
*MI->memoperands_begin(),		*MI->memoperands_begin(),
RS);		RS);
MI->eraseFromParent();		MI->eraseFromParent();
break;		break;
}		}

default: {		default: {
const DebugLoc &DL = MI->getDebugLoc();		const DebugLoc &DL = MI->getDebugLoc();

		int64_t Offset = FrameInfo.getObjectOffset(Index);
		if (ST.enableFlatScratch()) {
		if (TII->isFLATScratch(*MI)) {
		// The offset is always swizzled, just replace it
		if (FrameReg)
		FIOp.ChangeToRegister(FrameReg, false);

		if (!Offset)
		return;

		MachineOperand *OffsetOp =
		TII->getNamedOperand(*MI, AMDGPU::OpName::offset);
		int64_t NewOffset = Offset + OffsetOp->getImm();
		if (TII->isLegalFLATOffset(NewOffset, AMDGPUAS::PRIVATE_ADDRESS,
		true)) {
		OffsetOp->setImm(NewOffset);
		if (FrameReg)
		return;
		Offset = 0;
		}

		assert(!TII->getNamedOperand(*MI, AMDGPU::OpName::vaddr) &&
		"Unexpected vaddr for flat scratch with a FI operand");

		// On GFX10 we have ST mode to use no registers for an address.
		// Otherwise we need to materialize 0 into an SGPR.
		if (!Offset && ST.hasFlatScratchSTMode()) {
		unsigned Opc = MI->getOpcode();
		unsigned NewOpc = AMDGPU::getFlatScratchInstSTfromSS(Opc);
		MI->RemoveOperand(
		AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::saddr));
		MI->setDesc(TII->get(NewOpc));
		return;
		}
		}

		if (!FrameReg) {
		FIOp.ChangeToImmediate(Offset);
		if (TII->isImmOperandLegal(*MI, FIOperandNum, FIOp))
		return;
		}

		// We need to use register here. Check if we can use an SGPR or need
		// a VGPR.
		Flakebi: I get a failing assert here with `NewOpc = 4294967295`:
		llvm/include/llvm/MC/MCInstrInfo.h:63: const llvm::MCInstrDesc &llvm::MCInstrInfo::get(unsigned int) const: Assertion `Opcode < NumOpcodes && "Invalid opcode!"' failed.
		PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
		Stack dump:
		0. Program arguments: compiler/llpc/amdllpc -gfxip=10.1 -amdgpu-enable-flat-scratch /pipelines/PipelineVsFs_0x1BEFB7D1A235B4F6.pipe -verify-machineinstrs
		1. Running pass 'CallGraph Pass Manager' on module 'lgcPipeline'.
		2. Running pass 'Prologue/Epilogue Insertion & Frame Finalization' on function '@_amdgpu_ps_main'
		#0 0x00000000023f0db1 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /llvm/lib/Support/Unix/Signals.inc:563:13
		#1 0x00000000023ef060 llvm::sys::RunSignalHandlers() /llvm/lib/Support/Signals.cpp:72:18
		#2 0x00000000023f1152 SignalHandler(int) /llvm/lib/Support/Unix/Signals.inc:0:3
		#3 0x00007fadd6ebfee0 __restore_rt (/glibc-2.31/lib/libpthread.so.0+0x12ee0)
		#4 0x00007fadd6d0c08a raise (/glibc-2.31/lib/libc.so.6+0x3808a)
		#5 0x00007fadd6cf6528 abort (/glibc-2.31/lib/libc.so.6+0x22528)
		#6 0x00007fadd6cf640f _nl_load_domain.cold.0 (/glibc-2.31/lib/libc.so.6+0x2240f)
		#7 0x00007fadd6d04a02 (/glibc-2.31/lib/libc.so.6+0x30a02)
		#8 0x0000000001a03170 llvm::SIRegisterInfo::eliminateFrameIndex(llvm::MachineInstrBundleIterator<llvm::MachineInstr, false>, int, unsigned int, llvm::RegScavenger) const /llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp:1465:11
		#9 0x000000000214e0f3 (anonymous namespace)::PEI::replaceFrameIndices(llvm::MachineBasicBlock, llvm::MachineFunction&, int&) /llvm/lib/CodeGen/PrologEpilogInserter.cpp:0:11
		#10 0x000000000214caef llvm::MachineBasicBlock::getNumber() const /llvm/include/llvm/CodeGen/MachineBasicBlock.h:904:34
		#11 0x000000000214caef (anonymous namespace)::PEI::replaceFrameIndices(llvm::MachineFunction&) /llvm/lib/CodeGen/PrologEpilogInserter.cpp:1161:17
		#12 0x000000000214caef (anonymous namespace)::PEI::runOnMachineFunction(llvm::MachineFunction&) /llvm/lib/CodeGen/PrologEpilogInserter.cpp:269:3
		#13 0x0000000002031e7e llvm::MachineFunctionPass::runOnFunction(llvm::Function&) /llvm/lib/CodeGen/MachineFunctionPass.cpp:0:13
		#14 0x0000000003136a85 llvm::FPPassManager::runOnFunction(llvm::Function&) /llvm/lib/IR/LegacyPassManager.cpp:1519:27
		#15 0x0000000001c76b38 (anonymous namespace)::CGPassManager::RunPassOnSCC(llvm::Pass, llvm::CallGraphSCC&, llvm::CallGraph&, bool&, bool&) /llvm/lib/Analysis/CallGraphSCCPass.cpp:178:25
		#16 0x0000000001c76b38 (anonymous namespace)::CGPassManager::RunAllPassesOnSCC(llvm::CallGraphSCC&, llvm::CallGraph&, bool&) /llvm/lib/Analysis/CallGraphSCCPass.cpp:476:9
		#17 0x0000000001c76b38 (anonymous namespace)::CGPassManager::runOnModule(llvm::Module&) /llvm/lib/Analysis/CallGraphSCCPass.cpp:541:18
		#18 0x0000000003137149 (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&) /llvm/lib/IR/LegacyPassManager.cpp:0:27
		#19 0x0000000003137149 llvm::legacy::PassManagerImpl::run(llvm::Module&) /llvm/lib/IR/LegacyPassManager.cpp:615:44
		…
		rampitec (author): I cannot reproduce this. Keep in mind that D89424 is not updated to use ST mode yet, so they do not work together yet.
		FIOp.ChangeToRegister(AMDGPU::M0, false);
		bool UseSGPR = TII->isOperandLegal(*MI, FIOperandNum, &FIOp);

		if (!Offset && FrameReg && UseSGPR) {
		FIOp.setReg(FrameReg);
		return;
		}

		const TargetRegisterClass *RC = UseSGPR ? &AMDGPU::SReg_32_XM0RegClass
		: &AMDGPU::VGPR_32RegClass;

		Register TmpReg = RS->scavengeRegister(RC, MI, 0, !UseSGPR);
		FIOp.setReg(TmpReg);
		FIOp.setIsKill(true);

		if ((!FrameReg || !Offset) && TmpReg) {
		unsigned Opc = UseSGPR ? AMDGPU::S_MOV_B32 : AMDGPU::V_MOV_B32_e32;
		auto MIB = BuildMI(*MBB, MI, DL, TII->get(Opc), TmpReg);
		if (FrameReg)
		MIB.addReg(FrameReg);
		else
		MIB.addImm(Offset);

		return;
		}

		Register TmpSReg =
		UseSGPR ? TmpReg
		arsenm: What happens if this needs an SGPR spill?
		rampitec (author): If it can scavenge it, it should be fine, as the offset will not change. If not, I guess I would need to adjust SP and revert it. I have added a FIXME here.
		: RS->scavengeRegister(&AMDGPU::SReg_32_XM0RegClass, MI, 0,
		!UseSGPR);

		// TODO: for flat scratch another attempt can be made with a VGPR index
		// if no SGPRs can be scavenged.
		if ((!TmpSReg && !FrameReg) || (!TmpReg && !UseSGPR))
		report_fatal_error("Cannot scavenge register in FI elimination!");

		if (!TmpSReg) {
		// Use frame register and restore it after.
		TmpSReg = FrameReg;
		FIOp.setReg(FrameReg);
		FIOp.setIsKill(false);
		}

		BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_ADD_U32), TmpSReg)
		.addReg(FrameReg)
		.addImm(Offset);

		if (!UseSGPR)
		BuildMI(*MBB, MI, DL, TII->get(AMDGPU::V_MOV_B32_e32), TmpReg)
		.addReg(TmpSReg, RegState::Kill);

		if (TmpSReg == FrameReg) {
		// Undo frame register modification.
		BuildMI(*MBB, std::next(MI), DL, TII->get(AMDGPU::S_SUB_U32),
		FrameReg)
		.addReg(FrameReg)
		.addImm(Offset);
		}

		return;
		}

bool IsMUBUF = TII->isMUBUF(*MI);		bool IsMUBUF = TII->isMUBUF(*MI);

if (!IsMUBUF && !MFI->isEntryFunction()) {		if (!IsMUBUF && !MFI->isEntryFunction()) {
// Convert to a swizzled stack address by scaling by the wave size.		// Convert to a swizzled stack address by scaling by the wave size.
//		//
// In an entry function/kernel the offset is already swizzled.		// In an entry function/kernel the offset is already swizzled.

bool IsCopy = MI->getOpcode() == AMDGPU::V_MOV_B32_e32;		bool IsCopy = MI->getOpcode() == AMDGPU::V_MOV_B32_e32;
▲ Show 20 Lines • Show All 113 Lines • ▼ Show 20 Lines	default: {
MI->eraseFromParent();		MI->eraseFromParent();
return;		return;
}		}
}		}

// If the offset is simply too big, don't convert to a scratch wave offset		// If the offset is simply too big, don't convert to a scratch wave offset
// relative index.		// relative index.

int64_t Offset = FrameInfo.getObjectOffset(Index);
FIOp.ChangeToImmediate(Offset);		FIOp.ChangeToImmediate(Offset);
if (!TII->isImmOperandLegal(*MI, FIOperandNum, FIOp)) {		if (!TII->isImmOperandLegal(*MI, FIOperandNum, FIOp)) {
Register TmpReg = RS->scavengeRegister(&AMDGPU::VGPR_32RegClass, MI, 0);		Register TmpReg = RS->scavengeRegister(&AMDGPU::VGPR_32RegClass, MI, 0);
BuildMI(*MBB, MI, DL, TII->get(AMDGPU::V_MOV_B32_e32), TmpReg)		BuildMI(*MBB, MI, DL, TII->get(AMDGPU::V_MOV_B32_e32), TmpReg)
.addImm(Offset);		.addImm(Offset);
FIOp.ChangeToRegister(TmpReg, false, false, true);		FIOp.ChangeToRegister(TmpReg, false, false, true);
}		}
}		}
▲ Show 20 Lines • Show All 512 Lines • Show Last 20 Lines
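The `if (!IsFlat) Offset *= ST.getWavefrontSize();` guard in `buildSpillLoadStore` reflects the summary's point that flat scratch works with swizzled (per-lane) offsets while MUBUF stack access uses unswizzled ones. The sketch below shows the same arithmetic on a stack pointer increment. `stackPointerIncrement` is a hypothetical helper, and a wavefront size of 64 is assumed for the targets in the test RUN lines; with that assumption it reproduces the `0x400` and `16` immediates in the `s_add_u32 s32, s32, ...` CHECK lines of call-preserved-registers.ll.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the immediate added to the stack pointer for a frame
// that needs PerLaneBytes bytes per lane.
constexpr int64_t stackPointerIncrement(int64_t PerLaneBytes,
                                        bool UseFlatScratch,
                                        unsigned WavefrontSize) {
  // Flat scratch: the offset stays in per-lane (swizzled) bytes.
  // MUBUF: the offset is scaled by the wave size (unswizzled).
  return UseFlatScratch ? PerLaneBytes
                        : PerLaneBytes * static_cast<int64_t>(WavefrontSize);
}

static_assert(stackPointerIncrement(16, false, 64) == 0x400,
              "MUBUF: 16 bytes per lane scaled by an assumed wave size of 64");
static_assert(stackPointerIncrement(16, true, 64) == 16,
              "flat scratch: offset is left in per-lane bytes");
```

This difference in units is also why, per the summary, the two stack access schemes cannot be mixed within one function.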

llvm/test/CodeGen/AMDGPU/call-preserved-registers.ll

	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=fiji -enable-ipra=0 -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=fiji -enable-ipra=0 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,MUBUF %s
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=hawaii -enable-ipra=0 -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=hawaii -enable-ipra=0 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,MUBUF %s
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -enable-ipra=0 -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -enable-ipra=0 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,MUBUF %s
				; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -enable-ipra=0 -amdgpu-enable-flat-scratch -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,FLATSCR %s

	declare hidden void @external_void_func_void() #0			declare hidden void @external_void_func_void() #0

	; GCN-LABEL: {{^}}test_kernel_call_external_void_func_void_clobber_s30_s31_call_external_void_func_void:			; GCN-LABEL: {{^}}test_kernel_call_external_void_func_void_clobber_s30_s31_call_external_void_func_void:
	; GCN: s_getpc_b64 s[34:35]			; GCN: s_getpc_b64 s[34:35]
	; GCN-NEXT: s_add_u32 s34, s34,			; GCN-NEXT: s_add_u32 s34, s34,
	; GCN-NEXT: s_addc_u32 s35, s35,			; GCN-NEXT: s_addc_u32 s35, s35,
	; GCN-NEXT: s_mov_b32 s32, 0			; GCN-NEXT: s_mov_b32 s32, 0
	; GCN: s_swappc_b64 s[30:31], s[34:35]			; GCN: s_swappc_b64 s[30:31], s[34:35]

	; GCN-NEXT: #ASMSTART			; GCN-NEXT: #ASMSTART
	; GCN-NEXT: #ASMEND			; GCN-NEXT: #ASMEND
	; GCN-NEXT: s_swappc_b64 s[30:31], s[34:35]			; GCN-NEXT: s_swappc_b64 s[30:31], s[34:35]
	define amdgpu_kernel void @test_kernel_call_external_void_func_void_clobber_s30_s31_call_external_void_func_void() #0 {			define amdgpu_kernel void @test_kernel_call_external_void_func_void_clobber_s30_s31_call_external_void_func_void() #0 {
	call void @external_void_func_void()			call void @external_void_func_void()
	call void asm sideeffect "", ""() #0			call void asm sideeffect "", ""() #0
	call void @external_void_func_void()			call void @external_void_func_void()
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}test_func_call_external_void_func_void_clobber_s30_s31_call_external_void_func_void:			; GCN-LABEL: {{^}}test_func_call_external_void_func_void_clobber_s30_s31_call_external_void_func_void:
	; GCN: buffer_store_dword			; MUBUF: buffer_store_dword
				; FLATSCR: scratch_store_dword
	; GCN: v_writelane_b32 v40, s33, 4			; GCN: v_writelane_b32 v40, s33, 4
	; GCN: v_writelane_b32 v40, s34, 0			; GCN: v_writelane_b32 v40, s34, 0
	; GCN: v_writelane_b32 v40, s35, 1			; GCN: v_writelane_b32 v40, s35, 1
	; GCN: v_writelane_b32 v40, s30, 2			; GCN: v_writelane_b32 v40, s30, 2
	; GCN: v_writelane_b32 v40, s31, 3			; GCN: v_writelane_b32 v40, s31, 3

	; GCN: s_swappc_b64			; GCN: s_swappc_b64
	; GCN-NEXT: ;;#ASMSTART			; GCN-NEXT: ;;#ASMSTART
	; GCN-NEXT: ;;#ASMEND			; GCN-NEXT: ;;#ASMEND
	; GCN-NEXT: s_swappc_b64			; GCN-NEXT: s_swappc_b64
	; GCN-DAG: v_readlane_b32 s4, v40, 2			; GCN-DAG: v_readlane_b32 s4, v40, 2
	; GCN-DAG: v_readlane_b32 s5, v40, 3			; GCN-DAG: v_readlane_b32 s5, v40, 3
	; GCN: v_readlane_b32 s35, v40, 1			; GCN: v_readlane_b32 s35, v40, 1
	; GCN: v_readlane_b32 s34, v40, 0			; GCN: v_readlane_b32 s34, v40, 0

	; GCN: v_readlane_b32 s33, v40, 4			; GCN: v_readlane_b32 s33, v40, 4
	; GCN: buffer_load_dword			; MUBUF: buffer_load_dword
				; FLATSCR: scratch_load_dword
	; GCN: s_setpc_b64			; GCN: s_setpc_b64
	define void @test_func_call_external_void_func_void_clobber_s30_s31_call_external_void_func_void() #0 {			define void @test_func_call_external_void_func_void_clobber_s30_s31_call_external_void_func_void() #0 {
	call void @external_void_func_void()			call void @external_void_func_void()
	call void asm sideeffect "", ""() #0			call void asm sideeffect "", ""() #0
	call void @external_void_func_void()			call void @external_void_func_void()
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}test_func_call_external_void_funcx2:			; GCN-LABEL: {{^}}test_func_call_external_void_funcx2:
	; GCN: buffer_store_dword v40			; MUBUF: buffer_store_dword v40
				; FLATSCR: scratch_store_dword off, v40
	; GCN: v_writelane_b32 v40, s33, 4			; GCN: v_writelane_b32 v40, s33, 4

	; GCN: s_mov_b32 s33, s32			; GCN: s_mov_b32 s33, s32
	; GCN: s_add_u32 s32, s32, 0x400			; MUBUF: s_add_u32 s32, s32, 0x400
				; FLATSCR: s_add_u32 s32, s32, 16
	; GCN: s_swappc_b64			; GCN: s_swappc_b64
	; GCN-NEXT: s_swappc_b64			; GCN-NEXT: s_swappc_b64

	; GCN: v_readlane_b32 s33, v40, 4			; GCN: v_readlane_b32 s33, v40, 4
	; GCN: buffer_load_dword v40,			; MUBUF: buffer_load_dword v40
				; FLATSCR: scratch_load_dword v40
	define void @test_func_call_external_void_funcx2() #0 {			define void @test_func_call_external_void_funcx2() #0 {
	call void @external_void_func_void()			call void @external_void_func_void()
	call void @external_void_func_void()			call void @external_void_func_void()
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}void_func_void_clobber_s30_s31:			; GCN-LABEL: {{^}}void_func_void_clobber_s30_s31:
	; GCN: s_waitcnt			; GCN: s_waitcnt
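The MUBUF and FLATSCR check lines in these tests differ in their stack-pointer constants by a fixed factor: 0x400 vs 16, 0x200 vs 8, 0x300 vs 12, 0x100000 vs 0x4000, 0x40300 vs 0x100c. A minimal Python sketch of that relationship follows. It assumes the factor is the 64-lane wavefront size of gfx900, with the unswizzled MUBUF offset covering the whole wave and the swizzled flat scratch offset being per lane. The names `WAVE_SIZE`, `unswizzled_offset`, and `unswizzled_align_mask` are illustrative and do not come from the patch.

```python
# Assumption: gfx900 runs 64 lanes per wavefront, so one per-lane (swizzled)
# byte corresponds to 64 bytes of per-wave (unswizzled) scratch.
WAVE_SIZE = 64


def unswizzled_offset(per_lane_bytes, wave_size=WAVE_SIZE):
    """Scale a per-lane stack offset to the per-wave value used with MUBUF."""
    return per_lane_bytes * wave_size


def unswizzled_align_mask(per_lane_mask, wave_size=WAVE_SIZE):
    """Scale a 32-bit per-lane alignment mask to its per-wave equivalent."""
    shift = wave_size.bit_length() - 1  # log2 of a power-of-two wave size
    return (per_lane_mask << shift) & 0xFFFFFFFF


# (FLATSCR constant, MUBUF constant) pairs taken from the check lines.
pairs = [
    (16, 0x400),
    (8, 0x200),
    (12, 0x300),
    (0x4000, 0x100000),
    (0x100C, 0x40300),
    (0x1FFF, 0x7FFC0),
]
all_match = all(unswizzled_offset(flat) == mubuf for flat, mubuf in pairs)
```

The same scaling accounts for the realignment masks in `realign_stack_no_fp_elim`: shifting 0xffffe000 left by 6 bits and truncating to 32 bits gives 0xfff80000.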

llvm/test/CodeGen/AMDGPU/callee-frame-setup.ll

; RUN: llc -march=amdgcn -mcpu=hawaii -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GCN -check-prefix=CI %s		; RUN: llc -march=amdgcn -mcpu=hawaii -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,CI,MUBUF %s
; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=GCN -check-prefix=GFX9 %s		; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GFX9,MUBUF %s
		; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs -amdgpu-enable-flat-scratch < %s | FileCheck -enable-var-scope -check-prefixes=GCN,FLATSCR %s

; GCN-LABEL: {{^}}callee_no_stack:		; GCN-LABEL: {{^}}callee_no_stack:
; GCN: ; %bb.0:		; GCN: ; %bb.0:
; GCN-NEXT: s_waitcnt		; GCN-NEXT: s_waitcnt
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define void @callee_no_stack() #0 {		define void @callee_no_stack() #0 {
ret void		ret void
}		}
; ... (16 lines not shown) ...
define void @callee_no_stack_no_fp_elim_nonleaf() #2 {		define void @callee_no_stack_no_fp_elim_nonleaf() #2 {
ret void		ret void
}		}

; GCN-LABEL: {{^}}callee_with_stack:		; GCN-LABEL: {{^}}callee_with_stack:
; GCN: ; %bb.0:		; GCN: ; %bb.0:
; GCN-NEXT: s_waitcnt		; GCN-NEXT: s_waitcnt
; GCN-NEXT: v_mov_b32_e32 v0, 0{{$}}		; GCN-NEXT: v_mov_b32_e32 v0, 0{{$}}
; GCN-NEXT: buffer_store_dword v0, off, s[0:3], s32{{$}}		; MUBUF-NEXT: buffer_store_dword v0, off, s[0:3], s32{{$}}
		; FLATSCR-NEXT: scratch_store_dword off, v0, s32
; GCN-NEXT: s_waitcnt		; GCN-NEXT: s_waitcnt
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define void @callee_with_stack() #0 {		define void @callee_with_stack() #0 {
%alloca = alloca i32, addrspace(5)		%alloca = alloca i32, addrspace(5)
store volatile i32 0, i32 addrspace(5)* %alloca		store volatile i32 0, i32 addrspace(5)* %alloca
ret void		ret void
}		}

; Can use free call clobbered register to preserve original FP value.		; Can use free call clobbered register to preserve original FP value.

; GCN-LABEL: {{^}}callee_with_stack_no_fp_elim_all:		; GCN-LABEL: {{^}}callee_with_stack_no_fp_elim_all:
; GCN: ; %bb.0:		; GCN: ; %bb.0:
; GCN-NEXT: s_waitcnt		; GCN-NEXT: s_waitcnt
; GCN-NEXT: s_mov_b32 s4, s33		; GCN-NEXT: s_mov_b32 s4, s33
; GCN-NEXT: s_mov_b32 s33, s32		; GCN-NEXT: s_mov_b32 s33, s32
; GCN-NEXT: s_add_u32 s32, s32, 0x200		; MUBUF-NEXT: s_add_u32 s32, s32, 0x200
		; FLATSCR-NEXT: s_add_u32 s32, s32, 8
; GCN-NEXT: v_mov_b32_e32 v0, 0{{$}}		; GCN-NEXT: v_mov_b32_e32 v0, 0{{$}}
; GCN-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:4{{$}}		; MUBUF-NEXT: buffer_store_dword v0, off, s[0:3], s33 offset:4{{$}}
; GCN-NEXT: s_sub_u32 s32, s32, 0x200		; FLATSCR-NEXT: scratch_store_dword off, v0, s33 offset:4{{$}}
		; MUBUF-NEXT: s_sub_u32 s32, s32, 0x200
		; FLATSCR-NEXT: s_sub_u32 s32, s32, 8
; GCN-NEXT: s_mov_b32 s33, s4		; GCN-NEXT: s_mov_b32 s33, s4
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define void @callee_with_stack_no_fp_elim_all() #1 {		define void @callee_with_stack_no_fp_elim_all() #1 {
%alloca = alloca i32, addrspace(5)		%alloca = alloca i32, addrspace(5)
store volatile i32 0, i32 addrspace(5)* %alloca		store volatile i32 0, i32 addrspace(5)* %alloca
ret void		ret void
}		}

; GCN-LABEL: {{^}}callee_with_stack_no_fp_elim_non_leaf:		; GCN-LABEL: {{^}}callee_with_stack_no_fp_elim_non_leaf:
; GCN: ; %bb.0:		; GCN: ; %bb.0:
; GCN-NEXT: s_waitcnt		; GCN-NEXT: s_waitcnt
; GCN-NEXT: v_mov_b32_e32 v0, 0{{$}}		; GCN-NEXT: v_mov_b32_e32 v0, 0{{$}}
; GCN-NEXT: buffer_store_dword v0, off, s[0:3], s32{{$}}		; MUBUF-NEXT: buffer_store_dword v0, off, s[0:3], s32{{$}}
		; FLATSCR-NEXT: scratch_store_dword off, v0, s32{{$}}
; GCN-NEXT: s_waitcnt		; GCN-NEXT: s_waitcnt
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define void @callee_with_stack_no_fp_elim_non_leaf() #2 {		define void @callee_with_stack_no_fp_elim_non_leaf() #2 {
%alloca = alloca i32, addrspace(5)		%alloca = alloca i32, addrspace(5)
store volatile i32 0, i32 addrspace(5)* %alloca		store volatile i32 0, i32 addrspace(5)* %alloca
ret void		ret void
}		}

; GCN-LABEL: {{^}}callee_with_stack_and_call:		; GCN-LABEL: {{^}}callee_with_stack_and_call:
; GCN: ; %bb.0:		; GCN: ; %bb.0:
; GCN-NEXT: s_waitcnt		; GCN-NEXT: s_waitcnt
; GCN: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; GCN-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], s32 offset:4 ; 4-byte Folded Spill		; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], s32 offset:4 ; 4-byte Folded Spill
		; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], s32 offset:4 ; 4-byte Folded Spill
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]
; GCN: v_writelane_b32 [[CSR_VGPR]], s33, 2		; GCN: v_writelane_b32 [[CSR_VGPR]], s33, 2
; GCN-DAG: s_mov_b32 s33, s32		; GCN-DAG: s_mov_b32 s33, s32
; GCN-DAG: s_add_u32 s32, s32, 0x400{{$}}		; MUBUF-DAG: s_add_u32 s32, s32, 0x400{{$}}
		; FLATSCR-DAG: s_add_u32 s32, s32, 16{{$}}
; GCN-DAG: v_mov_b32_e32 [[ZERO:v[0-9]+]], 0{{$}}		; GCN-DAG: v_mov_b32_e32 [[ZERO:v[0-9]+]], 0{{$}}
; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s30,		; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s30,
; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s31,		; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s31,

; GCN-DAG: buffer_store_dword [[ZERO]], off, s[0:3], s33{{$}}		; MUBUF-DAG: buffer_store_dword [[ZERO]], off, s[0:3], s33{{$}}
		; FLATSCR-DAG: scratch_store_dword off, [[ZERO]], s33{{$}}

; GCN: s_swappc_b64		; GCN: s_swappc_b64

; GCN-DAG: v_readlane_b32 s5, [[CSR_VGPR]]		; GCN-DAG: v_readlane_b32 s5, [[CSR_VGPR]]
; GCN-DAG: v_readlane_b32 s4, [[CSR_VGPR]]		; GCN-DAG: v_readlane_b32 s4, [[CSR_VGPR]]

; GCN: s_sub_u32 s32, s32, 0x400{{$}}		; MUBUF: s_sub_u32 s32, s32, 0x400{{$}}
		; FLATSCR: s_sub_u32 s32, s32, 16{{$}}
; GCN-NEXT: v_readlane_b32 s33, [[CSR_VGPR]], 2		; GCN-NEXT: v_readlane_b32 s33, [[CSR_VGPR]], 2
; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; GCN-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 offset:4 ; 4-byte Folded Reload		; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 offset:4 ; 4-byte Folded Reload
		; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s32 offset:4 ; 4-byte Folded Reload
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)

; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define void @callee_with_stack_and_call() #0 {		define void @callee_with_stack_and_call() #0 {
%alloca = alloca i32, addrspace(5)		%alloca = alloca i32, addrspace(5)
store volatile i32 0, i32 addrspace(5)* %alloca		store volatile i32 0, i32 addrspace(5)* %alloca
call void @external_void_func_void()		call void @external_void_func_void()
ret void		ret void
}		}

; Should be able to copy incoming stack pointer directly to inner		; Should be able to copy incoming stack pointer directly to inner
; call's stack pointer argument.		; call's stack pointer argument.

; There is stack usage only because of the need to evict a VGPR for		; There is stack usage only because of the need to evict a VGPR for
; spilling CSR SGPRs.		; spilling CSR SGPRs.

; GCN-LABEL: {{^}}callee_no_stack_with_call:		; GCN-LABEL: {{^}}callee_no_stack_with_call:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; GCN-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], s32 ; 4-byte Folded Spill		; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], s32 ; 4-byte Folded Spill
		; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], s32 ; 4-byte Folded Spill
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]
; GCN-DAG: s_add_u32 s32, s32, 0x400		; MUBUF-DAG: s_add_u32 s32, s32, 0x400
		; FLATSCR-DAG: s_add_u32 s32, s32, 16
; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s33, [[FP_SPILL_LANE:[0-9]+]]		; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s33, [[FP_SPILL_LANE:[0-9]+]]

; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s30, 0		; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s30, 0
; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s31, 1		; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s31, 1
; GCN: s_swappc_b64		; GCN: s_swappc_b64

; GCN-DAG: v_readlane_b32 s4, v40, 0		; GCN-DAG: v_readlane_b32 s4, v40, 0
; GCN-DAG: v_readlane_b32 s5, v40, 1		; GCN-DAG: v_readlane_b32 s5, v40, 1

; GCN: s_sub_u32 s32, s32, 0x400		; MUBUF: s_sub_u32 s32, s32, 0x400
		; FLATSCR: s_sub_u32 s32, s32, 16
; GCN-NEXT: v_readlane_b32 s33, [[CSR_VGPR]], [[FP_SPILL_LANE]]		; GCN-NEXT: v_readlane_b32 s33, [[CSR_VGPR]], [[FP_SPILL_LANE]]
; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; GCN-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 ; 4-byte Folded Reload		; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 ; 4-byte Folded Reload
		; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s32 ; 4-byte Folded Reload
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define void @callee_no_stack_with_call() #0 {		define void @callee_no_stack_with_call() #0 {
call void @external_void_func_void()		call void @external_void_func_void()
ret void		ret void
}		}

declare hidden void @external_void_func_void() #0		declare hidden void @external_void_func_void() #0

; Make sure if a CSR vgpr is used for SGPR spilling, it is saved and		; Make sure if a CSR vgpr is used for SGPR spilling, it is saved and
; restored. No FP is required.		; restored. No FP is required.
;		;
; GCN-LABEL: {{^}}callee_func_sgpr_spill_no_calls:		; GCN-LABEL: {{^}}callee_func_sgpr_spill_no_calls:
; GCN: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; GCN-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], s32 ; 4-byte Folded Spill		; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], s32 ; 4-byte Folded Spill
		; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], s32 ; 4-byte Folded Spill
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]
; GCN: v_writelane_b32 [[CSR_VGPR]], s		; GCN: v_writelane_b32 [[CSR_VGPR]], s
; GCN: v_writelane_b32 [[CSR_VGPR]], s		; GCN: v_writelane_b32 [[CSR_VGPR]], s

; GCN: ;;#ASMSTART		; GCN: ;;#ASMSTART
; GCN: v_readlane_b32 s{{[0-9]+}}, [[CSR_VGPR]]		; GCN: v_readlane_b32 s{{[0-9]+}}, [[CSR_VGPR]]
; GCN: v_readlane_b32 s{{[0-9]+}}, [[CSR_VGPR]]		; GCN: v_readlane_b32 s{{[0-9]+}}, [[CSR_VGPR]]

; GCN: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; GCN-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 ; 4-byte Folded Reload		; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 ; 4-byte Folded Reload
		; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s32 ; 4-byte Folded Reload
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]
; GCN-NEXT: s_waitcnt		; GCN-NEXT: s_waitcnt
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define void @callee_func_sgpr_spill_no_calls(i32 %in) #0 {		define void @callee_func_sgpr_spill_no_calls(i32 %in) #0 {
call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}"() #0		call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}"() #0
call void asm sideeffect "", "~{v8},~{v9},~{v10},~{v11},~{v12},~{v13},~{v14},~{v15}"() #0		call void asm sideeffect "", "~{v8},~{v9},~{v10},~{v11},~{v12},~{v13},~{v14},~{v15}"() #0
call void asm sideeffect "", "~{v16},~{v17},~{v18},~{v19},~{v20},~{v21},~{v22},~{v23}"() #0		call void asm sideeffect "", "~{v16},~{v17},~{v18},~{v19},~{v20},~{v21},~{v22},~{v23}"() #0
call void asm sideeffect "", "~{v24},~{v25},~{v26},~{v27},~{v28},~{v29},~{v30},~{v31}"() #0		call void asm sideeffect "", "~{v24},~{v25},~{v26},~{v27},~{v28},~{v29},~{v30},~{v31}"() #0
; ... (32 lines not shown) ...
}		}

; TODO: Can the SP inc/dec be removed?		; TODO: Can the SP inc/dec be removed?
; GCN-LABEL: {{^}}callee_with_stack_no_fp_elim_csr_vgpr:		; GCN-LABEL: {{^}}callee_with_stack_no_fp_elim_csr_vgpr:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GCN-NEXT:s_mov_b32 [[FP_COPY:s[0-9]+]], s33		; GCN-NEXT:s_mov_b32 [[FP_COPY:s[0-9]+]], s33
; GCN-NEXT: s_mov_b32 s33, s32		; GCN-NEXT: s_mov_b32 s33, s32
; GCN: v_mov_b32_e32 [[ZERO:v[0-9]+]], 0		; GCN: v_mov_b32_e32 [[ZERO:v[0-9]+]], 0
; GCN-DAG: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill		; MUBUF-DAG: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill
; GCN-DAG: buffer_store_dword [[ZERO]], off, s[0:3], s33 offset:8		; FLATSCR-DAG: scratch_store_dword off, v41, s33 ; 4-byte Folded Spill
		; MUBUF-DAG: buffer_store_dword [[ZERO]], off, s[0:3], s33 offset:8
		; FLATSCR-DAG: scratch_store_dword off, [[ZERO]], s33 offset:8

; GCN: ;;#ASMSTART		; GCN: ;;#ASMSTART
; GCN-NEXT: ; clobber v41		; GCN-NEXT: ; clobber v41
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND

; GCN: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload		; MUBUF: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload
; GCN: s_add_u32 s32, s32, 0x300		; FLATSCR: scratch_load_dword v41, off, s33 ; 4-byte Folded Reload
; GCN-NEXT: s_sub_u32 s32, s32, 0x300		; MUBUF: s_add_u32 s32, s32, 0x300
		; MUBUF-NEXT: s_sub_u32 s32, s32, 0x300
		; FLATSCR: s_add_u32 s32, s32, 12
		; FLATSCR-NEXT: s_sub_u32 s32, s32, 12
; GCN-NEXT: s_mov_b32 s33, s4		; GCN-NEXT: s_mov_b32 s33, s4
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define void @callee_with_stack_no_fp_elim_csr_vgpr() #1 {		define void @callee_with_stack_no_fp_elim_csr_vgpr() #1 {
%alloca = alloca i32, addrspace(5)		%alloca = alloca i32, addrspace(5)
store volatile i32 0, i32 addrspace(5)* %alloca		store volatile i32 0, i32 addrspace(5)* %alloca
call void asm sideeffect "; clobber v41", "~{v41}"()		call void asm sideeffect "; clobber v41", "~{v41}"()
ret void		ret void
}		}

; Use a copy to a free SGPR instead of introducing a second CSR VGPR.		; Use a copy to a free SGPR instead of introducing a second CSR VGPR.
; GCN-LABEL: {{^}}last_lane_vgpr_for_fp_csr:		; GCN-LABEL: {{^}}last_lane_vgpr_for_fp_csr:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GCN-NEXT: v_writelane_b32 v1, s33, 63		; GCN-NEXT: v_writelane_b32 v1, s33, 63
; GCN-NEXT: s_mov_b32 s33, s32		; GCN-NEXT: s_mov_b32 s33, s32
; GCN: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill		; MUBUF: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill
		; FLATSCR: scratch_store_dword off, v41, s33 ; 4-byte Folded Spill
; GCN-COUNT-63: v_writelane_b32 v1		; GCN-COUNT-63: v_writelane_b32 v1
; GCN: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s33 offset:8		; MUBUF: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s33 offset:8
		; FLATSCR: scratch_store_dword off, v{{[0-9]+}}, s33 offset:8
; GCN: ;;#ASMSTART		; GCN: ;;#ASMSTART
; GCN-COUNT-63: v_readlane_b32 s{{[0-9]+}}, v1		; GCN-COUNT-63: v_readlane_b32 s{{[0-9]+}}, v1

; GCN: s_add_u32 s32, s32, 0x300		; MUBUF: s_add_u32 s32, s32, 0x300
; GCN-NEXT: s_sub_u32 s32, s32, 0x300		; MUBUF-NEXT: s_sub_u32 s32, s32, 0x300
		; FLATSCR: s_add_u32 s32, s32, 12
		; FLATSCR-NEXT: s_sub_u32 s32, s32, 12
; GCN-NEXT: v_readlane_b32 s33, v1, 63		; GCN-NEXT: v_readlane_b32 s33, v1, 63
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define void @last_lane_vgpr_for_fp_csr() #1 {		define void @last_lane_vgpr_for_fp_csr() #1 {
%alloca = alloca i32, addrspace(5)		%alloca = alloca i32, addrspace(5)
store volatile i32 0, i32 addrspace(5)* %alloca		store volatile i32 0, i32 addrspace(5)* %alloca
call void asm sideeffect "; clobber v41", "~{v41}"()		call void asm sideeffect "; clobber v41", "~{v41}"()
call void asm sideeffect "",		call void asm sideeffect "",
"~{s40},~{s41},~{s42},~{s43},~{s44},~{s45},~{s46},~{s47},~{s48},~{s49}		"~{s40},~{s41},~{s42},~{s43},~{s44},~{s45},~{s46},~{s47},~{s48},~{s49}
,~{s50},~{s51},~{s52},~{s53},~{s54},~{s55},~{s56},~{s57},~{s58},~{s59}		,~{s50},~{s51},~{s52},~{s53},~{s54},~{s55},~{s56},~{s57},~{s58},~{s59}
,~{s60},~{s61},~{s62},~{s63},~{s64},~{s65},~{s66},~{s67},~{s68},~{s69}		,~{s60},~{s61},~{s62},~{s63},~{s64},~{s65},~{s66},~{s67},~{s68},~{s69}
,~{s70},~{s71},~{s72},~{s73},~{s74},~{s75},~{s76},~{s77},~{s78},~{s79}		,~{s70},~{s71},~{s72},~{s73},~{s74},~{s75},~{s76},~{s77},~{s78},~{s79}
,~{s80},~{s81},~{s82},~{s83},~{s84},~{s85},~{s86},~{s87},~{s88},~{s89}		,~{s80},~{s81},~{s82},~{s83},~{s84},~{s85},~{s86},~{s87},~{s88},~{s89}
,~{s90},~{s91},~{s92},~{s93},~{s94},~{s95},~{s96},~{s97},~{s98},~{s99}		,~{s90},~{s91},~{s92},~{s93},~{s94},~{s95},~{s96},~{s97},~{s98},~{s99}
,~{s100},~{s101},~{s102}"() #1		,~{s100},~{s101},~{s102}"() #1

ret void		ret void
}		}

; Use a copy to a free SGPR instead of introducing a second CSR VGPR.		; Use a copy to a free SGPR instead of introducing a second CSR VGPR.
; GCN-LABEL: {{^}}no_new_vgpr_for_fp_csr:		; GCN-LABEL: {{^}}no_new_vgpr_for_fp_csr:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GCN-NEXT: s_mov_b32 [[FP_COPY:s[0-9]+]], s33		; GCN-NEXT: s_mov_b32 [[FP_COPY:s[0-9]+]], s33
; GCN-NEXT: s_mov_b32 s33, s32		; GCN-NEXT: s_mov_b32 s33, s32
; GCN-NEXT: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill		; MUBUF-NEXT: buffer_store_dword v41, off, s[0:3], s33 ; 4-byte Folded Spill
		; FLATSCR-NEXT: scratch_store_dword off, v41, s33 ; 4-byte Folded Spill
; GCN-COUNT-64: v_writelane_b32 v1,		; GCN-COUNT-64: v_writelane_b32 v1,

; GCN: buffer_store_dword		; MUBUF: buffer_store_dword
		; FLATSCR: scratch_store_dword
; GCN: ;;#ASMSTART		; GCN: ;;#ASMSTART
; GCN-COUNT-64: v_readlane_b32 s{{[0-9]+}}, v1		; GCN-COUNT-64: v_readlane_b32 s{{[0-9]+}}, v1

; GCN: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload		; MUBUF: buffer_load_dword v41, off, s[0:3], s33 ; 4-byte Folded Reload
; GCN: s_add_u32 s32, s32, 0x300		; FLATSCR: scratch_load_dword v41, off, s33 ; 4-byte Folded Reload
; GCN-NEXT: s_sub_u32 s32, s32, 0x300		; MUBUF: s_add_u32 s32, s32, 0x300
		; MUBUF-NEXT: s_sub_u32 s32, s32, 0x300
		; FLATSCR: s_add_u32 s32, s32, 12
		; FLATSCR-NEXT: s_sub_u32 s32, s32, 12
; GCN-NEXT: s_mov_b32 s33, [[FP_COPY]]		; GCN-NEXT: s_mov_b32 s33, [[FP_COPY]]
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define void @no_new_vgpr_for_fp_csr() #1 {		define void @no_new_vgpr_for_fp_csr() #1 {
%alloca = alloca i32, addrspace(5)		%alloca = alloca i32, addrspace(5)
store volatile i32 0, i32 addrspace(5)* %alloca		store volatile i32 0, i32 addrspace(5)* %alloca
call void asm sideeffect "; clobber v41", "~{v41}"()		call void asm sideeffect "; clobber v41", "~{v41}"()
call void asm sideeffect "",		call void asm sideeffect "",
"~{s39},~{s40},~{s41},~{s42},~{s43},~{s44},~{s45},~{s46},~{s47},~{s48},~{s49}		"~{s39},~{s40},~{s41},~{s42},~{s43},~{s44},~{s45},~{s46},~{s47},~{s48},~{s49}
,~{s50},~{s51},~{s52},~{s53},~{s54},~{s55},~{s56},~{s57},~{s58},~{s59}		,~{s50},~{s51},~{s52},~{s53},~{s54},~{s55},~{s56},~{s57},~{s58},~{s59}
,~{s60},~{s61},~{s62},~{s63},~{s64},~{s65},~{s66},~{s67},~{s68},~{s69}		,~{s60},~{s61},~{s62},~{s63},~{s64},~{s65},~{s66},~{s67},~{s68},~{s69}
,~{s70},~{s71},~{s72},~{s73},~{s74},~{s75},~{s76},~{s77},~{s78},~{s79}		,~{s70},~{s71},~{s72},~{s73},~{s74},~{s75},~{s76},~{s77},~{s78},~{s79}
,~{s80},~{s81},~{s82},~{s83},~{s84},~{s85},~{s86},~{s87},~{s88},~{s89}		,~{s80},~{s81},~{s82},~{s83},~{s84},~{s85},~{s86},~{s87},~{s88},~{s89}
,~{s90},~{s91},~{s92},~{s93},~{s94},~{s95},~{s96},~{s97},~{s98},~{s99}		,~{s90},~{s91},~{s92},~{s93},~{s94},~{s95},~{s96},~{s97},~{s98},~{s99}
,~{s100},~{s101},~{s102}"() #1		,~{s100},~{s101},~{s102}"() #1

ret void		ret void
}		}

; GCN-LABEL: {{^}}realign_stack_no_fp_elim:		; GCN-LABEL: {{^}}realign_stack_no_fp_elim:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GCN-NEXT: s_add_u32 [[SCRATCH:s[0-9]+]], s32, 0x7ffc0		; MUBUF-NEXT: s_add_u32 [[SCRATCH:s[0-9]+]], s32, 0x7ffc0
		; FLATSCR-NEXT: s_add_u32 [[SCRATCH:s[0-9]+]], s32, 0x1fff
; GCN-NEXT: s_mov_b32 s4, s33		; GCN-NEXT: s_mov_b32 s4, s33
; GCN-NEXT: s_and_b32 s33, [[SCRATCH]], 0xfff80000		; MUBUF-NEXT: s_and_b32 s33, [[SCRATCH]], 0xfff80000
; GCN-NEXT: s_add_u32 s32, s32, 0x100000		; FLATSCR-NEXT: s_and_b32 s33, [[SCRATCH]], 0xffffe000
		; MUBUF-NEXT: s_add_u32 s32, s32, 0x100000
		; FLATSCR-NEXT: s_add_u32 s32, s32, 0x4000
; GCN-NEXT: v_mov_b32_e32 [[ZERO:v[0-9]+]], 0		; GCN-NEXT: v_mov_b32_e32 [[ZERO:v[0-9]+]], 0
; GCN-NEXT: buffer_store_dword [[ZERO]], off, s[0:3], s33		; MUBUF-NEXT: buffer_store_dword [[ZERO]], off, s[0:3], s33
; GCN-NEXT: s_sub_u32 s32, s32, 0x100000		; FLATSCR-NEXT: scratch_store_dword off, [[ZERO]], s33
		; MUBUF-NEXT: s_sub_u32 s32, s32, 0x100000
		; FLATSCR-NEXT: s_sub_u32 s32, s32, 0x4000
; GCN-NEXT: s_mov_b32 s33, s4		; GCN-NEXT: s_mov_b32 s33, s4
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define void @realign_stack_no_fp_elim() #1 {		define void @realign_stack_no_fp_elim() #1 {
%alloca = alloca i32, align 8192, addrspace(5)		%alloca = alloca i32, align 8192, addrspace(5)
store volatile i32 0, i32 addrspace(5)* %alloca		store volatile i32 0, i32 addrspace(5)* %alloca
ret void		ret void
}		}

; GCN-LABEL: {{^}}no_unused_non_csr_sgpr_for_fp:		; GCN-LABEL: {{^}}no_unused_non_csr_sgpr_for_fp:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GCN-NEXT: v_writelane_b32 v1, s33, 2		; GCN-NEXT: v_writelane_b32 v1, s33, 2
; GCN-NEXT: v_writelane_b32 v1, s30, 0		; GCN-NEXT: v_writelane_b32 v1, s30, 0
; GCN-NEXT: s_mov_b32 s33, s32		; GCN-NEXT: s_mov_b32 s33, s32
; GCN: v_mov_b32_e32 [[ZERO:v[0-9]+]], 0		; GCN: v_mov_b32_e32 [[ZERO:v[0-9]+]], 0
; GCN: v_writelane_b32 v1, s31, 1		; GCN: v_writelane_b32 v1, s31, 1
; GCN: buffer_store_dword [[ZERO]], off, s[0:3], s33 offset:4		; MUBUF: buffer_store_dword [[ZERO]], off, s[0:3], s33 offset:4
		; FLATSCR: scratch_store_dword off, [[ZERO]], s33 offset:4
; GCN: ;;#ASMSTART		; GCN: ;;#ASMSTART
; GCN: v_readlane_b32 s4, v1, 0		; GCN: v_readlane_b32 s4, v1, 0
; GCN-NEXT: s_add_u32 s32, s32, 0x200		; MUBUF-NEXT: s_add_u32 s32, s32, 0x200
		; FLATSCR-NEXT: s_add_u32 s32, s32, 8
; GCN-NEXT: v_readlane_b32 s5, v1, 1		; GCN-NEXT: v_readlane_b32 s5, v1, 1
; GCN-NEXT: s_sub_u32 s32, s32, 0x200		; MUBUF-NEXT: s_sub_u32 s32, s32, 0x200
		; FLATSCR-NEXT: s_sub_u32 s32, s32, 8
; GCN-NEXT: v_readlane_b32 s33, v1, 2		; GCN-NEXT: v_readlane_b32 s33, v1, 2
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_setpc_b64 s[4:5]		; GCN-NEXT: s_setpc_b64 s[4:5]
define void @no_unused_non_csr_sgpr_for_fp() #1 {		define void @no_unused_non_csr_sgpr_for_fp() #1 {
%alloca = alloca i32, addrspace(5)		%alloca = alloca i32, addrspace(5)
store volatile i32 0, i32 addrspace(5)* %alloca		store volatile i32 0, i32 addrspace(5)* %alloca

; Use all clobberable registers, so FP has to spill to a VGPR.		; Use all clobberable registers, so FP has to spill to a VGPR.
call void asm sideeffect "",		call void asm sideeffect "",
"~{s0},~{s1},~{s2},~{s3},~{s4},~{s5},~{s6},~{s7},~{s8},~{s9}		"~{s0},~{s1},~{s2},~{s3},~{s4},~{s5},~{s6},~{s7},~{s8},~{s9}
,~{s10},~{s11},~{s12},~{s13},~{s14},~{s15},~{s16},~{s17},~{s18},~{s19}		,~{s10},~{s11},~{s12},~{s13},~{s14},~{s15},~{s16},~{s17},~{s18},~{s19}
,~{s20},~{s21},~{s22},~{s23},~{s24},~{s25},~{s26},~{s27},~{s28},~{s29}		,~{s20},~{s21},~{s22},~{s23},~{s24},~{s25},~{s26},~{s27},~{s28},~{s29}
,~{s30},~{s31}"() #0		,~{s30},~{s31}"() #0

ret void		ret void
}		}

; Need a new CSR VGPR to satisfy the FP spill.		; Need a new CSR VGPR to satisfy the FP spill.
; GCN-LABEL: {{^}}no_unused_non_csr_sgpr_for_fp_no_scratch_vgpr:		; GCN-LABEL: {{^}}no_unused_non_csr_sgpr_for_fp_no_scratch_vgpr:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; GCN-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], s32 offset:8 ; 4-byte Folded Spill		; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], off, s[0:3], s32 offset:8 ; 4-byte Folded Spill
		; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], s32 offset:8 ; 4-byte Folded Spill
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]
; GCN-NEXT: v_writelane_b32 [[CSR_VGPR]], s33, 2		; GCN-NEXT: v_writelane_b32 [[CSR_VGPR]], s33, 2
; GCN-NEXT: v_writelane_b32 [[CSR_VGPR]], s30, 0		; GCN-NEXT: v_writelane_b32 [[CSR_VGPR]], s30, 0
; GCN-NEXT: s_mov_b32 s33, s32		; GCN-NEXT: s_mov_b32 s33, s32

; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s31, 1		; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s31, 1
; GCN-DAG: buffer_store_dword		; MUBUF-DAG: buffer_store_dword
; GCN: s_add_u32 s32, s32, 0x300{{$}}		; FLATSCR-DAG: scratch_store_dword
		; MUBUF: s_add_u32 s32, s32, 0x300{{$}}
		; FLATSCR: s_add_u32 s32, s32, 12{{$}}

; GCN: ;;#ASMSTART		; GCN: ;;#ASMSTART

; GCN: v_readlane_b32 s4, [[CSR_VGPR]], 0		; GCN: v_readlane_b32 s4, [[CSR_VGPR]], 0
; GCN-NEXT: v_readlane_b32 s5, [[CSR_VGPR]], 1		; GCN-NEXT: v_readlane_b32 s5, [[CSR_VGPR]], 1
; GCN-NEXT: s_sub_u32 s32, s32, 0x300{{$}}		; MUBUF-NEXT: s_sub_u32 s32, s32, 0x300{{$}}
		; FLATSCR-NEXT: s_sub_u32 s32, s32, 12{{$}}
; GCN-NEXT: v_readlane_b32 s33, [[CSR_VGPR]], 2		; GCN-NEXT: v_readlane_b32 s33, [[CSR_VGPR]], 2
; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; GCN-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 offset:8 ; 4-byte Folded Reload		; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], off, s[0:3], s32 offset:8 ; 4-byte Folded Reload
		; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, s32 offset:8 ; 4-byte Folded Reload
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define void @no_unused_non_csr_sgpr_for_fp_no_scratch_vgpr() #1 {		define void @no_unused_non_csr_sgpr_for_fp_no_scratch_vgpr() #1 {
%alloca = alloca i32, addrspace(5)		%alloca = alloca i32, addrspace(5)
store volatile i32 0, i32 addrspace(5)* %alloca		store volatile i32 0, i32 addrspace(5)* %alloca

; Use all clobberable registers, so FP has to spill to a VGPR.		; Use all clobberable registers, so FP has to spill to a VGPR.
; ... (12 lines not shown) ...
ret void		ret void
}		}

; The byval argument exceeds the MUBUF constant offset, so a scratch		; The byval argument exceeds the MUBUF constant offset, so a scratch
; register is needed to access the CSR VGPR slot.		; register is needed to access the CSR VGPR slot.
; GCN-LABEL: {{^}}scratch_reg_needed_mubuf_offset:		; GCN-LABEL: {{^}}scratch_reg_needed_mubuf_offset:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC0:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; GCN-NEXT: v_mov_b32_e32 [[SCRATCH_VGPR:v[0-9]+]], 0x1008		; MUBUF-NEXT: v_mov_b32_e32 [[SCRATCH_VGPR:v[0-9]+]], 0x1008
; GCN-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], [[SCRATCH_VGPR]], s[0:3], s32 offen ; 4-byte Folded Spill		; MUBUF-NEXT: buffer_store_dword [[CSR_VGPR:v[0-9]+]], [[SCRATCH_VGPR]], s[0:3], s32 offen ; 4-byte Folded Spill
		; FLATSCR-NEXT: s_add_u32 [[SCRATCH_SGPR:s[0-9]+]], s32, 0x1008
		; FLATSCR-NEXT: scratch_store_dword off, [[CSR_VGPR:v[0-9]+]], [[SCRATCH_SGPR]] ; 4-byte Folded Spill
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC0]]
; GCN-NEXT: v_writelane_b32 [[CSR_VGPR]], s33, 2		; GCN-NEXT: v_writelane_b32 [[CSR_VGPR]], s33, 2
; GCN-NEXT: v_writelane_b32 [[CSR_VGPR]], s30, 0		; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s30, 0
; GCN-NEXT: s_mov_b32 s33, s32		; GCN-DAG: s_mov_b32 s33, s32
; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s31, 1		; GCN-DAG: v_writelane_b32 [[CSR_VGPR]], s31, 1
; GCN-DAG: s_add_u32 s32, s32, 0x40300{{$}}		; MUBUF-DAG: s_add_u32 s32, s32, 0x40300{{$}}
; GCN-DAG: buffer_store_dword		; FLATSCR-DAG: s_add_u32 s32, s32, 0x100c{{$}}
		; MUBUF-DAG: buffer_store_dword
		; FLATSCR-DAG: scratch_store_dword

; GCN: ;;#ASMSTART		; GCN: ;;#ASMSTART

; GCN: v_readlane_b32 s4, [[CSR_VGPR]], 0		; GCN: v_readlane_b32 s4, [[CSR_VGPR]], 0
; GCN-NEXT: v_readlane_b32 s5, [[CSR_VGPR]], 1		; GCN-NEXT: v_readlane_b32 s5, [[CSR_VGPR]], 1
; GCN-NEXT: s_sub_u32 s32, s32, 0x40300{{$}}		; MUBUF-NEXT: s_sub_u32 s32, s32, 0x40300{{$}}
		; FLATSCR-NEXT: s_sub_u32 s32, s32, 0x100c{{$}}
; GCN-NEXT: v_readlane_b32 s33, [[CSR_VGPR]], 2		; GCN-NEXT: v_readlane_b32 s33, [[CSR_VGPR]], 2
; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN-NEXT: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; GCN-NEXT: v_mov_b32_e32 [[SCRATCH_VGPR:v[0-9]+]], 0x1008		; MUBUF-NEXT: v_mov_b32_e32 [[SCRATCH_VGPR:v[0-9]+]], 0x1008
; GCN-NEXT: buffer_load_dword [[CSR_VGPR]], [[SCRATCH_VGPR]], s[0:3], s32 offen ; 4-byte Folded Reload		; MUBUF-NEXT: buffer_load_dword [[CSR_VGPR]], [[SCRATCH_VGPR]], s[0:3], s32 offen ; 4-byte Folded Reload
		; FLATSCR-NEXT: s_add_u32 [[SCRATCH_SGPR:s[0-9]+]], s32, 0x1008
		; FLATSCR-NEXT: scratch_load_dword [[CSR_VGPR]], off, [[SCRATCH_SGPR]] ; 4-byte Folded Reload
; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]		; GCN-NEXT: s_mov_b64 exec, [[COPY_EXEC1]]
; GCN-NEXT: s_waitcnt vmcnt(0)		; GCN-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define void @scratch_reg_needed_mubuf_offset([4096 x i8] addrspace(5)* byval align 4 %arg) #1 {		define void @scratch_reg_needed_mubuf_offset([4096 x i8] addrspace(5)* byval align 4 %arg) #1 {
%alloca = alloca i32, addrspace(5)		%alloca = alloca i32, addrspace(5)
store volatile i32 0, i32 addrspace(5)* %alloca		store volatile i32 0, i32 addrspace(5)* %alloca

; Use all clobberable registers, so FP has to spill to a VGPR.		; Use all clobberable registers, so FP has to spill to a VGPR.
[20 lines not shown]
define internal void @local_empty_func() #0 {		define internal void @local_empty_func() #0 {
ret void		ret void
}		}

; An FP is needed, despite not needing any spills		; An FP is needed, despite not needing any spills
; TODO: Ccould see callee does not use stack and omit FP.		; TODO: Ccould see callee does not use stack and omit FP.
; GCN-LABEL: {{^}}ipra_call_with_stack:		; GCN-LABEL: {{^}}ipra_call_with_stack:
; GCN: s_mov_b32 [[FP_COPY:s[0-9]+]], s33		; GCN: s_mov_b32 [[FP_COPY:s[0-9]+]], s33
; GCN: s_mov_b32 s33, s32		; GCN: s_mov_b32 s33, s32
; GCN: s_add_u32 s32, s32, 0x400		; MUBUF: s_add_u32 s32, s32, 0x400
; GCN: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s33{{$}}		; FLATSCR: s_add_u32 s32, s32, 16
		; MUBUF: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s33{{$}}
		; FLATSCR: scratch_store_dword off, v{{[0-9]+}}, s33{{$}}
; GCN: s_swappc_b64		; GCN: s_swappc_b64
; GCN: s_sub_u32 s32, s32, 0x400		; MUBUF: s_sub_u32 s32, s32, 0x400
		; FLATSCR: s_sub_u32 s32, s32, 16
; GCN: s_mov_b32 s33, [[FP_COPY:s[0-9]+]]		; GCN: s_mov_b32 s33, [[FP_COPY:s[0-9]+]]
define void @ipra_call_with_stack() #0 {		define void @ipra_call_with_stack() #0 {
%alloca = alloca i32, addrspace(5)		%alloca = alloca i32, addrspace(5)
store volatile i32 0, i32 addrspace(5)* %alloca		store volatile i32 0, i32 addrspace(5)* %alloca
call void @local_empty_func()		call void @local_empty_func()
ret void		ret void
}		}

; With no free registers, we must spill the FP to memory.		; With no free registers, we must spill the FP to memory.
; GCN-LABEL: {{^}}callee_need_to_spill_fp_to_memory:		; GCN-LABEL: {{^}}callee_need_to_spill_fp_to_memory:
; GCN: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; GCN: v_mov_b32_e32 [[TMP_VGPR1:v[0-9]+]], s33		; GCN: v_mov_b32_e32 [[TMP_VGPR1:v[0-9]+]], s33
; GCN: buffer_store_dword [[TMP_VGPR1]], off, s[0:3], s32 offset:4		; MUBUF: buffer_store_dword [[TMP_VGPR1]], off, s[0:3], s32 offset:4
		; FLATSCR: scratch_store_dword off, [[TMP_VGPR1]], s32 offset:4
; GCN: s_mov_b64 exec, [[COPY_EXEC1]]		; GCN: s_mov_b64 exec, [[COPY_EXEC1]]
; GCN: s_mov_b32 s33, s32		; GCN: s_mov_b32 s33, s32
; GCN: s_or_saveexec_b64 [[COPY_EXEC2:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN: s_or_saveexec_b64 [[COPY_EXEC2:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; GCN: buffer_load_dword [[TMP_VGPR2:v[0-9]+]], off, s[0:3], s32 offset:4		; MUBUF: buffer_load_dword [[TMP_VGPR2:v[0-9]+]], off, s[0:3], s32 offset:4
		; FLATSCR: scratch_load_dword [[TMP_VGPR2:v[0-9]+]], off, s32 offset:4
; GCN: s_waitcnt vmcnt(0)		; GCN: s_waitcnt vmcnt(0)
; GCN: v_readfirstlane_b32 s33, [[TMP_VGPR2]]		; GCN: v_readfirstlane_b32 s33, [[TMP_VGPR2]]
; GCN: s_mov_b64 exec, [[COPY_EXEC2]]		; GCN: s_mov_b64 exec, [[COPY_EXEC2]]
; GCN: s_setpc_b64		; GCN: s_setpc_b64
; GCN: ScratchSize: 8		; GCN: ScratchSize: 8
define void @callee_need_to_spill_fp_to_memory() #3 {		define void @callee_need_to_spill_fp_to_memory() #3 {
call void asm sideeffect "; clobber nonpreserved SGPRs",		call void asm sideeffect "; clobber nonpreserved SGPRs",
"~{s4},~{s5},~{s6},~{s7},~{s8},~{s9}		"~{s4},~{s5},~{s6},~{s7},~{s8},~{s9}
[10 lines not shown]
}		}

; If we have a reserved VGPR that can be used for SGPR spills, we may still		; If we have a reserved VGPR that can be used for SGPR spills, we may still
; need to spill the FP to memory if there are no free lanes in the reserved		; need to spill the FP to memory if there are no free lanes in the reserved
; VGPR.		; VGPR.
; GCN-LABEL: {{^}}callee_need_to_spill_fp_to_memory_full_reserved_vgpr:		; GCN-LABEL: {{^}}callee_need_to_spill_fp_to_memory_full_reserved_vgpr:
; GCN: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN: s_or_saveexec_b64 [[COPY_EXEC1:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; GCN: v_mov_b32_e32 [[TMP_VGPR1:v[0-9]+]], s33		; GCN: v_mov_b32_e32 [[TMP_VGPR1:v[0-9]+]], s33
; GCN: buffer_store_dword [[TMP_VGPR1]], off, s[0:3], s32 offset:[[OFF:[0-9]+]]		; MUBUF: buffer_store_dword [[TMP_VGPR1]], off, s[0:3], s32 offset:[[OFF:[0-9]+]]
		; FLATSCR: scratch_store_dword off, [[TMP_VGPR1]], s32 offset:[[OFF:[0-9]+]]
; GCN: s_mov_b64 exec, [[COPY_EXEC1]]		; GCN: s_mov_b64 exec, [[COPY_EXEC1]]
; GCN-NOT: v_writelane_b32 v40, s33		; GCN-NOT: v_writelane_b32 v40, s33
; GCN: s_mov_b32 s33, s32		; GCN: s_mov_b32 s33, s32
; GCN-NOT: v_readlane_b32 s33, v40		; GCN-NOT: v_readlane_b32 s33, v40
; GCN: s_or_saveexec_b64 [[COPY_EXEC2:s\[[0-9]+:[0-9]+\]]], -1{{$}}		; GCN: s_or_saveexec_b64 [[COPY_EXEC2:s\[[0-9]+:[0-9]+\]]], -1{{$}}
; GCN: buffer_load_dword [[TMP_VGPR2:v[0-9]+]], off, s[0:3], s32 offset:[[OFF]]		; MUBUF: buffer_load_dword [[TMP_VGPR2:v[0-9]+]], off, s[0:3], s32 offset:[[OFF]]
		; FLATSCR: scratch_load_dword [[TMP_VGPR2:v[0-9]+]], off, s32 offset:[[OFF]]
; GCN: v_readfirstlane_b32 s33, [[TMP_VGPR2]]		; GCN: v_readfirstlane_b32 s33, [[TMP_VGPR2]]
; GCN: s_mov_b64 exec, [[COPY_EXEC2]]		; GCN: s_mov_b64 exec, [[COPY_EXEC2]]
; GCN: s_setpc_b64		; GCN: s_setpc_b64
define void @callee_need_to_spill_fp_to_memory_full_reserved_vgpr() #3 {		define void @callee_need_to_spill_fp_to_memory_full_reserved_vgpr() #3 {
call void asm sideeffect "; clobber nonpreserved SGPRs and 64 CSRs",		call void asm sideeffect "; clobber nonpreserved SGPRs and 64 CSRs",
"~{s4},~{s5},~{s6},~{s7},~{s8},~{s9}		"~{s4},~{s5},~{s6},~{s7},~{s8},~{s9}
,~{s10},~{s11},~{s12},~{s13},~{s14},~{s15},~{s16},~{s17},~{s18},~{s19}		,~{s10},~{s11},~{s12},~{s13},~{s14},~{s15},~{s16},~{s17},~{s18},~{s19}
,~{s20},~{s21},~{s22},~{s23},~{s24},~{s25},~{s26},~{s27},~{s28},~{s29}		,~{s20},~{s21},~{s22},~{s23},~{s24},~{s25},~{s26},~{s27},~{s28},~{s29}
[12 lines not shown]
call void asm sideeffect "; clobber all VGPRs except CSR v40",		call void asm sideeffect "; clobber all VGPRs except CSR v40",
,~{v30},~{v31},~{v32},~{v33},~{v34},~{v35},~{v36},~{v37},~{v38}"()		,~{v30},~{v31},~{v32},~{v33},~{v34},~{v35},~{v36},~{v37},~{v38}"()
ret void		ret void
}		}

; If the size of the offset exceeds the MUBUF offset field we need another		; If the size of the offset exceeds the MUBUF offset field we need another
; scratch VGPR to hold the offset.		; scratch VGPR to hold the offset.
; GCN-LABEL: {{^}}spill_fp_to_memory_scratch_reg_needed_mubuf_offset		; GCN-LABEL: {{^}}spill_fp_to_memory_scratch_reg_needed_mubuf_offset
; GCN: s_or_saveexec_b64 s[4:5], -1		; GCN: s_or_saveexec_b64 s[4:5], -1
; GCN: v_mov_b32_e32 v0, s33		; MUBUF: v_mov_b32_e32 v0, s33
; GCN-NOT: v_mov_b32_e32 v0, 0x1008		; GCN-NOT: v_mov_b32_e32 v0, 0x1008
; GCN-NEXT: v_mov_b32_e32 v1, 0x1008		; MUBUF-NEXT: v_mov_b32_e32 v1, 0x1008
; GCN-NEXT: buffer_store_dword v0, v1, s[0:3], s32 offen		; MUBUF-NEXT: buffer_store_dword v0, v1, s[0:3], s32 offen ; 4-byte Folded Spill
		; FLATSCR-NEXT: s_add_u32 [[SOFF:s[0-9]+]], s32, 0x1008
		; FLATSCR-NEXT: v_mov_b32_e32 v0, s33
		; FLATSCR-NEXT: scratch_store_dword off, v0, [[SOFF]] ; 4-byte Folded Spill
define void @spill_fp_to_memory_scratch_reg_needed_mubuf_offset([4096 x i8] addrspace(5)* byval align 4 %arg) #3 {		define void @spill_fp_to_memory_scratch_reg_needed_mubuf_offset([4096 x i8] addrspace(5)* byval align 4 %arg) #3 {
%alloca = alloca i32, addrspace(5)		%alloca = alloca i32, addrspace(5)
store volatile i32 0, i32 addrspace(5)* %alloca		store volatile i32 0, i32 addrspace(5)* %alloca

call void asm sideeffect "; clobber nonpreserved SGPRs and 64 CSRs",		call void asm sideeffect "; clobber nonpreserved SGPRs and 64 CSRs",
"~{s4},~{s5},~{s6},~{s7},~{s8},~{s9}		"~{s4},~{s5},~{s6},~{s7},~{s8},~{s9}
,~{s10},~{s11},~{s12},~{s13},~{s14},~{s15},~{s16},~{s17},~{s18},~{s19}		,~{s10},~{s11},~{s12},~{s13},~{s14},~{s15},~{s16},~{s17},~{s18},~{s19}
,~{s20},~{s21},~{s22},~{s23},~{s24},~{s25},~{s26},~{s27},~{s28},~{s29}		,~{s20},~{s21},~{s22},~{s23},~{s24},~{s25},~{s26},~{s27},~{s28},~{s29}
[20 lines not shown]
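
Note on the constants in the CHECK lines above: the summary states that SP changes from unswizzled to swizzled with flat scratch. A minimal Python sketch of the arithmetic follows, assuming wave64 on gfx900 and a 12-bit unsigned MUBUF immediate offset field; both are assumptions about the hardware that the diff itself does not state. The helper names are hypothetical and are not part of the LLVM sources.

```python
# Assumptions (not stated in the diff): gfx900 runs wave64, and the MUBUF
# immediate offset field is 12 bits unsigned.
WAVEFRONT_SIZE = 64
MUBUF_MAX_IMM_OFFSET = (1 << 12) - 1  # 4095


def unswizzled_from_swizzled(per_lane_bytes, wave_size=WAVEFRONT_SIZE):
    """Scale a per-lane (swizzled) byte count to a whole-wave (unswizzled) one."""
    return per_lane_bytes * wave_size


def fits_mubuf_imm(offset):
    """True if the offset can be encoded directly in the MUBUF offset field."""
    return 0 <= offset <= MUBUF_MAX_IMM_OFFSET


# (MUBUF value, FLATSCR value) pairs copied from the CHECK lines above.
sp_adjustments = {
    "scratch_reg_needed_mubuf_offset": (0x40300, 0x100C),
    "ipra_call_with_stack": (0x400, 16),
}

for name, (mubuf, flatscr) in sp_adjustments.items():
    # The MUBUF stack adjustment equals the flat scratch one times the wave size.
    assert unswizzled_from_swizzled(flatscr) == mubuf
    print(f"{name}: {flatscr:#x} * {WAVEFRONT_SIZE} == {mubuf:#x}")

# The CSR VGPR slot offset 0x1008 does not fit the immediate field, which is
# why the tests expect it to be materialized in a register first.
print("0x1008 fits MUBUF immediate:", fits_mubuf_imm(0x1008))
```

Under these assumptions the MUBUF and FLATSCR stack adjustments describe the same frame size, expressed per wave in one case and per lane in the other.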

llvm/test/CodeGen/AMDGPU/chain-hi-to-lo.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,GFX900 %s		; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,GFX900 %s
		; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs -amdgpu-enable-flat-scratch < %s | FileCheck -check-prefixes=GCN,FLATSCR %s

define <2 x half> @chain_hi_to_lo_private() {		define <2 x half> @chain_hi_to_lo_private() {
; GCN-LABEL: chain_hi_to_lo_private:		; GFX900-LABEL: chain_hi_to_lo_private:
; GCN: ; %bb.0: ; %bb		; GFX900: ; %bb.0: ; %bb
; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN-NEXT: buffer_load_ushort v0, off, s[0:3], 0 offset:2		; GFX900-NEXT: buffer_load_ushort v0, off, s[0:3], 0 offset:2
; GCN-NEXT: s_waitcnt vmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: buffer_load_short_d16_hi v0, off, s[0:3], 0		; GFX900-NEXT: buffer_load_short_d16_hi v0, off, s[0:3], 0
; GCN-NEXT: s_waitcnt vmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_setpc_b64 s[30:31]		; GFX900-NEXT: s_setpc_b64 s[30:31]
		;
		; FLATSCR-LABEL: chain_hi_to_lo_private:
		; FLATSCR: ; %bb.0: ; %bb
		; FLATSCR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
		; FLATSCR-NEXT: s_mov_b32 s4, 2
		; FLATSCR-NEXT: scratch_load_ushort v0, off, s4
		; FLATSCR-NEXT: s_mov_b32 s4, 0
		; FLATSCR-NEXT: s_waitcnt vmcnt(0)
		; FLATSCR-NEXT: scratch_load_short_d16_hi v0, off, s4
		; FLATSCR-NEXT: s_waitcnt vmcnt(0)
		; FLATSCR-NEXT: s_setpc_b64 s[30:31]
bb:		bb:
%gep_lo = getelementptr inbounds half, half addrspace(5)* null, i64 1		%gep_lo = getelementptr inbounds half, half addrspace(5)* null, i64 1
%load_lo = load half, half addrspace(5)* %gep_lo		%load_lo = load half, half addrspace(5)* %gep_lo
%gep_hi = getelementptr inbounds half, half addrspace(5)* null, i64 0		%gep_hi = getelementptr inbounds half, half addrspace(5)* null, i64 0
%load_hi = load half, half addrspace(5)* %gep_hi		%load_hi = load half, half addrspace(5)* %gep_hi

%temp = insertelement <2 x half> undef, half %load_lo, i32 0		%temp = insertelement <2 x half> undef, half %load_lo, i32 0
%result = insertelement <2 x half> %temp, half %load_hi, i32 1		%result = insertelement <2 x half> %temp, half %load_hi, i32 1

ret <2 x half> %result		ret <2 x half> %result
}		}

define <2 x half> @chain_hi_to_lo_private_different_bases(half addrspace(5)* %base_lo, half addrspace(5)* %base_hi) {		define <2 x half> @chain_hi_to_lo_private_different_bases(half addrspace(5)* %base_lo, half addrspace(5)* %base_hi) {
; GCN-LABEL: chain_hi_to_lo_private_different_bases:		; GFX900-LABEL: chain_hi_to_lo_private_different_bases:
; GCN: ; %bb.0: ; %bb		; GFX900: ; %bb.0: ; %bb
; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN-NEXT: buffer_load_ushort v0, v0, s[0:3], 0 offen		; GFX900-NEXT: buffer_load_ushort v0, v0, s[0:3], 0 offen
; GCN-NEXT: s_waitcnt vmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: buffer_load_short_d16_hi v0, v1, s[0:3], 0 offen		; GFX900-NEXT: buffer_load_short_d16_hi v0, v1, s[0:3], 0 offen
; GCN-NEXT: s_waitcnt vmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_setpc_b64 s[30:31]		; GFX900-NEXT: s_setpc_b64 s[30:31]
		;
		; FLATSCR-LABEL: chain_hi_to_lo_private_different_bases:
		; FLATSCR: ; %bb.0: ; %bb
		; FLATSCR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
		; FLATSCR-NEXT: scratch_load_ushort v0, v0, off
		; FLATSCR-NEXT: s_waitcnt vmcnt(0)
		; FLATSCR-NEXT: scratch_load_short_d16_hi v0, v1, off
		; FLATSCR-NEXT: s_waitcnt vmcnt(0)
		; FLATSCR-NEXT: s_setpc_b64 s[30:31]
bb:		bb:
%load_lo = load half, half addrspace(5)* %base_lo		%load_lo = load half, half addrspace(5)* %base_lo
%load_hi = load half, half addrspace(5)* %base_hi		%load_hi = load half, half addrspace(5)* %base_hi

%temp = insertelement <2 x half> undef, half %load_lo, i32 0		%temp = insertelement <2 x half> undef, half %load_lo, i32 0
%result = insertelement <2 x half> %temp, half %load_hi, i32 1		%result = insertelement <2 x half> %temp, half %load_hi, i32 1

ret <2 x half> %result		ret <2 x half> %result
}		}

define <2 x half> @chain_hi_to_lo_arithmatic(half addrspace(5)* %base, half %in) {		define <2 x half> @chain_hi_to_lo_arithmatic(half addrspace(5)* %base, half %in) {
; GCN-LABEL: chain_hi_to_lo_arithmatic:		; GFX900-LABEL: chain_hi_to_lo_arithmatic:
; GCN: ; %bb.0: ; %bb		; GFX900: ; %bb.0: ; %bb
; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN-NEXT: v_add_f16_e32 v1, 1.0, v1		; GFX900-NEXT: v_add_f16_e32 v1, 1.0, v1
; GCN-NEXT: buffer_load_short_d16_hi v1, v0, s[0:3], 0 offen		; GFX900-NEXT: buffer_load_short_d16_hi v1, v0, s[0:3], 0 offen
; GCN-NEXT: s_waitcnt vmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: v_mov_b32_e32 v0, v1		; GFX900-NEXT: v_mov_b32_e32 v0, v1
; GCN-NEXT: s_setpc_b64 s[30:31]		; GFX900-NEXT: s_setpc_b64 s[30:31]
		;
		; FLATSCR-LABEL: chain_hi_to_lo_arithmatic:
		; FLATSCR: ; %bb.0: ; %bb
		; FLATSCR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
		; FLATSCR-NEXT: v_add_f16_e32 v1, 1.0, v1
		; FLATSCR-NEXT: scratch_load_short_d16_hi v1, v0, off
		; FLATSCR-NEXT: s_waitcnt vmcnt(0)
		; FLATSCR-NEXT: v_mov_b32_e32 v0, v1
		; FLATSCR-NEXT: s_setpc_b64 s[30:31]
bb:		bb:
%arith_lo = fadd half %in, 1.0		%arith_lo = fadd half %in, 1.0
%load_hi = load half, half addrspace(5)* %base		%load_hi = load half, half addrspace(5)* %base

%temp = insertelement <2 x half> undef, half %arith_lo, i32 0		%temp = insertelement <2 x half> undef, half %arith_lo, i32 0
%result = insertelement <2 x half> %temp, half %load_hi, i32 1		%result = insertelement <2 x half> %temp, half %load_hi, i32 1

ret <2 x half> %result		ret <2 x half> %result
[125 lines not shown]
bb:		bb:
%temp = insertelement <2 x half> undef, half %load_lo, i32 0		%temp = insertelement <2 x half> undef, half %load_lo, i32 0
%result = insertelement <2 x half> %temp, half %load_hi, i32 1		%result = insertelement <2 x half> %temp, half %load_hi, i32 1

ret <2 x half> %result		ret <2 x half> %result
}		}

; Make sure we don't lose any of the private stores.		; Make sure we don't lose any of the private stores.
define amdgpu_kernel void @vload2_private(i16 addrspace(1)* nocapture readonly %in, <2 x i16> addrspace(1)* nocapture %out) #0 {		define amdgpu_kernel void @vload2_private(i16 addrspace(1)* nocapture readonly %in, <2 x i16> addrspace(1)* nocapture %out) #0 {
; GCN-LABEL: vload2_private:		; GFX900-LABEL: vload2_private:
; GCN: ; %bb.0: ; %entry		; GFX900: ; %bb.0: ; %entry
; GCN-NEXT: s_add_u32 flat_scratch_lo, s6, s9		; GFX900-NEXT: s_add_u32 flat_scratch_lo, s6, s9
; GCN-NEXT: s_addc_u32 flat_scratch_hi, s7, 0		; GFX900-NEXT: s_addc_u32 flat_scratch_hi, s7, 0
; GCN-NEXT: s_load_dwordx4 s[4:7], s[4:5], 0x0		; GFX900-NEXT: s_load_dwordx4 s[4:7], s[4:5], 0x0
; GCN-NEXT: s_add_u32 s0, s0, s9		; GFX900-NEXT: s_add_u32 s0, s0, s9
; GCN-NEXT: s_addc_u32 s1, s1, 0		; GFX900-NEXT: s_addc_u32 s1, s1, 0
; GCN-NEXT: s_waitcnt lgkmcnt(0)		; GFX900-NEXT: s_waitcnt lgkmcnt(0)
; GCN-NEXT: v_mov_b32_e32 v0, s4		; GFX900-NEXT: v_mov_b32_e32 v0, s4
; GCN-NEXT: v_mov_b32_e32 v1, s5		; GFX900-NEXT: v_mov_b32_e32 v1, s5
; GCN-NEXT: global_load_ushort v2, v[0:1], off		; GFX900-NEXT: global_load_ushort v2, v[0:1], off
; GCN-NEXT: s_waitcnt vmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: buffer_store_short v2, off, s[0:3], 0 offset:4		; GFX900-NEXT: buffer_store_short v2, off, s[0:3], 0 offset:4
; GCN-NEXT: global_load_ushort v2, v[0:1], off offset:2		; GFX900-NEXT: global_load_ushort v2, v[0:1], off offset:2
; GCN-NEXT: s_waitcnt vmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: buffer_store_short v2, off, s[0:3], 0 offset:6		; GFX900-NEXT: buffer_store_short v2, off, s[0:3], 0 offset:6
; GCN-NEXT: global_load_ushort v2, v[0:1], off offset:4		; GFX900-NEXT: global_load_ushort v2, v[0:1], off offset:4
; GCN-NEXT: v_mov_b32_e32 v0, s6		; GFX900-NEXT: v_mov_b32_e32 v0, s6
; GCN-NEXT: v_mov_b32_e32 v1, s7		; GFX900-NEXT: v_mov_b32_e32 v1, s7
; GCN-NEXT: s_waitcnt vmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: buffer_store_short v2, off, s[0:3], 0 offset:8		; GFX900-NEXT: buffer_store_short v2, off, s[0:3], 0 offset:8
; GCN-NEXT: buffer_load_ushort v2, off, s[0:3], 0 offset:4		; GFX900-NEXT: buffer_load_ushort v2, off, s[0:3], 0 offset:4
; GCN-NEXT: buffer_load_ushort v4, off, s[0:3], 0 offset:6		; GFX900-NEXT: buffer_load_ushort v4, off, s[0:3], 0 offset:6
; GCN-NEXT: s_waitcnt vmcnt(1)		; GFX900-NEXT: s_waitcnt vmcnt(1)
; GCN-NEXT: v_and_b32_e32 v2, 0xffff, v2		; GFX900-NEXT: v_and_b32_e32 v2, 0xffff, v2
; GCN-NEXT: s_waitcnt vmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: v_mov_b32_e32 v3, v4		; GFX900-NEXT: v_mov_b32_e32 v3, v4
; GCN-NEXT: buffer_load_short_d16_hi v3, off, s[0:3], 0 offset:8		; GFX900-NEXT: buffer_load_short_d16_hi v3, off, s[0:3], 0 offset:8
; GCN-NEXT: v_lshl_or_b32 v2, v4, 16, v2		; GFX900-NEXT: v_lshl_or_b32 v2, v4, 16, v2
; GCN-NEXT: s_waitcnt vmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: global_store_dwordx2 v[0:1], v[2:3], off		; GFX900-NEXT: global_store_dwordx2 v[0:1], v[2:3], off
; GCN-NEXT: s_endpgm		; GFX900-NEXT: s_endpgm
		;
		; FLATSCR-LABEL: vload2_private:
		; FLATSCR: ; %bb.0: ; %entry
		; FLATSCR-NEXT: s_add_u32 flat_scratch_lo, s6, s9
		; FLATSCR-NEXT: s_addc_u32 flat_scratch_hi, s7, 0
		; FLATSCR-NEXT: s_load_dwordx4 s[4:7], s[4:5], 0x0
		; FLATSCR-NEXT: s_mov_b32 vcc_hi, 0
		; FLATSCR-NEXT: s_waitcnt lgkmcnt(0)
		; FLATSCR-NEXT: v_mov_b32_e32 v0, s4
		; FLATSCR-NEXT: v_mov_b32_e32 v1, s5
		; FLATSCR-NEXT: global_load_ushort v2, v[0:1], off
		; FLATSCR-NEXT: s_waitcnt vmcnt(0)
		; FLATSCR-NEXT: scratch_store_short off, v2, vcc_hi offset:4
		; FLATSCR-NEXT: global_load_ushort v2, v[0:1], off offset:2
		; FLATSCR-NEXT: s_mov_b32 vcc_hi, 0
		; FLATSCR-NEXT: s_waitcnt vmcnt(0)
		; FLATSCR-NEXT: scratch_store_short off, v2, vcc_hi offset:6
		; FLATSCR-NEXT: global_load_ushort v2, v[0:1], off offset:4
		; FLATSCR-NEXT: s_mov_b32 vcc_hi, 0
		; FLATSCR-NEXT: v_mov_b32_e32 v0, s6
		; FLATSCR-NEXT: v_mov_b32_e32 v1, s7
		; FLATSCR-NEXT: s_waitcnt vmcnt(0)
		; FLATSCR-NEXT: scratch_store_short off, v2, vcc_hi offset:8
		; FLATSCR-NEXT: s_mov_b32 vcc_hi, 0
		; FLATSCR-NEXT: scratch_load_ushort v2, off, vcc_hi offset:4
		; FLATSCR-NEXT: s_mov_b32 vcc_hi, 0
		; FLATSCR-NEXT: scratch_load_ushort v4, off, vcc_hi offset:6
		; FLATSCR-NEXT: s_mov_b32 vcc_hi, 0
		; FLATSCR-NEXT: s_waitcnt vmcnt(1)
		; FLATSCR-NEXT: v_and_b32_e32 v2, 0xffff, v2
		; FLATSCR-NEXT: s_waitcnt vmcnt(0)
		; FLATSCR-NEXT: v_mov_b32_e32 v3, v4
		; FLATSCR-NEXT: scratch_load_short_d16_hi v3, off, vcc_hi offset:8
		; FLATSCR-NEXT: v_lshl_or_b32 v2, v4, 16, v2
		; FLATSCR-NEXT: s_waitcnt vmcnt(0)
		; FLATSCR-NEXT: global_store_dwordx2 v[0:1], v[2:3], off
		; FLATSCR-NEXT: s_endpgm
entry:		entry:
%loc = alloca [3 x i16], align 2, addrspace(5)		%loc = alloca [3 x i16], align 2, addrspace(5)
%loc.0.sroa_cast1 = bitcast [3 x i16] addrspace(5)* %loc to i8 addrspace(5)*		%loc.0.sroa_cast1 = bitcast [3 x i16] addrspace(5)* %loc to i8 addrspace(5)*
%tmp = load i16, i16 addrspace(1)* %in, align 2		%tmp = load i16, i16 addrspace(1)* %in, align 2
%loc.0.sroa_idx = getelementptr inbounds [3 x i16], [3 x i16] addrspace(5)* %loc, i32 0, i32 0		%loc.0.sroa_idx = getelementptr inbounds [3 x i16], [3 x i16] addrspace(5)* %loc, i32 0, i32 0
store volatile i16 %tmp, i16 addrspace(5)* %loc.0.sroa_idx		store volatile i16 %tmp, i16 addrspace(5)* %loc.0.sroa_idx
%arrayidx.1 = getelementptr inbounds i16, i16 addrspace(1)* %in, i64 1		%arrayidx.1 = getelementptr inbounds i16, i16 addrspace(1)* %in, i64 1
%tmp1 = load i16, i16 addrspace(1)* %arrayidx.1, align 2		%tmp1 = load i16, i16 addrspace(1)* %arrayidx.1, align 2
[58 lines not shown]
bb:		bb:
%load_hi = load volatile i16, i16 addrspace(3)* %gep_hi		%load_hi = load volatile i16, i16 addrspace(3)* %gep_hi
%to.hi = insertelement <2 x i16> undef, i16 %load_hi, i32 1		%to.hi = insertelement <2 x i16> undef, i16 %load_hi, i32 1
%op.hi = add <2 x i16> %to.hi, <i16 12, i16 12>		%op.hi = add <2 x i16> %to.hi, <i16 12, i16 12>
%result = insertelement <2 x i16> %op.hi, i16 %load_lo, i32 0		%result = insertelement <2 x i16> %op.hi, i16 %load_lo, i32 0
ret <2 x i16> %result		ret <2 x i16> %result
}		}

define <2 x i16> @chain_hi_to_lo_private_other_dep(i16 addrspace(5)* %ptr) {		define <2 x i16> @chain_hi_to_lo_private_other_dep(i16 addrspace(5)* %ptr) {
; GCN-LABEL: chain_hi_to_lo_private_other_dep:		; GFX900-LABEL: chain_hi_to_lo_private_other_dep:
; GCN: ; %bb.0: ; %bb		; GFX900: ; %bb.0: ; %bb
; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN-NEXT: buffer_load_short_d16_hi v1, v0, s[0:3], 0 offen		; GFX900-NEXT: buffer_load_short_d16_hi v1, v0, s[0:3], 0 offen
; GCN-NEXT: s_waitcnt vmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: v_pk_sub_u16 v1, v1, -12 op_sel_hi:[1,0]		; GFX900-NEXT: v_pk_sub_u16 v1, v1, -12 op_sel_hi:[1,0]
; GCN-NEXT: buffer_load_short_d16 v1, v0, s[0:3], 0 offen offset:2		; GFX900-NEXT: buffer_load_short_d16 v1, v0, s[0:3], 0 offen offset:2
; GCN-NEXT: s_waitcnt vmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: v_mov_b32_e32 v0, v1		; GFX900-NEXT: v_mov_b32_e32 v0, v1
; GCN-NEXT: s_setpc_b64 s[30:31]		; GFX900-NEXT: s_setpc_b64 s[30:31]
		;
		; FLATSCR-LABEL: chain_hi_to_lo_private_other_dep:
		; FLATSCR: ; %bb.0: ; %bb
		; FLATSCR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
		; FLATSCR-NEXT: scratch_load_short_d16_hi v1, v0, off
		; FLATSCR-NEXT: s_waitcnt vmcnt(0)
		; FLATSCR-NEXT: v_pk_sub_u16 v1, v1, -12 op_sel_hi:[1,0]
		; FLATSCR-NEXT: scratch_load_short_d16 v1, v0, off offset:2
		; FLATSCR-NEXT: s_waitcnt vmcnt(0)
		; FLATSCR-NEXT: v_mov_b32_e32 v0, v1
		; FLATSCR-NEXT: s_setpc_b64 s[30:31]
bb:		bb:
%gep_lo = getelementptr inbounds i16, i16 addrspace(5)* %ptr, i64 1		%gep_lo = getelementptr inbounds i16, i16 addrspace(5)* %ptr, i64 1
%load_lo = load i16, i16 addrspace(5)* %gep_lo		%load_lo = load i16, i16 addrspace(5)* %gep_lo
%gep_hi = getelementptr inbounds i16, i16 addrspace(5)* %ptr, i64 0		%gep_hi = getelementptr inbounds i16, i16 addrspace(5)* %ptr, i64 0
%load_hi = load i16, i16 addrspace(5)* %gep_hi		%load_hi = load i16, i16 addrspace(5)* %gep_hi
%to.hi = insertelement <2 x i16> undef, i16 %load_hi, i32 1		%to.hi = insertelement <2 x i16> undef, i16 %load_hi, i32 1
%op.hi = add <2 x i16> %to.hi, <i16 12, i16 12>		%op.hi = add <2 x i16> %to.hi, <i16 12, i16 12>
%result = insertelement <2 x i16> %op.hi, i16 %load_lo, i32 0		%result = insertelement <2 x i16> %op.hi, i16 %load_lo, i32 0
[70 lines not shown]

llvm/test/CodeGen/AMDGPU/fast-unaligned-load-store.private.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=hawaii -mattr=-unaligned-scratch-access < %s | FileCheck -check-prefixes=GCN,GFX7-ALIGNED %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=hawaii -mattr=-unaligned-scratch-access < %s | FileCheck -check-prefixes=GCN,GFX7-ALIGNED %s
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=hawaii -mattr=+unaligned-scratch-access < %s | FileCheck -check-prefixes=GCN,GFX7-UNALIGNED %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=hawaii -mattr=+unaligned-scratch-access < %s | FileCheck -check-prefixes=GCN,GFX7-UNALIGNED %s
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -mattr=+unaligned-scratch-access < %s | FileCheck -check-prefixes=GCN,GFX9 %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -mattr=+unaligned-scratch-access < %s | FileCheck -check-prefixes=GCN,GFX9 %s
				; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -mattr=+unaligned-scratch-access -amdgpu-enable-flat-scratch < %s | FileCheck -check-prefixes=GCN,GFX9-FLASTSCR %s

	; Should not merge this to a dword load			; Should not merge this to a dword load
	define i32 @private_load_2xi16_align2(i16 addrspace(5)* %p) #0 {			define i32 @private_load_2xi16_align2(i16 addrspace(5)* %p) #0 {
	; GFX7-ALIGNED-LABEL: private_load_2xi16_align2:			; GFX7-ALIGNED-LABEL: private_load_2xi16_align2:
	; GFX7-ALIGNED: ; %bb.0:			; GFX7-ALIGNED: ; %bb.0:
	; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX7-ALIGNED-NEXT: v_add_i32_e32 v1, vcc, 2, v0			; GFX7-ALIGNED-NEXT: v_add_i32_e32 v1, vcc, 2, v0
	; GFX7-ALIGNED-NEXT: buffer_load_ushort v0, v0, s[0:3], 0 offen			; GFX7-ALIGNED-NEXT: buffer_load_ushort v0, v0, s[0:3], 0 offen
	[17 lines not shown]
	; GFX9-LABEL: private_load_2xi16_align2:			; GFX9-LABEL: private_load_2xi16_align2:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: buffer_load_ushort v1, v0, s[0:3], 0 offen			; GFX9-NEXT: buffer_load_ushort v1, v0, s[0:3], 0 offen
	; GFX9-NEXT: buffer_load_ushort v0, v0, s[0:3], 0 offen offset:2			; GFX9-NEXT: buffer_load_ushort v0, v0, s[0:3], 0 offen offset:2
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_lshl_or_b32 v0, v0, 16, v1			; GFX9-NEXT: v_lshl_or_b32 v0, v0, 16, v1
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX9-FLASTSCR-LABEL: private_load_2xi16_align2:
				; GFX9-FLASTSCR: ; %bb.0:
				; GFX9-FLASTSCR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX9-FLASTSCR-NEXT: scratch_load_ushort v1, v0, off
				; GFX9-FLASTSCR-NEXT: scratch_load_ushort v0, v0, off offset:2
				; GFX9-FLASTSCR-NEXT: s_waitcnt vmcnt(0)
				; GFX9-FLASTSCR-NEXT: v_lshl_or_b32 v0, v0, 16, v1
				; GFX9-FLASTSCR-NEXT: s_setpc_b64 s[30:31]
	%gep.p = getelementptr i16, i16 addrspace(5)* %p, i64 1			%gep.p = getelementptr i16, i16 addrspace(5)* %p, i64 1
	%p.0 = load i16, i16 addrspace(5)* %p, align 2			%p.0 = load i16, i16 addrspace(5)* %p, align 2
	%p.1 = load i16, i16 addrspace(5)* %gep.p, align 2			%p.1 = load i16, i16 addrspace(5)* %gep.p, align 2
	%zext.0 = zext i16 %p.0 to i32			%zext.0 = zext i16 %p.0 to i32
	%zext.1 = zext i16 %p.1 to i32			%zext.1 = zext i16 %p.1 to i32
	%shl.1 = shl i32 %zext.1, 16			%shl.1 = shl i32 %zext.1, 16
	%or = or i32 %zext.0, %shl.1			%or = or i32 %zext.0, %shl.1
	ret i32 %or			ret i32 %or
	[27 lines not shown]
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: v_mov_b32_e32 v0, 1			; GFX9-NEXT: v_mov_b32_e32 v0, 1
	; GFX9-NEXT: buffer_store_short v0, v1, s[0:3], 0 offen			; GFX9-NEXT: buffer_store_short v0, v1, s[0:3], 0 offen
	; GFX9-NEXT: v_mov_b32_e32 v0, 2			; GFX9-NEXT: v_mov_b32_e32 v0, 2
	; GFX9-NEXT: buffer_store_short v0, v1, s[0:3], 0 offen offset:2			; GFX9-NEXT: buffer_store_short v0, v1, s[0:3], 0 offen offset:2
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX9-FLASTSCR-LABEL: private_store_2xi16_align2:
				; GFX9-FLASTSCR: ; %bb.0:
				; GFX9-FLASTSCR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX9-FLASTSCR-NEXT: v_mov_b32_e32 v0, 1
				; GFX9-FLASTSCR-NEXT: scratch_store_short v1, v0, off
				; GFX9-FLASTSCR-NEXT: v_mov_b32_e32 v0, 2
				; GFX9-FLASTSCR-NEXT: scratch_store_short v1, v0, off offset:2
				; GFX9-FLASTSCR-NEXT: s_waitcnt vmcnt(0)
				; GFX9-FLASTSCR-NEXT: s_setpc_b64 s[30:31]
	%gep.r = getelementptr i16, i16 addrspace(5)* %r, i64 1			%gep.r = getelementptr i16, i16 addrspace(5)* %r, i64 1
	store i16 1, i16 addrspace(5)* %r, align 2			store i16 1, i16 addrspace(5)* %r, align 2
	store i16 2, i16 addrspace(5)* %gep.r, align 2			store i16 2, i16 addrspace(5)* %gep.r, align 2
	ret void			ret void
	}			}

	; Should produce align 1 dword when legal			; Should produce align 1 dword when legal
	define i32 @private_load_2xi16_align1(i16 addrspace(5)* %p) #0 {			define i32 @private_load_2xi16_align1(i16 addrspace(5)* %p) #0 {
	[30 lines not shown]
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: buffer_load_dword v0, v0, s[0:3], 0 offen			; GFX9-NEXT: buffer_load_dword v0, v0, s[0:3], 0 offen
	; GFX9-NEXT: v_mov_b32_e32 v1, 0xffff			; GFX9-NEXT: v_mov_b32_e32 v1, 0xffff
	; GFX9-NEXT: s_mov_b32 s4, 0xffff			; GFX9-NEXT: s_mov_b32 s4, 0xffff
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_bfi_b32 v1, v1, 0, v0			; GFX9-NEXT: v_bfi_b32 v1, v1, 0, v0
	; GFX9-NEXT: v_and_or_b32 v0, v0, s4, v1			; GFX9-NEXT: v_and_or_b32 v0, v0, s4, v1
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX9-FLASTSCR-LABEL: private_load_2xi16_align1:
				; GFX9-FLASTSCR: ; %bb.0:
				; GFX9-FLASTSCR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX9-FLASTSCR-NEXT: scratch_load_dword v0, v0, off
				; GFX9-FLASTSCR-NEXT: v_mov_b32_e32 v1, 0xffff
				; GFX9-FLASTSCR-NEXT: s_mov_b32 s4, 0xffff
				; GFX9-FLASTSCR-NEXT: s_waitcnt vmcnt(0)
				; GFX9-FLASTSCR-NEXT: v_bfi_b32 v1, v1, 0, v0
				; GFX9-FLASTSCR-NEXT: v_and_or_b32 v0, v0, s4, v1
				; GFX9-FLASTSCR-NEXT: s_setpc_b64 s[30:31]
	%gep.p = getelementptr i16, i16 addrspace(5)* %p, i64 1			%gep.p = getelementptr i16, i16 addrspace(5)* %p, i64 1
	%p.0 = load i16, i16 addrspace(5)* %p, align 1			%p.0 = load i16, i16 addrspace(5)* %p, align 1
	%p.1 = load i16, i16 addrspace(5)* %gep.p, align 1			%p.1 = load i16, i16 addrspace(5)* %gep.p, align 1
	%zext.0 = zext i16 %p.0 to i32			%zext.0 = zext i16 %p.0 to i32
	%zext.1 = zext i16 %p.1 to i32			%zext.1 = zext i16 %p.1 to i32
	%shl.1 = shl i32 %zext.1, 16			%shl.1 = shl i32 %zext.1, 16
	%or = or i32 %zext.0, %shl.1			%or = or i32 %zext.0, %shl.1
	ret i32 %or			ret i32 %or
	[27 lines not shown]
	;			;
	; GFX9-LABEL: private_store_2xi16_align1:			; GFX9-LABEL: private_store_2xi16_align1:
	; GFX9: ; %bb.0:			; GFX9: ; %bb.0:
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: v_mov_b32_e32 v0, 0x20001			; GFX9-NEXT: v_mov_b32_e32 v0, 0x20001
	; GFX9-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen			; GFX9-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX9-FLASTSCR-LABEL: private_store_2xi16_align1:
				; GFX9-FLASTSCR: ; %bb.0:
				; GFX9-FLASTSCR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX9-FLASTSCR-NEXT: v_mov_b32_e32 v0, 0x20001
				; GFX9-FLASTSCR-NEXT: scratch_store_dword v1, v0, off
				; GFX9-FLASTSCR-NEXT: s_waitcnt vmcnt(0)
				; GFX9-FLASTSCR-NEXT: s_setpc_b64 s[30:31]
	%gep.r = getelementptr i16, i16 addrspace(5)* %r, i64 1			%gep.r = getelementptr i16, i16 addrspace(5)* %r, i64 1
	store i16 1, i16 addrspace(5)* %r, align 1			store i16 1, i16 addrspace(5)* %r, align 1
	store i16 2, i16 addrspace(5)* %gep.r, align 1			store i16 2, i16 addrspace(5)* %gep.r, align 1
	ret void			ret void
	}			}

	; Should merge this to a dword load			; Should merge this to a dword load
	define i32 @private_load_2xi16_align4(i16 addrspace(5)* %p) #0 {			define i32 @private_load_2xi16_align4(i16 addrspace(5)* %p) #0 {
	; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX9-NEXT: buffer_load_dword v0, v0, s[0:3], 0 offen			; GFX9-NEXT: buffer_load_dword v0, v0, s[0:3], 0 offen
	; GFX9-NEXT: v_mov_b32_e32 v1, 0xffff			; GFX9-NEXT: v_mov_b32_e32 v1, 0xffff
	; GFX9-NEXT: s_mov_b32 s4, 0xffff			; GFX9-NEXT: s_mov_b32 s4, 0xffff
	; GFX9-NEXT: s_waitcnt vmcnt(0)			; GFX9-NEXT: s_waitcnt vmcnt(0)
	; GFX9-NEXT: v_bfi_b32 v1, v1, 0, v0			; GFX9-NEXT: v_bfi_b32 v1, v1, 0, v0
	; GFX9-NEXT: v_and_or_b32 v0, v0, s4, v1			; GFX9-NEXT: v_and_or_b32 v0, v0, s4, v1
	; GFX9-NEXT: s_setpc_b64 s[30:31]			; GFX9-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX9-FLASTSCR-LABEL: private_load_2xi16_align4:
				; GFX9-FLASTSCR: ; %bb.0:
				; GFX9-FLASTSCR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX9-FLASTSCR-NEXT: scratch_load_dword v0, v0, off
				; GFX9-FLASTSCR-NEXT: v_mov_b32_e32 v1, 0xffff
				; GFX9-FLASTSCR-NEXT: s_mov_b32 s4, 0xffff
				; GFX9-FLASTSCR-NEXT: s_waitcnt vmcnt(0)
				; GFX9-FLASTSCR-NEXT: v_bfi_b32 v1, v1, 0, v0
				; GFX9-FLASTSCR-NEXT: v_and_or_b32 v0, v0, s4, v1
				; GFX9-FLASTSCR-NEXT: s_setpc_b64 s[30:31]
	%gep.p = getelementptr i16, i16 addrspace(5)* %p, i64 1			%gep.p = getelementptr i16, i16 addrspace(5)* %p, i64 1
	%p.0 = load i16, i16 addrspace(5)* %p, align 4			%p.0 = load i16, i16 addrspace(5)* %p, align 4
	%p.1 = load i16, i16 addrspace(5)* %gep.p, align 2			%p.1 = load i16, i16 addrspace(5)* %gep.p, align 2
	%zext.0 = zext i16 %p.0 to i32			%zext.0 = zext i16 %p.0 to i32
	%zext.1 = zext i16 %p.1 to i32			%zext.1 = zext i16 %p.1 to i32
	%shl.1 = shl i32 %zext.1, 16			%shl.1 = shl i32 %zext.1, 16
	%or = or i32 %zext.0, %shl.1			%or = or i32 %zext.0, %shl.1
	ret i32 %or			ret i32 %or
	}			}

	; Should merge this to a dword store			; Should merge this to a dword store
	define void @private_store_2xi16_align4(i16 addrspace(5)* %p, i16 addrspace(5)* %r) #0 {			define void @private_store_2xi16_align4(i16 addrspace(5)* %p, i16 addrspace(5)* %r) #0 {
	; GFX7-LABEL: private_store_2xi16_align4:			; GFX7-LABEL: private_store_2xi16_align4:
	; GFX7: ; %bb.0:			; GFX7: ; %bb.0:
	; GFX7-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x2			; GFX7-NEXT: s_load_dwordx2 s[0:1], s[4:5], 0x2
	; GFX7-NEXT: v_mov_b32_e32 v2, 0x20001			; GFX7-NEXT: v_mov_b32_e32 v2, 0x20001
	; GFX7-NEXT: s_waitcnt lgkmcnt(0)			; GFX7-NEXT: s_waitcnt lgkmcnt(0)
	; GFX7-NEXT: v_mov_b32_e32 v0, s0			; GFX7-NEXT: v_mov_b32_e32 v0, s0
	; GFX7-NEXT: v_mov_b32_e32 v1, s1			; GFX7-NEXT: v_mov_b32_e32 v1, s1
	; GFX7-NEXT: flat_store_dword v[0:1], v2			; GFX7-NEXT: flat_store_dword v[0:1], v2
	; GFX7-NEXT: s_endpgm			; GFX7-NEXT: s_endpgm
	;			;
	; GCN-LABEL: private_store_2xi16_align4:			; GFX7-ALIGNED-LABEL: private_store_2xi16_align4:
	; GCN: ; %bb.0:			; GFX7-ALIGNED: ; %bb.0:
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: v_mov_b32_e32 v0, 0x20001			; GFX7-ALIGNED-NEXT: v_mov_b32_e32 v0, 0x20001
	; GCN-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen			; GFX7-ALIGNED-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GFX7-ALIGNED-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GFX7-ALIGNED-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX7-UNALIGNED-LABEL: private_store_2xi16_align4:
				; GFX7-UNALIGNED: ; %bb.0:
				; GFX7-UNALIGNED-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX7-UNALIGNED-NEXT: v_mov_b32_e32 v0, 0x20001
				; GFX7-UNALIGNED-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen
				; GFX7-UNALIGNED-NEXT: s_waitcnt vmcnt(0)
				; GFX7-UNALIGNED-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX9-LABEL: private_store_2xi16_align4:
				; GFX9: ; %bb.0:
				; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX9-NEXT: v_mov_b32_e32 v0, 0x20001
				; GFX9-NEXT: buffer_store_dword v0, v1, s[0:3], 0 offen
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX9-FLASTSCR-LABEL: private_store_2xi16_align4:
				; GFX9-FLASTSCR: ; %bb.0:
				; GFX9-FLASTSCR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX9-FLASTSCR-NEXT: v_mov_b32_e32 v0, 0x20001
				; GFX9-FLASTSCR-NEXT: scratch_store_dword v1, v0, off
				; GFX9-FLASTSCR-NEXT: s_waitcnt vmcnt(0)
				; GFX9-FLASTSCR-NEXT: s_setpc_b64 s[30:31]
	%gep.r = getelementptr i16, i16 addrspace(5)* %r, i64 1			%gep.r = getelementptr i16, i16 addrspace(5)* %r, i64 1
	store i16 1, i16 addrspace(5)* %r, align 4			store i16 1, i16 addrspace(5)* %r, align 4
	store i16 2, i16 addrspace(5)* %gep.r, align 2			store i16 2, i16 addrspace(5)* %gep.r, align 2
	ret void			ret void
	}			}

llvm/test/CodeGen/AMDGPU/flat-scratch.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -march=amdgcn -mcpu=gfx900 -mattr=-promote-alloca -amdgpu-enable-flat-scratch -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,GFX9 %s
				; RUN: llc -march=amdgcn -mcpu=gfx1030 -mattr=-promote-alloca -amdgpu-enable-flat-scratch -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,GFX10 %s

				define amdgpu_kernel void @zero_init_kernel() {
				; GFX9-LABEL: zero_init_kernel:
				; GFX9: ; %bb.0:
				; GFX9-NEXT: s_add_u32 flat_scratch_lo, s0, s3
				; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s1, 0
				; GFX9-NEXT: v_mov_b32_e32 v0, 0
				; GFX9-NEXT: s_mov_b32 vcc_hi, 0
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:76
				; GFX9-NEXT: s_mov_b32 vcc_hi, 0
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:72
				; GFX9-NEXT: s_mov_b32 vcc_hi, 0
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:68
				; GFX9-NEXT: s_mov_b32 vcc_hi, 0
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:64
				; GFX9-NEXT: s_mov_b32 vcc_hi, 0
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:60
				; GFX9-NEXT: s_mov_b32 vcc_hi, 0
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:56
				; GFX9-NEXT: s_mov_b32 vcc_hi, 0
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:52
				; GFX9-NEXT: s_mov_b32 vcc_hi, 0
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:48
				; GFX9-NEXT: s_mov_b32 vcc_hi, 0
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:44
				; GFX9-NEXT: s_mov_b32 vcc_hi, 0
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:40
				; GFX9-NEXT: s_mov_b32 vcc_hi, 0
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:36
				; GFX9-NEXT: s_mov_b32 vcc_hi, 0
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:32
				; GFX9-NEXT: s_mov_b32 vcc_hi, 0
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:28
				; GFX9-NEXT: s_mov_b32 vcc_hi, 0
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:24
				; GFX9-NEXT: s_mov_b32 vcc_hi, 0
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:20
				; GFX9-NEXT: s_mov_b32 vcc_hi, 0
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:16
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: zero_init_kernel:
				; GFX10: ; %bb.0:
				; GFX10-NEXT: s_add_u32 s0, s0, s3
				; GFX10-NEXT: s_addc_u32 s1, s1, 0
				; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s0
				; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s1
				; GFX10-NEXT: v_mov_b32_e32 v0, 0
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: scratch_store_dword off, v0, off offset:76
				; GFX10-NEXT: scratch_store_dword off, v0, off offset:72
				; GFX10-NEXT: scratch_store_dword off, v0, off offset:68
				; GFX10-NEXT: scratch_store_dword off, v0, off offset:64
				; GFX10-NEXT: scratch_store_dword off, v0, off offset:60
				; GFX10-NEXT: scratch_store_dword off, v0, off offset:56
				; GFX10-NEXT: scratch_store_dword off, v0, off offset:52
				; GFX10-NEXT: scratch_store_dword off, v0, off offset:48
				; GFX10-NEXT: scratch_store_dword off, v0, off offset:44
				; GFX10-NEXT: scratch_store_dword off, v0, off offset:40
				; GFX10-NEXT: scratch_store_dword off, v0, off offset:36
				; GFX10-NEXT: scratch_store_dword off, v0, off offset:32
				; GFX10-NEXT: scratch_store_dword off, v0, off offset:28
				; GFX10-NEXT: scratch_store_dword off, v0, off offset:24
				; GFX10-NEXT: scratch_store_dword off, v0, off offset:20
				; GFX10-NEXT: scratch_store_dword off, v0, off offset:16
				; GFX10-NEXT: s_endpgm
				%alloca = alloca [32 x i16], align 2, addrspace(5)
				%cast = bitcast [32 x i16] addrspace(5)* %alloca to i8 addrspace(5)*
				call void @llvm.memset.p5i8.i64(i8 addrspace(5)* align 2 dereferenceable(64) %cast, i8 0, i64 64, i1 false)
				ret void
				}

				define void @zero_init_foo() {
				; GFX9-LABEL: zero_init_foo:
				; GFX9: ; %bb.0:
				; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX9-NEXT: v_mov_b32_e32 v0, 0
				; GFX9-NEXT: scratch_store_dword off, v0, s32 offset:60
				; GFX9-NEXT: scratch_store_dword off, v0, s32 offset:56
				; GFX9-NEXT: scratch_store_dword off, v0, s32 offset:52
				; GFX9-NEXT: scratch_store_dword off, v0, s32 offset:48
				; GFX9-NEXT: scratch_store_dword off, v0, s32 offset:44
				; GFX9-NEXT: scratch_store_dword off, v0, s32 offset:40
				; GFX9-NEXT: scratch_store_dword off, v0, s32 offset:36
				; GFX9-NEXT: scratch_store_dword off, v0, s32 offset:32
				; GFX9-NEXT: scratch_store_dword off, v0, s32 offset:28
				; GFX9-NEXT: scratch_store_dword off, v0, s32 offset:24
				; GFX9-NEXT: scratch_store_dword off, v0, s32 offset:20
				; GFX9-NEXT: scratch_store_dword off, v0, s32 offset:16
				; GFX9-NEXT: scratch_store_dword off, v0, s32 offset:12
				; GFX9-NEXT: scratch_store_dword off, v0, s32 offset:8
				; GFX9-NEXT: scratch_store_dword off, v0, s32 offset:4
				; GFX9-NEXT: scratch_store_dword off, v0, s32
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX10-LABEL: zero_init_foo:
				; GFX10: ; %bb.0:
				; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: v_mov_b32_e32 v0, 0
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: scratch_store_dword off, v0, s32 offset:60
				; GFX10-NEXT: scratch_store_dword off, v0, s32 offset:56
				; GFX10-NEXT: scratch_store_dword off, v0, s32 offset:52
				; GFX10-NEXT: scratch_store_dword off, v0, s32 offset:48
				; GFX10-NEXT: scratch_store_dword off, v0, s32 offset:44
				; GFX10-NEXT: scratch_store_dword off, v0, s32 offset:40
				; GFX10-NEXT: scratch_store_dword off, v0, s32 offset:36
				; GFX10-NEXT: scratch_store_dword off, v0, s32 offset:32
				; GFX10-NEXT: scratch_store_dword off, v0, s32 offset:28
				; GFX10-NEXT: scratch_store_dword off, v0, s32 offset:24
				; GFX10-NEXT: scratch_store_dword off, v0, s32 offset:20
				; GFX10-NEXT: scratch_store_dword off, v0, s32 offset:16
				; GFX10-NEXT: scratch_store_dword off, v0, s32 offset:12
				; GFX10-NEXT: scratch_store_dword off, v0, s32 offset:8
				; GFX10-NEXT: scratch_store_dword off, v0, s32 offset:4
				; GFX10-NEXT: scratch_store_dword off, v0, s32
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: s_setpc_b64 s[30:31]
				%alloca = alloca [32 x i16], align 2, addrspace(5)
				%cast = bitcast [32 x i16] addrspace(5)* %alloca to i8 addrspace(5)*
				call void @llvm.memset.p5i8.i64(i8 addrspace(5)* align 2 dereferenceable(64) %cast, i8 0, i64 64, i1 false)
				ret void
				}

				define amdgpu_kernel void @store_load_sindex_kernel(i32 %idx) {
				; GFX9-LABEL: store_load_sindex_kernel:
				; GFX9: ; %bb.0: ; %bb
				; GFX9-NEXT: s_load_dword s0, s[0:1], 0x24
				; GFX9-NEXT: s_waitcnt lgkmcnt(0)
				; GFX9-NEXT: s_add_u32 flat_scratch_lo, s2, s5
				; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s3, 0
				; GFX9-NEXT: v_mov_b32_e32 v0, 15
				; GFX9-NEXT: s_lshl_b32 s1, s0, 2
				; GFX9-NEXT: s_and_b32 s0, s0, 15
				; GFX9-NEXT: s_lshl_b32 s0, s0, 2
				; GFX9-NEXT: s_add_u32 s1, 4, s1
				; GFX9-NEXT: scratch_store_dword off, v0, s1
				; GFX9-NEXT: s_add_u32 s0, 4, s0
				; GFX9-NEXT: scratch_load_dword v0, off, s0
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_load_sindex_kernel:
				; GFX10: ; %bb.0: ; %bb
				; GFX10-NEXT: s_add_u32 s2, s2, s5
				; GFX10-NEXT: s_addc_u32 s3, s3, 0
				; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2
				; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3
				; GFX10-NEXT: s_load_dword s0, s[0:1], 0x24
				; GFX10-NEXT: v_mov_b32_e32 v0, 15
				; GFX10-NEXT: s_waitcnt lgkmcnt(0)
				; GFX10-NEXT: s_and_b32 s1, s0, 15
				; GFX10-NEXT: s_lshl_b32 s0, s0, 2
				; GFX10-NEXT: s_lshl_b32 s1, s1, 2
				; GFX10-NEXT: s_add_u32 s0, 4, s0
				; GFX10-NEXT: s_add_u32 s1, 4, s1
				; GFX10-NEXT: scratch_store_dword off, v0, s0
				; GFX10-NEXT: scratch_load_dword v0, off, s1
				; GFX10-NEXT: s_endpgm
				bb:
				%i = alloca [32 x float], align 4, addrspace(5)
				%i1 = bitcast [32 x float] addrspace(5)* %i to i8 addrspace(5)*
				%i7 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %idx
				%i8 = bitcast float addrspace(5)* %i7 to i32 addrspace(5)*
				store volatile i32 15, i32 addrspace(5)* %i8, align 4
				%i9 = and i32 %idx, 15
				%i10 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %i9
				%i11 = bitcast float addrspace(5)* %i10 to i32 addrspace(5)*
				%i12 = load volatile i32, i32 addrspace(5)* %i11, align 4
				ret void
				}

				define amdgpu_ps void @store_load_sindex_foo(i32 inreg %idx) {
				; GFX9-LABEL: store_load_sindex_foo:
				; GFX9: ; %bb.0: ; %bb
				; GFX9-NEXT: s_add_u32 flat_scratch_lo, s0, s3
				; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s1, 0
				; GFX9-NEXT: s_lshl_b32 s0, s2, 2
				; GFX9-NEXT: s_add_u32 s0, 4, s0
				; GFX9-NEXT: v_mov_b32_e32 v0, 15
				; GFX9-NEXT: scratch_store_dword off, v0, s0
				; GFX9-NEXT: s_and_b32 s0, s2, 15
				; GFX9-NEXT: s_lshl_b32 s0, s0, 2
				; GFX9-NEXT: s_add_u32 s0, 4, s0
				; GFX9-NEXT: scratch_load_dword v0, off, s0
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_load_sindex_foo:
				; GFX10: ; %bb.0: ; %bb
				; GFX10-NEXT: s_add_u32 s0, s0, s3
				; GFX10-NEXT: s_addc_u32 s1, s1, 0
				; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s0
				; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s1
				; GFX10-NEXT: s_and_b32 s0, s2, 15
				; GFX10-NEXT: v_mov_b32_e32 v0, 15
				; GFX10-NEXT: s_lshl_b32 s1, s2, 2
				; GFX10-NEXT: s_lshl_b32 s0, s0, 2
				; GFX10-NEXT: s_add_u32 s1, 4, s1
				; GFX10-NEXT: s_add_u32 s0, 4, s0
				; GFX10-NEXT: scratch_store_dword off, v0, s1
				; GFX10-NEXT: scratch_load_dword v0, off, s0
				; GFX10-NEXT: s_endpgm
				bb:
				%i = alloca [32 x float], align 4, addrspace(5)
				%i1 = bitcast [32 x float] addrspace(5)* %i to i8 addrspace(5)*
				%i7 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %idx
				%i8 = bitcast float addrspace(5)* %i7 to i32 addrspace(5)*
				store volatile i32 15, i32 addrspace(5)* %i8, align 4
				%i9 = and i32 %idx, 15
				%i10 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %i9
				%i11 = bitcast float addrspace(5)* %i10 to i32 addrspace(5)*
				%i12 = load volatile i32, i32 addrspace(5)* %i11, align 4
				ret void
				}

				define amdgpu_kernel void @store_load_vindex_kernel() {
				; GFX9-LABEL: store_load_vindex_kernel:
				; GFX9: ; %bb.0: ; %bb
				; GFX9-NEXT: s_add_u32 flat_scratch_lo, s0, s3
				; GFX9-NEXT: v_lshlrev_b32_e32 v0, 2, v0
				; GFX9-NEXT: v_mov_b32_e32 v1, 4
				; GFX9-NEXT: v_add_u32_e32 v2, v1, v0
				; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s1, 0
				; GFX9-NEXT: v_mov_b32_e32 v3, 15
				; GFX9-NEXT: scratch_store_dword v2, v3, off
				; GFX9-NEXT: v_sub_u32_e32 v0, v1, v0
				; GFX9-NEXT: scratch_load_dword v0, v0, off offset:124
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_load_vindex_kernel:
				; GFX10: ; %bb.0: ; %bb
				; GFX10-NEXT: s_add_u32 s0, s0, s3
				; GFX10-NEXT: s_addc_u32 s1, s1, 0
				; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s0
				; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s1
				; GFX10-NEXT: v_mov_b32_e32 v1, 4
				; GFX10-NEXT: v_lshlrev_b32_e32 v0, 2, v0
				; GFX10-NEXT: v_mov_b32_e32 v3, 15
				; GFX10-NEXT: v_add_nc_u32_e32 v2, v1, v0
				; GFX10-NEXT: v_sub_nc_u32_e32 v0, v1, v0
				; GFX10-NEXT: scratch_store_dword v2, v3, off
				; GFX10-NEXT: scratch_load_dword v0, v0, off offset:124
				; GFX10-NEXT: s_endpgm
				bb:
				%i = alloca [32 x float], align 4, addrspace(5)
				%i1 = bitcast [32 x float] addrspace(5)* %i to i8 addrspace(5)*
				%i2 = tail call i32 @llvm.amdgcn.workitem.id.x()
				%i3 = zext i32 %i2 to i64
				%i7 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %i2
				%i8 = bitcast float addrspace(5)* %i7 to i32 addrspace(5)*
				store volatile i32 15, i32 addrspace(5)* %i8, align 4
				%i9 = sub nsw i32 31, %i2
				%i10 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %i9
				%i11 = bitcast float addrspace(5)* %i10 to i32 addrspace(5)*
				%i12 = load volatile i32, i32 addrspace(5)* %i11, align 4
				ret void
				}

				define void @store_load_vindex_foo(i32 %idx) {
				; GFX9-LABEL: store_load_vindex_foo:
				; GFX9: ; %bb.0: ; %bb
				; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX9-NEXT: v_mov_b32_e32 v1, s32
				; GFX9-NEXT: v_mov_b32_e32 v3, 15
				; GFX9-NEXT: v_lshl_add_u32 v2, v0, 2, v1
				; GFX9-NEXT: v_and_b32_e32 v0, v0, v3
				; GFX9-NEXT: scratch_store_dword v2, v3, off
				; GFX9-NEXT: v_lshl_add_u32 v0, v0, 2, v1
				; GFX9-NEXT: scratch_load_dword v0, v0, off
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX10-LABEL: store_load_vindex_foo:
				; GFX10: ; %bb.0: ; %bb
				; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: v_mov_b32_e32 v1, 15
				; GFX10-NEXT: v_mov_b32_e32 v2, s32
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: v_and_b32_e32 v3, v0, v1
				; GFX10-NEXT: v_lshl_add_u32 v0, v0, 2, v2
				; GFX10-NEXT: v_lshl_add_u32 v2, v3, 2, v2
				; GFX10-NEXT: scratch_store_dword v0, v1, off
				; GFX10-NEXT: scratch_load_dword v0, v2, off
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: s_setpc_b64 s[30:31]
				bb:
				%i = alloca [32 x float], align 4, addrspace(5)
				%i1 = bitcast [32 x float] addrspace(5)* %i to i8 addrspace(5)*
				%i7 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %idx
				%i8 = bitcast float addrspace(5)* %i7 to i32 addrspace(5)*
				store volatile i32 15, i32 addrspace(5)* %i8, align 4
				%i9 = and i32 %idx, 15
				%i10 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %i9
				%i11 = bitcast float addrspace(5)* %i10 to i32 addrspace(5)*
				%i12 = load volatile i32, i32 addrspace(5)* %i11, align 4
				ret void
				}

				define void @private_ptr_foo(float addrspace(5)* nocapture %arg) {
				; GFX9-LABEL: private_ptr_foo:
				; GFX9: ; %bb.0:
				; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX9-NEXT: v_mov_b32_e32 v1, 0x41200000
				; GFX9-NEXT: scratch_store_dword v0, v1, off offset:4
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX10-LABEL: private_ptr_foo:
				; GFX10: ; %bb.0:
				; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: v_mov_b32_e32 v1, 0x41200000
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: scratch_store_dword v0, v1, off offset:4
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: s_setpc_b64 s[30:31]
				%gep = getelementptr inbounds float, float addrspace(5)* %arg, i32 1
				store float 1.000000e+01, float addrspace(5)* %gep, align 4
				ret void
				}

				define amdgpu_kernel void @zero_init_small_offset_kernel() {
				; GFX9-LABEL: zero_init_small_offset_kernel:
				; GFX9: ; %bb.0:
				; GFX9-NEXT: s_add_u32 flat_scratch_lo, s0, s3
				; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s1, 0
				; GFX9-NEXT: s_mov_b32 vcc_hi, 0
				; GFX9-NEXT: scratch_load_dword v0, off, vcc_hi offset:4
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: v_mov_b32_e32 v0, 0
				; GFX9-NEXT: s_mov_b32 vcc_hi, 0
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:284
				; GFX9-NEXT: s_mov_b32 vcc_hi, 0
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:280
				; GFX9-NEXT: s_mov_b32 vcc_hi, 0
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:276
				; GFX9-NEXT: s_mov_b32 vcc_hi, 0
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:272
				; GFX9-NEXT: s_mov_b32 vcc_hi, 0
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:300
				; GFX9-NEXT: s_mov_b32 vcc_hi, 0
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:296
				; GFX9-NEXT: s_mov_b32 vcc_hi, 0
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:292
				; GFX9-NEXT: s_mov_b32 vcc_hi, 0
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:288
				; GFX9-NEXT: s_mov_b32 vcc_hi, 0
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:316
				; GFX9-NEXT: s_mov_b32 vcc_hi, 0
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:312
				; GFX9-NEXT: s_mov_b32 vcc_hi, 0
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:308
				; GFX9-NEXT: s_mov_b32 vcc_hi, 0
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:304
				; GFX9-NEXT: s_mov_b32 vcc_hi, 0
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:332
				; GFX9-NEXT: s_mov_b32 vcc_hi, 0
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:328
				; GFX9-NEXT: s_mov_b32 vcc_hi, 0
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:324
				; GFX9-NEXT: s_mov_b32 vcc_hi, 0
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:320
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: zero_init_small_offset_kernel:
				; GFX10: ; %bb.0:
				; GFX10-NEXT: s_add_u32 s0, s0, s3
				; GFX10-NEXT: s_addc_u32 s1, s1, 0
				; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s0
				; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s1
				; GFX10-NEXT: scratch_load_dword v0, off, off offset:4
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: v_mov_b32_e32 v0, 0
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: scratch_store_dword off, v0, off offset:284
				; GFX10-NEXT: scratch_store_dword off, v0, off offset:280
				; GFX10-NEXT: scratch_store_dword off, v0, off offset:276
				; GFX10-NEXT: scratch_store_dword off, v0, off offset:272
				; GFX10-NEXT: scratch_store_dword off, v0, off offset:300
				; GFX10-NEXT: scratch_store_dword off, v0, off offset:296
				; GFX10-NEXT: scratch_store_dword off, v0, off offset:292
				; GFX10-NEXT: scratch_store_dword off, v0, off offset:288
				; GFX10-NEXT: scratch_store_dword off, v0, off offset:316
				; GFX10-NEXT: scratch_store_dword off, v0, off offset:312
				; GFX10-NEXT: scratch_store_dword off, v0, off offset:308
				; GFX10-NEXT: scratch_store_dword off, v0, off offset:304
				; GFX10-NEXT: scratch_store_dword off, v0, off offset:332
				; GFX10-NEXT: scratch_store_dword off, v0, off offset:328
				; GFX10-NEXT: scratch_store_dword off, v0, off offset:324
				; GFX10-NEXT: scratch_store_dword off, v0, off offset:320
				; GFX10-NEXT: s_endpgm
				%padding = alloca [64 x i32], align 4, addrspace(5)
				%alloca = alloca [32 x i16], align 2, addrspace(5)
				%pad_gep = getelementptr inbounds [64 x i32], [64 x i32] addrspace(5)* %padding, i32 0, i32 undef
				%pad_load = load volatile i32, i32 addrspace(5)* %pad_gep, align 4
				%cast = bitcast [32 x i16] addrspace(5)* %alloca to i8 addrspace(5)*
				call void @llvm.memset.p5i8.i64(i8 addrspace(5)* align 2 dereferenceable(64) %cast, i8 0, i64 64, i1 false)
				ret void
				}

				define void @zero_init_small_offset_foo() {
				; GFX9-LABEL: zero_init_small_offset_foo:
				; GFX9: ; %bb.0:
				; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX9-NEXT: scratch_load_dword v0, off, s32
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: v_mov_b32_e32 v0, 0
				; GFX9-NEXT: scratch_store_dword off, v0, s32 offset:268
				; GFX9-NEXT: scratch_store_dword off, v0, s32 offset:264
				; GFX9-NEXT: scratch_store_dword off, v0, s32 offset:260
				; GFX9-NEXT: scratch_store_dword off, v0, s32 offset:256
				; GFX9-NEXT: scratch_store_dword off, v0, s32 offset:284
				; GFX9-NEXT: scratch_store_dword off, v0, s32 offset:280
				; GFX9-NEXT: scratch_store_dword off, v0, s32 offset:276
				; GFX9-NEXT: scratch_store_dword off, v0, s32 offset:272
				; GFX9-NEXT: scratch_store_dword off, v0, s32 offset:300
				; GFX9-NEXT: scratch_store_dword off, v0, s32 offset:296
				; GFX9-NEXT: scratch_store_dword off, v0, s32 offset:292
				; GFX9-NEXT: scratch_store_dword off, v0, s32 offset:288
				; GFX9-NEXT: scratch_store_dword off, v0, s32 offset:316
				; GFX9-NEXT: scratch_store_dword off, v0, s32 offset:312
				; GFX9-NEXT: scratch_store_dword off, v0, s32 offset:308
				; GFX9-NEXT: scratch_store_dword off, v0, s32 offset:304
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX10-LABEL: zero_init_small_offset_foo:
				; GFX10: ; %bb.0:
				; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: scratch_load_dword v0, off, s32
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: v_mov_b32_e32 v0, 0
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: scratch_store_dword off, v0, s32 offset:268
				; GFX10-NEXT: scratch_store_dword off, v0, s32 offset:264
				; GFX10-NEXT: scratch_store_dword off, v0, s32 offset:260
				; GFX10-NEXT: scratch_store_dword off, v0, s32 offset:256
				; GFX10-NEXT: scratch_store_dword off, v0, s32 offset:284
				; GFX10-NEXT: scratch_store_dword off, v0, s32 offset:280
				; GFX10-NEXT: scratch_store_dword off, v0, s32 offset:276
				; GFX10-NEXT: scratch_store_dword off, v0, s32 offset:272
				; GFX10-NEXT: scratch_store_dword off, v0, s32 offset:300
				; GFX10-NEXT: scratch_store_dword off, v0, s32 offset:296
				; GFX10-NEXT: scratch_store_dword off, v0, s32 offset:292
				; GFX10-NEXT: scratch_store_dword off, v0, s32 offset:288
				; GFX10-NEXT: scratch_store_dword off, v0, s32 offset:316
				; GFX10-NEXT: scratch_store_dword off, v0, s32 offset:312
				; GFX10-NEXT: scratch_store_dword off, v0, s32 offset:308
				; GFX10-NEXT: scratch_store_dword off, v0, s32 offset:304
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: s_setpc_b64 s[30:31]
				%padding = alloca [64 x i32], align 4, addrspace(5)
				%alloca = alloca [32 x i16], align 2, addrspace(5)
				%pad_gep = getelementptr inbounds [64 x i32], [64 x i32] addrspace(5)* %padding, i32 0, i32 undef
				%pad_load = load volatile i32, i32 addrspace(5)* %pad_gep, align 4
				%cast = bitcast [32 x i16] addrspace(5)* %alloca to i8 addrspace(5)*
				call void @llvm.memset.p5i8.i64(i8 addrspace(5)* align 2 dereferenceable(64) %cast, i8 0, i64 64, i1 false)
				ret void
				}

				define amdgpu_kernel void @store_load_sindex_small_offset_kernel(i32 %idx) {
				; GFX9-LABEL: store_load_sindex_small_offset_kernel:
				; GFX9: ; %bb.0: ; %bb
				; GFX9-NEXT: s_load_dword s0, s[0:1], 0x24
				; GFX9-NEXT: s_waitcnt lgkmcnt(0)
				; GFX9-NEXT: s_add_u32 flat_scratch_lo, s2, s5
				; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s3, 0
				; GFX9-NEXT: s_mov_b32 vcc_hi, 0
				; GFX9-NEXT: scratch_load_dword v0, off, vcc_hi offset:4
				; GFX9-NEXT: s_lshl_b32 s1, s0, 2
				; GFX9-NEXT: s_and_b32 s0, s0, 15
				; GFX9-NEXT: s_lshl_b32 s0, s0, 2
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: v_mov_b32_e32 v0, 15
				; GFX9-NEXT: s_add_u32 s1, 0x104, s1
				; GFX9-NEXT: scratch_store_dword off, v0, s1
				; GFX9-NEXT: s_add_u32 s0, 0x104, s0
				; GFX9-NEXT: scratch_load_dword v0, off, s0
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_load_sindex_small_offset_kernel:
				; GFX10: ; %bb.0: ; %bb
				; GFX10-NEXT: s_add_u32 s2, s2, s5
				; GFX10-NEXT: s_addc_u32 s3, s3, 0
				; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2
				; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3
				; GFX10-NEXT: s_load_dword s0, s[0:1], 0x24
				; GFX10-NEXT: s_waitcnt lgkmcnt(0)
				; GFX10-NEXT: scratch_load_dword v0, off, off offset:4
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: v_mov_b32_e32 v0, 15
				; GFX10-NEXT: s_and_b32 s1, s0, 15
				; GFX10-NEXT: s_lshl_b32 s0, s0, 2
				; GFX10-NEXT: s_lshl_b32 s1, s1, 2
				; GFX10-NEXT: s_add_u32 s0, 0x104, s0
				; GFX10-NEXT: s_add_u32 s1, 0x104, s1
				; GFX10-NEXT: scratch_store_dword off, v0, s0
				; GFX10-NEXT: scratch_load_dword v0, off, s1
				; GFX10-NEXT: s_endpgm
				bb:
				%padding = alloca [64 x i32], align 4, addrspace(5)
				%i = alloca [32 x float], align 4, addrspace(5)
				%pad_gep = getelementptr inbounds [64 x i32], [64 x i32] addrspace(5)* %padding, i32 0, i32 undef
				%pad_load = load volatile i32, i32 addrspace(5)* %pad_gep, align 4
				%i1 = bitcast [32 x float] addrspace(5)* %i to i8 addrspace(5)*
				%i7 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %idx
				%i8 = bitcast float addrspace(5)* %i7 to i32 addrspace(5)*
				store volatile i32 15, i32 addrspace(5)* %i8, align 4
				%i9 = and i32 %idx, 15
				%i10 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %i9
				%i11 = bitcast float addrspace(5)* %i10 to i32 addrspace(5)*
				%i12 = load volatile i32, i32 addrspace(5)* %i11, align 4
				ret void
				}

				define amdgpu_ps void @store_load_sindex_small_offset_foo(i32 inreg %idx) {
				; GFX9-LABEL: store_load_sindex_small_offset_foo:
				; GFX9: ; %bb.0: ; %bb
				; GFX9-NEXT: s_add_u32 flat_scratch_lo, s0, s3
				; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s1, 0
				; GFX9-NEXT: s_mov_b32 vcc_hi, 0
				; GFX9-NEXT: s_lshl_b32 s0, s2, 2
				; GFX9-NEXT: scratch_load_dword v0, off, vcc_hi offset:4
				; GFX9-NEXT: s_add_u32 s0, 0x104, s0
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: v_mov_b32_e32 v0, 15
				; GFX9-NEXT: scratch_store_dword off, v0, s0
				; GFX9-NEXT: s_and_b32 s0, s2, 15
				; GFX9-NEXT: s_lshl_b32 s0, s0, 2
				; GFX9-NEXT: s_add_u32 s0, 0x104, s0
				; GFX9-NEXT: scratch_load_dword v0, off, s0
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_load_sindex_small_offset_foo:
				; GFX10: ; %bb.0: ; %bb
				; GFX10-NEXT: s_add_u32 s0, s0, s3
				; GFX10-NEXT: s_addc_u32 s1, s1, 0
				; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s0
				; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s1
				; GFX10-NEXT: scratch_load_dword v0, off, off offset:4
				; GFX10-NEXT: s_and_b32 s0, s2, 15
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: v_mov_b32_e32 v0, 15
				; GFX10-NEXT: s_lshl_b32 s1, s2, 2
				; GFX10-NEXT: s_lshl_b32 s0, s0, 2
				; GFX10-NEXT: s_add_u32 s1, 0x104, s1
				; GFX10-NEXT: s_add_u32 s0, 0x104, s0
				; GFX10-NEXT: scratch_store_dword off, v0, s1
				; GFX10-NEXT: scratch_load_dword v0, off, s0
				; GFX10-NEXT: s_endpgm
				bb:
				%padding = alloca [64 x i32], align 4, addrspace(5)
				%i = alloca [32 x float], align 4, addrspace(5)
				%pad_gep = getelementptr inbounds [64 x i32], [64 x i32] addrspace(5)* %padding, i32 0, i32 undef
				%pad_load = load volatile i32, i32 addrspace(5)* %pad_gep, align 4
				%i1 = bitcast [32 x float] addrspace(5)* %i to i8 addrspace(5)*
				%i7 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %idx
				%i8 = bitcast float addrspace(5)* %i7 to i32 addrspace(5)*
				store volatile i32 15, i32 addrspace(5)* %i8, align 4
				%i9 = and i32 %idx, 15
				%i10 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %i9
				%i11 = bitcast float addrspace(5)* %i10 to i32 addrspace(5)*
				%i12 = load volatile i32, i32 addrspace(5)* %i11, align 4
				ret void
				}

				define amdgpu_kernel void @store_load_vindex_small_offset_kernel() {
				; GFX9-LABEL: store_load_vindex_small_offset_kernel:
				; GFX9: ; %bb.0: ; %bb
				; GFX9-NEXT: s_add_u32 flat_scratch_lo, s0, s3
				; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s1, 0
				; GFX9-NEXT: s_mov_b32 vcc_hi, 0
				; GFX9-NEXT: scratch_load_dword v1, off, vcc_hi offset:4
				; GFX9-NEXT: v_lshlrev_b32_e32 v0, 2, v0
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: v_mov_b32_e32 v1, 0x104
				; GFX9-NEXT: v_add_u32_e32 v2, v1, v0
				; GFX9-NEXT: v_mov_b32_e32 v3, 15
				; GFX9-NEXT: scratch_store_dword v2, v3, off
				; GFX9-NEXT: v_sub_u32_e32 v0, v1, v0
				; GFX9-NEXT: scratch_load_dword v0, v0, off offset:124
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_load_vindex_small_offset_kernel:
				; GFX10: ; %bb.0: ; %bb
				; GFX10-NEXT: s_add_u32 s0, s0, s3
				; GFX10-NEXT: s_addc_u32 s1, s1, 0
				; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s0
				; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s1
				; GFX10-NEXT: v_mov_b32_e32 v1, 0x104
				; GFX10-NEXT: v_lshlrev_b32_e32 v0, 2, v0
				; GFX10-NEXT: v_mov_b32_e32 v3, 15
				; GFX10-NEXT: v_add_nc_u32_e32 v2, v1, v0
				; GFX10-NEXT: v_sub_nc_u32_e32 v0, v1, v0
				; GFX10-NEXT: scratch_load_dword v1, off, off offset:4
				; GFX10-NEXT: scratch_store_dword v2, v3, off
				; GFX10-NEXT: scratch_load_dword v0, v0, off offset:124
				; GFX10-NEXT: s_endpgm
				bb:
				%padding = alloca [64 x i32], align 4, addrspace(5)
				%i = alloca [32 x float], align 4, addrspace(5)
				%pad_gep = getelementptr inbounds [64 x i32], [64 x i32] addrspace(5)* %padding, i32 0, i32 undef
				%pad_load = load volatile i32, i32 addrspace(5)* %pad_gep, align 4
				%i1 = bitcast [32 x float] addrspace(5)* %i to i8 addrspace(5)*
				%i2 = tail call i32 @llvm.amdgcn.workitem.id.x()
				%i3 = zext i32 %i2 to i64
				%i7 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %i2
				%i8 = bitcast float addrspace(5)* %i7 to i32 addrspace(5)*
				store volatile i32 15, i32 addrspace(5)* %i8, align 4
				%i9 = sub nsw i32 31, %i2
				%i10 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %i9
				%i11 = bitcast float addrspace(5)* %i10 to i32 addrspace(5)*
				%i12 = load volatile i32, i32 addrspace(5)* %i11, align 4
				ret void
				}
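
				; Editorial note, not part of the original test: a reading of the constants
				; checked above, assuming the kernel's frame starts at scratch offset 4 (as
				; the "offset:4" padding load suggests). %padding is 64 x i32 = 256 bytes, so
				; %i would start at 4 + 256 = 260 = 0x104. The load index is 31 - tid, so its
				; address is 0x104 + 4 * (31 - tid) = (0x104 - 4 * tid) + 124, which matches
				; the v_sub of the scaled thread id followed by "offset:124".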

				define void @store_load_vindex_small_offset_foo(i32 %idx) {
				; GFX9-LABEL: store_load_vindex_small_offset_foo:
				; GFX9: ; %bb.0: ; %bb
				; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX9-NEXT: scratch_load_dword v1, off, s32
				; GFX9-NEXT: s_add_u32 vcc_hi, s32, 0x100
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: v_mov_b32_e32 v1, vcc_hi
				; GFX9-NEXT: v_mov_b32_e32 v3, 15
				; GFX9-NEXT: v_lshl_add_u32 v2, v0, 2, v1
				; GFX9-NEXT: v_and_b32_e32 v0, v0, v3
				; GFX9-NEXT: scratch_store_dword v2, v3, off
				; GFX9-NEXT: v_lshl_add_u32 v0, v0, 2, v1
				; GFX9-NEXT: scratch_load_dword v0, v0, off
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX10-LABEL: store_load_vindex_small_offset_foo:
				; GFX10: ; %bb.0: ; %bb
				; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: v_mov_b32_e32 v1, 15
				; GFX10-NEXT: s_add_u32 vcc_lo, s32, 0x100
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: v_mov_b32_e32 v2, vcc_lo
				; GFX10-NEXT: v_and_b32_e32 v3, v0, v1
				; GFX10-NEXT: v_lshl_add_u32 v0, v0, 2, v2
				; GFX10-NEXT: v_lshl_add_u32 v2, v3, 2, v2
				; GFX10-NEXT: scratch_load_dword v3, off, s32
				; GFX10-NEXT: scratch_store_dword v0, v1, off
				; GFX10-NEXT: scratch_load_dword v0, v2, off
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: s_setpc_b64 s[30:31]
				bb:
				%padding = alloca [64 x i32], align 4, addrspace(5)
				%i = alloca [32 x float], align 4, addrspace(5)
				%pad_gep = getelementptr inbounds [64 x i32], [64 x i32] addrspace(5)* %padding, i32 0, i32 undef
				%pad_load = load volatile i32, i32 addrspace(5)* %pad_gep, align 4
				%i1 = bitcast [32 x float] addrspace(5)* %i to i8 addrspace(5)*
				%i7 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %idx
				%i8 = bitcast float addrspace(5)* %i7 to i32 addrspace(5)*
				store volatile i32 15, i32 addrspace(5)* %i8, align 4
				%i9 = and i32 %idx, 15
				%i10 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %i9
				%i11 = bitcast float addrspace(5)* %i10 to i32 addrspace(5)*
				%i12 = load volatile i32, i32 addrspace(5)* %i11, align 4
				ret void
				}

				define amdgpu_kernel void @zero_init_large_offset_kernel() {
				; GFX9-LABEL: zero_init_large_offset_kernel:
				; GFX9: ; %bb.0:
				; GFX9-NEXT: s_add_u32 flat_scratch_lo, s0, s3
				; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s1, 0
				; GFX9-NEXT: s_mov_b32 vcc_hi, 0
				; GFX9-NEXT: scratch_load_dword v0, off, vcc_hi offset:4
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: v_mov_b32_e32 v0, 0
				; GFX9-NEXT: s_movk_i32 vcc_hi, 0x4010
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:12
				; GFX9-NEXT: s_movk_i32 vcc_hi, 0x4010
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:8
				; GFX9-NEXT: s_movk_i32 vcc_hi, 0x4010
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:4
				; GFX9-NEXT: s_movk_i32 vcc_hi, 0x4010
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi
				; GFX9-NEXT: s_movk_i32 vcc_hi, 0x4010
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:28
				; GFX9-NEXT: s_movk_i32 vcc_hi, 0x4010
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:24
				; GFX9-NEXT: s_movk_i32 vcc_hi, 0x4010
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:20
				; GFX9-NEXT: s_movk_i32 vcc_hi, 0x4010
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:16
				; GFX9-NEXT: s_movk_i32 vcc_hi, 0x4010
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:44
				; GFX9-NEXT: s_movk_i32 vcc_hi, 0x4010
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:40
				; GFX9-NEXT: s_movk_i32 vcc_hi, 0x4010
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:36
				; GFX9-NEXT: s_movk_i32 vcc_hi, 0x4010
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:32
				; GFX9-NEXT: s_movk_i32 vcc_hi, 0x4010
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:60
				; GFX9-NEXT: s_movk_i32 vcc_hi, 0x4010
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:56
				; GFX9-NEXT: s_movk_i32 vcc_hi, 0x4010
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:52
				; GFX9-NEXT: s_movk_i32 vcc_hi, 0x4010
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:48
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: zero_init_large_offset_kernel:
				; GFX10: ; %bb.0:
				; GFX10-NEXT: s_add_u32 s0, s0, s3
				; GFX10-NEXT: s_addc_u32 s1, s1, 0
				; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s0
				; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s1
				; GFX10-NEXT: scratch_load_dword v0, off, off offset:4
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: v_mov_b32_e32 v0, 0
				; GFX10-NEXT: s_movk_i32 vcc_lo, 0x4010
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: scratch_store_dword off, v0, vcc_lo offset:12
				; GFX10-NEXT: s_movk_i32 vcc_lo, 0x4010
				; GFX10-NEXT: scratch_store_dword off, v0, vcc_lo offset:8
				; GFX10-NEXT: s_movk_i32 vcc_lo, 0x4010
				; GFX10-NEXT: scratch_store_dword off, v0, vcc_lo offset:4
				; GFX10-NEXT: s_movk_i32 vcc_lo, 0x4010
				; GFX10-NEXT: scratch_store_dword off, v0, vcc_lo
				; GFX10-NEXT: s_movk_i32 vcc_lo, 0x4010
				; GFX10-NEXT: scratch_store_dword off, v0, vcc_lo offset:28
				; GFX10-NEXT: s_movk_i32 vcc_lo, 0x4010
				; GFX10-NEXT: scratch_store_dword off, v0, vcc_lo offset:24
				; GFX10-NEXT: s_movk_i32 vcc_lo, 0x4010
				; GFX10-NEXT: scratch_store_dword off, v0, vcc_lo offset:20
				; GFX10-NEXT: s_movk_i32 vcc_lo, 0x4010
				; GFX10-NEXT: scratch_store_dword off, v0, vcc_lo offset:16
				; GFX10-NEXT: s_movk_i32 vcc_lo, 0x4010
				; GFX10-NEXT: scratch_store_dword off, v0, vcc_lo offset:44
				; GFX10-NEXT: s_movk_i32 vcc_lo, 0x4010
				; GFX10-NEXT: scratch_store_dword off, v0, vcc_lo offset:40
				; GFX10-NEXT: s_movk_i32 vcc_lo, 0x4010
				; GFX10-NEXT: scratch_store_dword off, v0, vcc_lo offset:36
				; GFX10-NEXT: s_movk_i32 vcc_lo, 0x4010
				; GFX10-NEXT: scratch_store_dword off, v0, vcc_lo offset:32
				; GFX10-NEXT: s_movk_i32 vcc_lo, 0x4010
				; GFX10-NEXT: scratch_store_dword off, v0, vcc_lo offset:60
				; GFX10-NEXT: s_movk_i32 vcc_lo, 0x4010
				; GFX10-NEXT: scratch_store_dword off, v0, vcc_lo offset:56
				; GFX10-NEXT: s_movk_i32 vcc_lo, 0x4010
				; GFX10-NEXT: scratch_store_dword off, v0, vcc_lo offset:52
				; GFX10-NEXT: s_movk_i32 vcc_lo, 0x4010
				; GFX10-NEXT: scratch_store_dword off, v0, vcc_lo offset:48
				; GFX10-NEXT: s_endpgm
				%padding = alloca [4096 x i32], align 4, addrspace(5)
				%alloca = alloca [32 x i16], align 2, addrspace(5)
				%pad_gep = getelementptr inbounds [4096 x i32], [4096 x i32] addrspace(5)* %padding, i32 0, i32 undef
				%pad_load = load volatile i32, i32 addrspace(5)* %pad_gep, align 4
				%cast = bitcast [32 x i16] addrspace(5)* %alloca to i8 addrspace(5)*
				call void @llvm.memset.p5i8.i64(i8 addrspace(5)* align 2 dereferenceable(64) %cast, i8 0, i64 64, i1 false)
				ret void
				}

				define void @zero_init_large_offset_foo() {
				; GFX9-LABEL: zero_init_large_offset_foo:
				; GFX9: ; %bb.0:
				; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX9-NEXT: scratch_load_dword v0, off, s32
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: v_mov_b32_e32 v0, 0
				; GFX9-NEXT: s_add_u32 vcc_hi, s32, 0x4000
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:12
				; GFX9-NEXT: s_add_u32 vcc_hi, s32, 0x4000
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:8
				; GFX9-NEXT: s_add_u32 vcc_hi, s32, 0x4000
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:4
				; GFX9-NEXT: s_add_u32 vcc_hi, s32, 0x4000
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi
				; GFX9-NEXT: s_add_u32 vcc_hi, s32, 0x4000
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:28
				; GFX9-NEXT: s_add_u32 vcc_hi, s32, 0x4000
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:24
				; GFX9-NEXT: s_add_u32 vcc_hi, s32, 0x4000
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:20
				; GFX9-NEXT: s_add_u32 vcc_hi, s32, 0x4000
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:16
				; GFX9-NEXT: s_add_u32 vcc_hi, s32, 0x4000
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:44
				; GFX9-NEXT: s_add_u32 vcc_hi, s32, 0x4000
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:40
				; GFX9-NEXT: s_add_u32 vcc_hi, s32, 0x4000
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:36
				; GFX9-NEXT: s_add_u32 vcc_hi, s32, 0x4000
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:32
				; GFX9-NEXT: s_add_u32 vcc_hi, s32, 0x4000
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:60
				; GFX9-NEXT: s_add_u32 vcc_hi, s32, 0x4000
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:56
				; GFX9-NEXT: s_add_u32 vcc_hi, s32, 0x4000
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:52
				; GFX9-NEXT: s_add_u32 vcc_hi, s32, 0x4000
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:48
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX10-LABEL: zero_init_large_offset_foo:
				; GFX10: ; %bb.0:
				; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: scratch_load_dword v0, off, s32
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: v_mov_b32_e32 v0, 0
				; GFX10-NEXT: s_add_u32 vcc_lo, s32, 0x4000
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: scratch_store_dword off, v0, vcc_lo offset:12
				; GFX10-NEXT: s_add_u32 vcc_lo, s32, 0x4000
				; GFX10-NEXT: scratch_store_dword off, v0, vcc_lo offset:8
				; GFX10-NEXT: s_add_u32 vcc_lo, s32, 0x4000
				; GFX10-NEXT: scratch_store_dword off, v0, vcc_lo offset:4
				; GFX10-NEXT: s_add_u32 vcc_lo, s32, 0x4000
				; GFX10-NEXT: scratch_store_dword off, v0, vcc_lo
				; GFX10-NEXT: s_add_u32 vcc_lo, s32, 0x4000
				; GFX10-NEXT: scratch_store_dword off, v0, vcc_lo offset:28
				; GFX10-NEXT: s_add_u32 vcc_lo, s32, 0x4000
				; GFX10-NEXT: scratch_store_dword off, v0, vcc_lo offset:24
				; GFX10-NEXT: s_add_u32 vcc_lo, s32, 0x4000
				; GFX10-NEXT: scratch_store_dword off, v0, vcc_lo offset:20
				; GFX10-NEXT: s_add_u32 vcc_lo, s32, 0x4000
				; GFX10-NEXT: scratch_store_dword off, v0, vcc_lo offset:16
				; GFX10-NEXT: s_add_u32 vcc_lo, s32, 0x4000
				; GFX10-NEXT: scratch_store_dword off, v0, vcc_lo offset:44
				; GFX10-NEXT: s_add_u32 vcc_lo, s32, 0x4000
				; GFX10-NEXT: scratch_store_dword off, v0, vcc_lo offset:40
				; GFX10-NEXT: s_add_u32 vcc_lo, s32, 0x4000
				; GFX10-NEXT: scratch_store_dword off, v0, vcc_lo offset:36
				; GFX10-NEXT: s_add_u32 vcc_lo, s32, 0x4000
				; GFX10-NEXT: scratch_store_dword off, v0, vcc_lo offset:32
				; GFX10-NEXT: s_add_u32 vcc_lo, s32, 0x4000
				; GFX10-NEXT: scratch_store_dword off, v0, vcc_lo offset:60
				; GFX10-NEXT: s_add_u32 vcc_lo, s32, 0x4000
				; GFX10-NEXT: scratch_store_dword off, v0, vcc_lo offset:56
				; GFX10-NEXT: s_add_u32 vcc_lo, s32, 0x4000
				; GFX10-NEXT: scratch_store_dword off, v0, vcc_lo offset:52
				; GFX10-NEXT: s_add_u32 vcc_lo, s32, 0x4000
				; GFX10-NEXT: scratch_store_dword off, v0, vcc_lo offset:48
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: s_setpc_b64 s[30:31]
				%padding = alloca [4096 x i32], align 4, addrspace(5)
				%alloca = alloca [32 x i16], align 2, addrspace(5)
				%pad_gep = getelementptr inbounds [4096 x i32], [4096 x i32] addrspace(5)* %padding, i32 0, i32 undef
				%pad_load = load volatile i32, i32 addrspace(5)* %pad_gep, align 4
				%cast = bitcast [32 x i16] addrspace(5)* %alloca to i8 addrspace(5)*
				call void @llvm.memset.p5i8.i64(i8 addrspace(5)* align 2 dereferenceable(64) %cast, i8 0, i64 64, i1 false)
				ret void
				}

				define amdgpu_kernel void @store_load_sindex_large_offset_kernel(i32 %idx) {
				; GFX9-LABEL: store_load_sindex_large_offset_kernel:
				; GFX9: ; %bb.0: ; %bb
				; GFX9-NEXT: s_load_dword s0, s[0:1], 0x24
				; GFX9-NEXT: s_waitcnt lgkmcnt(0)
				; GFX9-NEXT: s_add_u32 flat_scratch_lo, s2, s5
				; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s3, 0
				; GFX9-NEXT: s_mov_b32 vcc_hi, 0
				; GFX9-NEXT: scratch_load_dword v0, off, vcc_hi offset:4
				; GFX9-NEXT: s_lshl_b32 s1, s0, 2
				; GFX9-NEXT: s_and_b32 s0, s0, 15
				; GFX9-NEXT: s_lshl_b32 s0, s0, 2
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: v_mov_b32_e32 v0, 15
				; GFX9-NEXT: s_add_u32 s1, 0x4004, s1
				; GFX9-NEXT: scratch_store_dword off, v0, s1
				; GFX9-NEXT: s_add_u32 s0, 0x4004, s0
				; GFX9-NEXT: scratch_load_dword v0, off, s0
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_load_sindex_large_offset_kernel:
				; GFX10: ; %bb.0: ; %bb
				; GFX10-NEXT: s_add_u32 s2, s2, s5
				; GFX10-NEXT: s_addc_u32 s3, s3, 0
				; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2
				; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3
				; GFX10-NEXT: s_load_dword s0, s[0:1], 0x24
				; GFX10-NEXT: s_waitcnt lgkmcnt(0)
				; GFX10-NEXT: scratch_load_dword v0, off, off offset:4
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: v_mov_b32_e32 v0, 15
				; GFX10-NEXT: s_and_b32 s1, s0, 15
				; GFX10-NEXT: s_lshl_b32 s0, s0, 2
				; GFX10-NEXT: s_lshl_b32 s1, s1, 2
				; GFX10-NEXT: s_add_u32 s0, 0x4004, s0
				; GFX10-NEXT: s_add_u32 s1, 0x4004, s1
				; GFX10-NEXT: scratch_store_dword off, v0, s0
				; GFX10-NEXT: scratch_load_dword v0, off, s1
				; GFX10-NEXT: s_endpgm
				bb:
				%padding = alloca [4096 x i32], align 4, addrspace(5)
				%i = alloca [32 x float], align 4, addrspace(5)
				%pad_gep = getelementptr inbounds [4096 x i32], [4096 x i32] addrspace(5)* %padding, i32 0, i32 undef
				%pad_load = load volatile i32, i32 addrspace(5)* %pad_gep, align 4
				%i1 = bitcast [32 x float] addrspace(5)* %i to i8 addrspace(5)*
				%i7 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %idx
				%i8 = bitcast float addrspace(5)* %i7 to i32 addrspace(5)*
				store volatile i32 15, i32 addrspace(5)* %i8, align 4
				%i9 = and i32 %idx, 15
				%i10 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %i9
				%i11 = bitcast float addrspace(5)* %i10 to i32 addrspace(5)*
				%i12 = load volatile i32, i32 addrspace(5)* %i11, align 4
				ret void
				}

				define amdgpu_ps void @store_load_sindex_large_offset_foo(i32 inreg %idx) {
				; GFX9-LABEL: store_load_sindex_large_offset_foo:
				; GFX9: ; %bb.0: ; %bb
				; GFX9-NEXT: s_add_u32 flat_scratch_lo, s0, s3
				; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s1, 0
				; GFX9-NEXT: s_mov_b32 vcc_hi, 0
				; GFX9-NEXT: s_lshl_b32 s0, s2, 2
				; GFX9-NEXT: scratch_load_dword v0, off, vcc_hi offset:4
				; GFX9-NEXT: s_add_u32 s0, 0x4004, s0
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: v_mov_b32_e32 v0, 15
				; GFX9-NEXT: scratch_store_dword off, v0, s0
				; GFX9-NEXT: s_and_b32 s0, s2, 15
				; GFX9-NEXT: s_lshl_b32 s0, s0, 2
				; GFX9-NEXT: s_add_u32 s0, 0x4004, s0
				; GFX9-NEXT: scratch_load_dword v0, off, s0
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_load_sindex_large_offset_foo:
				; GFX10: ; %bb.0: ; %bb
				; GFX10-NEXT: s_add_u32 s0, s0, s3
				; GFX10-NEXT: s_addc_u32 s1, s1, 0
				; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s0
				; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s1
				; GFX10-NEXT: scratch_load_dword v0, off, off offset:4
				; GFX10-NEXT: s_and_b32 s0, s2, 15
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: v_mov_b32_e32 v0, 15
				; GFX10-NEXT: s_lshl_b32 s1, s2, 2
				; GFX10-NEXT: s_lshl_b32 s0, s0, 2
				; GFX10-NEXT: s_add_u32 s1, 0x4004, s1
				; GFX10-NEXT: s_add_u32 s0, 0x4004, s0
				; GFX10-NEXT: scratch_store_dword off, v0, s1
				; GFX10-NEXT: scratch_load_dword v0, off, s0
				; GFX10-NEXT: s_endpgm
				bb:
				%padding = alloca [4096 x i32], align 4, addrspace(5)
				%i = alloca [32 x float], align 4, addrspace(5)
				%pad_gep = getelementptr inbounds [4096 x i32], [4096 x i32] addrspace(5)* %padding, i32 0, i32 undef
				%pad_load = load volatile i32, i32 addrspace(5)* %pad_gep, align 4
				%i1 = bitcast [32 x float] addrspace(5)* %i to i8 addrspace(5)*
				%i7 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %idx
				%i8 = bitcast float addrspace(5)* %i7 to i32 addrspace(5)*
				store volatile i32 15, i32 addrspace(5)* %i8, align 4
				%i9 = and i32 %idx, 15
				%i10 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %i9
				%i11 = bitcast float addrspace(5)* %i10 to i32 addrspace(5)*
				%i12 = load volatile i32, i32 addrspace(5)* %i11, align 4
				ret void
				}

				define amdgpu_kernel void @store_load_vindex_large_offset_kernel() {
				; GFX9-LABEL: store_load_vindex_large_offset_kernel:
				; GFX9: ; %bb.0: ; %bb
				; GFX9-NEXT: s_add_u32 flat_scratch_lo, s0, s3
				; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s1, 0
				; GFX9-NEXT: s_mov_b32 vcc_hi, 0
				; GFX9-NEXT: scratch_load_dword v1, off, vcc_hi offset:4
				; GFX9-NEXT: v_lshlrev_b32_e32 v0, 2, v0
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: v_mov_b32_e32 v1, 0x4004
				; GFX9-NEXT: v_add_u32_e32 v2, v1, v0
				; GFX9-NEXT: v_mov_b32_e32 v3, 15
				; GFX9-NEXT: scratch_store_dword v2, v3, off
				; GFX9-NEXT: v_sub_u32_e32 v0, v1, v0
				; GFX9-NEXT: scratch_load_dword v0, v0, off offset:124
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_load_vindex_large_offset_kernel:
				; GFX10: ; %bb.0: ; %bb
				; GFX10-NEXT: s_add_u32 s0, s0, s3
				; GFX10-NEXT: s_addc_u32 s1, s1, 0
				; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s0
				; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s1
				; GFX10-NEXT: v_mov_b32_e32 v1, 0x4004
				; GFX10-NEXT: v_lshlrev_b32_e32 v0, 2, v0
				; GFX10-NEXT: v_mov_b32_e32 v3, 15
				; GFX10-NEXT: v_add_nc_u32_e32 v2, v1, v0
				; GFX10-NEXT: v_sub_nc_u32_e32 v0, v1, v0
				; GFX10-NEXT: scratch_load_dword v1, off, off offset:4
				; GFX10-NEXT: scratch_store_dword v2, v3, off
				; GFX10-NEXT: scratch_load_dword v0, v0, off offset:124
				; GFX10-NEXT: s_endpgm
				bb:
				%padding = alloca [4096 x i32], align 4, addrspace(5)
				%i = alloca [32 x float], align 4, addrspace(5)
				%pad_gep = getelementptr inbounds [4096 x i32], [4096 x i32] addrspace(5)* %padding, i32 0, i32 undef
				%pad_load = load volatile i32, i32 addrspace(5)* %pad_gep, align 4
				%i1 = bitcast [32 x float] addrspace(5)* %i to i8 addrspace(5)*
				%i2 = tail call i32 @llvm.amdgcn.workitem.id.x()
				%i3 = zext i32 %i2 to i64
				%i7 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %i2
				%i8 = bitcast float addrspace(5)* %i7 to i32 addrspace(5)*
				store volatile i32 15, i32 addrspace(5)* %i8, align 4
				%i9 = sub nsw i32 31, %i2
				%i10 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %i9
				%i11 = bitcast float addrspace(5)* %i10 to i32 addrspace(5)*
				%i12 = load volatile i32, i32 addrspace(5)* %i11, align 4
				ret void
				}

				define void @store_load_vindex_large_offset_foo(i32 %idx) {
				; GFX9-LABEL: store_load_vindex_large_offset_foo:
				; GFX9: ; %bb.0: ; %bb
				; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX9-NEXT: scratch_load_dword v1, off, s32
				; GFX9-NEXT: s_add_u32 vcc_hi, s32, 0x4000
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: v_mov_b32_e32 v1, vcc_hi
				; GFX9-NEXT: v_mov_b32_e32 v3, 15
				; GFX9-NEXT: v_lshl_add_u32 v2, v0, 2, v1
				; GFX9-NEXT: v_and_b32_e32 v0, v0, v3
				; GFX9-NEXT: scratch_store_dword v2, v3, off
				; GFX9-NEXT: v_lshl_add_u32 v0, v0, 2, v1
				; GFX9-NEXT: scratch_load_dword v0, v0, off
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX10-LABEL: store_load_vindex_large_offset_foo:
				; GFX10: ; %bb.0: ; %bb
				; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: v_mov_b32_e32 v1, 15
				; GFX10-NEXT: s_add_u32 vcc_lo, s32, 0x4000
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: v_mov_b32_e32 v2, vcc_lo
				; GFX10-NEXT: v_and_b32_e32 v3, v0, v1
				; GFX10-NEXT: v_lshl_add_u32 v0, v0, 2, v2
				; GFX10-NEXT: v_lshl_add_u32 v2, v3, 2, v2
				; GFX10-NEXT: scratch_load_dword v3, off, s32
				; GFX10-NEXT: scratch_store_dword v0, v1, off
				; GFX10-NEXT: scratch_load_dword v0, v2, off
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: s_setpc_b64 s[30:31]
				bb:
				%padding = alloca [4096 x i32], align 4, addrspace(5)
				%i = alloca [32 x float], align 4, addrspace(5)
				%pad_gep = getelementptr inbounds [4096 x i32], [4096 x i32] addrspace(5)* %padding, i32 0, i32 undef
				%pad_load = load volatile i32, i32 addrspace(5)* %pad_gep, align 4
				%i1 = bitcast [32 x float] addrspace(5)* %i to i8 addrspace(5)*
				%i7 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %idx
				%i8 = bitcast float addrspace(5)* %i7 to i32 addrspace(5)*
				store volatile i32 15, i32 addrspace(5)* %i8, align 4
				%i9 = and i32 %idx, 15
				%i10 = getelementptr inbounds [32 x float], [32 x float] addrspace(5)* %i, i32 0, i32 %i9
				%i11 = bitcast float addrspace(5)* %i10 to i32 addrspace(5)*
				%i12 = load volatile i32, i32 addrspace(5)* %i11, align 4
				ret void
				}

				define amdgpu_kernel void @store_load_large_imm_offset_kernel() {
				; GFX9-LABEL: store_load_large_imm_offset_kernel:
				; GFX9: ; %bb.0: ; %bb
				; GFX9-NEXT: s_add_u32 flat_scratch_lo, s0, s3
				; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s1, 0
				; GFX9-NEXT: s_movk_i32 s0, 0x3000
				; GFX9-NEXT: v_mov_b32_e32 v0, 13
				; GFX9-NEXT: s_mov_b32 vcc_hi, 0
				; GFX9-NEXT: scratch_store_dword off, v0, vcc_hi offset:4
				; GFX9-NEXT: s_add_u32 s0, 4, s0
				; GFX9-NEXT: v_mov_b32_e32 v0, 15
				; GFX9-NEXT: scratch_store_dword off, v0, s0 offset:3712
				; GFX9-NEXT: scratch_load_dword v0, off, s0 offset:3712
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_load_large_imm_offset_kernel:
				; GFX10: ; %bb.0: ; %bb
				; GFX10-NEXT: s_add_u32 s0, s0, s3
				; GFX10-NEXT: s_addc_u32 s1, s1, 0
				; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s0
				; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s1
				; GFX10-NEXT: v_mov_b32_e32 v0, 13
				; GFX10-NEXT: v_mov_b32_e32 v1, 15
				; GFX10-NEXT: s_movk_i32 s0, 0x3800
				; GFX10-NEXT: s_add_u32 s0, 4, s0
				; GFX10-NEXT: scratch_store_dword off, v0, off offset:4
				; GFX10-NEXT: scratch_store_dword off, v1, s0 offset:1664
				; GFX10-NEXT: scratch_load_dword v0, off, s0 offset:1664
				; GFX10-NEXT: s_endpgm
				bb:
				%i = alloca [4096 x i32], align 4, addrspace(5)
				%i1 = getelementptr inbounds [4096 x i32], [4096 x i32] addrspace(5)* %i, i32 0, i32 undef
				store volatile i32 13, i32 addrspace(5)* %i1, align 4
				%i7 = getelementptr inbounds [4096 x i32], [4096 x i32] addrspace(5)* %i, i32 0, i32 4000
				store volatile i32 15, i32 addrspace(5)* %i7, align 4
				%i10 = getelementptr inbounds [4096 x i32], [4096 x i32] addrspace(5)* %i, i32 0, i32 4000
				%i12 = load volatile i32, i32 addrspace(5)* %i10, align 4
				ret void
				}
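
				; Editorial note, not part of the original test: element 4000 of an i32 array
				; is at byte offset 4000 * 4 = 16000 = 0x3e80. The checks above split it as
				; 0x3000 + 3712 on GFX9 and 0x3800 + 1664 on GFX10; both sum to 16000. This
				; looks consistent with a maximum non-negative immediate offset of 4095 on
				; GFX9 and 2047 on GFX10, but the test itself does not state those limits.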

				define void @store_load_large_imm_offset_foo() {
				; GFX9-LABEL: store_load_large_imm_offset_foo:
				; GFX9: ; %bb.0: ; %bb
				; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX9-NEXT: s_movk_i32 s4, 0x3000
				; GFX9-NEXT: v_mov_b32_e32 v0, 13
				; GFX9-NEXT: scratch_store_dword off, v0, s32
				; GFX9-NEXT: s_add_u32 s4, s32, s4
				; GFX9-NEXT: v_mov_b32_e32 v0, 15
				; GFX9-NEXT: scratch_store_dword off, v0, s4 offset:3712
				; GFX9-NEXT: scratch_load_dword v0, off, s4 offset:3712
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX10-LABEL: store_load_large_imm_offset_foo:
				; GFX10: ; %bb.0: ; %bb
				; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: v_mov_b32_e32 v0, 13
				; GFX10-NEXT: v_mov_b32_e32 v1, 15
				; GFX10-NEXT: s_movk_i32 s4, 0x3800
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: s_add_u32 s4, s32, s4
				; GFX10-NEXT: scratch_store_dword off, v0, s32
				; GFX10-NEXT: scratch_store_dword off, v1, s4 offset:1664
				; GFX10-NEXT: scratch_load_dword v0, off, s4 offset:1664
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: s_setpc_b64 s[30:31]
				bb:
				%i = alloca [4096 x i32], align 4, addrspace(5)
				%i1 = getelementptr inbounds [4096 x i32], [4096 x i32] addrspace(5)* %i, i32 0, i32 undef
				store volatile i32 13, i32 addrspace(5)* %i1, align 4
				%i7 = getelementptr inbounds [4096 x i32], [4096 x i32] addrspace(5)* %i, i32 0, i32 4000
				store volatile i32 15, i32 addrspace(5)* %i7, align 4
				%i10 = getelementptr inbounds [4096 x i32], [4096 x i32] addrspace(5)* %i, i32 0, i32 4000
				%i12 = load volatile i32, i32 addrspace(5)* %i10, align 4
				ret void
				}

				define amdgpu_kernel void @store_load_vidx_sidx_offset(i32 %sidx) {
				; GFX9-LABEL: store_load_vidx_sidx_offset:
				; GFX9: ; %bb.0: ; %bb
				; GFX9-NEXT: s_load_dword s0, s[0:1], 0x24
				; GFX9-NEXT: s_waitcnt lgkmcnt(0)
				; GFX9-NEXT: s_add_u32 flat_scratch_lo, s2, s5
				; GFX9-NEXT: v_mov_b32_e32 v1, 4
				; GFX9-NEXT: s_addc_u32 flat_scratch_hi, s3, 0
				; GFX9-NEXT: v_add_u32_e32 v0, s0, v0
				; GFX9-NEXT: v_lshl_add_u32 v0, v0, 2, v1
				; GFX9-NEXT: v_mov_b32_e32 v1, 15
				; GFX9-NEXT: scratch_store_dword v0, v1, off offset:1024
				; GFX9-NEXT: scratch_load_dword v0, v0, off offset:1024
				; GFX9-NEXT: s_endpgm
				;
				; GFX10-LABEL: store_load_vidx_sidx_offset:
				; GFX10: ; %bb.0: ; %bb
				; GFX10-NEXT: s_add_u32 s2, s2, s5
				; GFX10-NEXT: s_addc_u32 s3, s3, 0
				; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2
				; GFX10-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3
				; GFX10-NEXT: s_load_dword s0, s[0:1], 0x24
				; GFX10-NEXT: v_mov_b32_e32 v1, 15
				; GFX10-NEXT: s_waitcnt lgkmcnt(0)
				; GFX10-NEXT: v_add_nc_u32_e32 v0, s0, v0
				; GFX10-NEXT: v_lshl_add_u32 v0, v0, 2, 4
				; GFX10-NEXT: scratch_store_dword v0, v1, off offset:1024
				; GFX10-NEXT: scratch_load_dword v0, v0, off offset:1024
				; GFX10-NEXT: s_endpgm
				bb:
				%alloca = alloca [32 x i32], align 4, addrspace(5)
				%vidx = tail call i32 @llvm.amdgcn.workitem.id.x()
				%add1 = add nsw i32 %sidx, %vidx
				%add2 = add nsw i32 %add1, 256
				%gep = getelementptr inbounds [32 x i32], [32 x i32] addrspace(5)* %alloca, i32 0, i32 %add2
				store volatile i32 15, i32 addrspace(5)* %gep, align 4
				%load = load volatile i32, i32 addrspace(5)* %gep, align 4
				ret void
				}

				; FIXME: Multi-DWORD scratch accesses should be supported.
				define void @store_load_i64_aligned(i64 addrspace(5)* nocapture %arg) {
				; GFX9-LABEL: store_load_i64_aligned:
				; GFX9: ; %bb.0: ; %bb
				; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX9-NEXT: v_mov_b32_e32 v1, 0
				; GFX9-NEXT: scratch_store_dword v0, v1, off offset:4
				; GFX9-NEXT: v_mov_b32_e32 v1, 15
				; GFX9-NEXT: scratch_store_dword v0, v1, off
				; GFX9-NEXT: scratch_load_dword v1, v0, off offset:4
				; GFX9-NEXT: scratch_load_dword v0, v0, off
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX10-LABEL: store_load_i64_aligned:
				; GFX10: ; %bb.0: ; %bb
				; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: v_mov_b32_e32 v1, 0
				; GFX10-NEXT: v_mov_b32_e32 v2, 15
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: scratch_store_dword v0, v1, off offset:4
				; GFX10-NEXT: scratch_store_dword v0, v2, off
				; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: scratch_load_dword v1, v0, off offset:4
				; GFX10-NEXT: scratch_load_dword v0, v0, off
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: s_setpc_b64 s[30:31]
				bb:
				store volatile i64 15, i64 addrspace(5)* %arg, align 8
				%load = load volatile i64, i64 addrspace(5)* %arg, align 8
				ret void
				}

				; FIXME: Unaligned multi-DWORD scratch accesses should be supported.
				define void @store_load_i64_unaligned(i64 addrspace(5)* nocapture %arg) {
				; GFX9-LABEL: store_load_i64_unaligned:
				; GFX9: ; %bb.0: ; %bb
				; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX9-NEXT: v_mov_b32_e32 v1, 0
				; GFX9-NEXT: scratch_store_byte v0, v1, off offset:7
				; GFX9-NEXT: scratch_store_byte v0, v1, off offset:6
				; GFX9-NEXT: scratch_store_byte v0, v1, off offset:5
				; GFX9-NEXT: scratch_store_byte v0, v1, off offset:4
				; GFX9-NEXT: scratch_store_byte v0, v1, off offset:3
				; GFX9-NEXT: scratch_store_byte v0, v1, off offset:2
				; GFX9-NEXT: scratch_store_byte v0, v1, off offset:1
				; GFX9-NEXT: v_mov_b32_e32 v1, 15
				; GFX9-NEXT: scratch_store_byte v0, v1, off
				; GFX9-NEXT: scratch_load_ubyte v1, v0, off offset:6
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: scratch_load_ubyte v1, v0, off offset:7
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: scratch_load_ubyte v1, v0, off offset:4
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: scratch_load_ubyte v1, v0, off offset:5
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: scratch_load_ubyte v1, v0, off offset:2
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: scratch_load_ubyte v1, v0, off offset:3
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: scratch_load_ubyte v1, v0, off
				; GFX9-NEXT: scratch_load_ubyte v0, v0, off offset:1
				; GFX9-NEXT: s_waitcnt vmcnt(0)
				; GFX9-NEXT: s_setpc_b64 s[30:31]
				;
				; GFX10-LABEL: store_load_i64_unaligned:
				; GFX10: ; %bb.0: ; %bb
				; GFX10-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: v_mov_b32_e32 v1, 0
				; GFX10-NEXT: v_mov_b32_e32 v2, 15
				; GFX10-NEXT: ; implicit-def: $vcc_hi
				; GFX10-NEXT: scratch_store_byte v0, v1, off offset:7
				; GFX10-NEXT: scratch_store_byte v0, v1, off offset:6
				; GFX10-NEXT: scratch_store_byte v0, v1, off offset:5
				; GFX10-NEXT: scratch_store_byte v0, v1, off offset:4
				; GFX10-NEXT: scratch_store_byte v0, v1, off offset:3
				; GFX10-NEXT: scratch_store_byte v0, v1, off offset:2
				; GFX10-NEXT: scratch_store_byte v0, v1, off offset:1
				; GFX10-NEXT: scratch_store_byte v0, v2, off
				; GFX10-NEXT: scratch_load_ubyte v1, v0, off offset:6
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: scratch_load_ubyte v1, v0, off offset:7
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: scratch_load_ubyte v1, v0, off offset:4
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: scratch_load_ubyte v1, v0, off offset:5
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: scratch_load_ubyte v1, v0, off offset:2
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: scratch_load_ubyte v1, v0, off offset:3
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: s_clause 0x1
				; GFX10-NEXT: scratch_load_ubyte v1, v0, off
				; GFX10-NEXT: scratch_load_ubyte v0, v0, off offset:1
				; GFX10-NEXT: s_waitcnt vmcnt(0)
				; GFX10-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX10-NEXT: s_setpc_b64 s[30:31]
				bb:
				store volatile i64 15, i64 addrspace(5)* %arg, align 1
				%load = load volatile i64, i64 addrspace(5)* %arg, align 1
				ret void
				}

				declare void @llvm.memset.p5i8.i64(i8 addrspace(5)* nocapture writeonly, i8, i64, i1 immarg)
				declare i32 @llvm.amdgcn.workitem.id.x()
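
The two FIXME tests above record how an i64 scratch access is currently split: the align-8 case becomes two dword accesses (offsets 0 and 4) and the align-1 case becomes eight byte accesses (offsets 0 through 7). Only the piece at offset 0 holds 15, and every other piece holds 0. The following is a minimal Python sketch of that split, assuming little-endian layout. `split_store` is a hypothetical helper for illustration only and is not part of the patch; it models the offsets and values seen in the CHECK lines, not the order in which the instructions are emitted.

```python
def split_store(value: int, size: int, piece: int) -> list[tuple[int, int]]:
    """Split a `size`-byte little-endian store into `piece`-byte stores.

    Returns (byte offset, piece value) pairs. Hypothetical helper: it only
    reproduces the offsets and values visible in the CHECK lines.
    """
    raw = value.to_bytes(size, "little")
    return [
        (off, int.from_bytes(raw[off:off + piece], "little"))
        for off in range(0, size, piece)
    ]


aligned = split_store(15, 8, 4)    # models the two scratch_store_dword
unaligned = split_store(15, 8, 1)  # models the eight scratch_store_byte
```

Once multi-dword flat scratch is supported, as the FIXMEs request, the aligned case would presumably no longer need this split.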

llvm/test/CodeGen/AMDGPU/frame-index-elimination.ll

	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=kaveri -mattr=-promote-alloca -amdgpu-sroa=0 -verify-machineinstrs < %s \| FileCheck -enable-var-scope -check-prefixes=GCN,CI %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=kaveri -mattr=-promote-alloca -amdgpu-sroa=0 -verify-machineinstrs < %s \| FileCheck -enable-var-scope -check-prefixes=GCN,CI,MUBUF %s
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -mattr=-promote-alloca -amdgpu-sroa=0 -verify-machineinstrs < %s \| FileCheck -enable-var-scope -check-prefixes=GCN,GFX9 %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -mattr=-promote-alloca -amdgpu-sroa=0 -verify-machineinstrs < %s \| FileCheck -enable-var-scope -check-prefixes=GCN,GFX9,GFX9-MUBUF,MUBUF %s
				; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -mattr=-promote-alloca -amdgpu-sroa=0 -amdgpu-enable-flat-scratch -verify-machineinstrs < %s \| FileCheck -enable-var-scope -check-prefixes=GCN,GFX9,GFX9-FLATSCR %s

	; Test that non-entry function frame indices are expanded properly to			; Test that non-entry function frame indices are expanded properly to
	; give an index relative to the scratch wave offset register			; give an index relative to the scratch wave offset register

	; Materialize into a mov. Make sure there isn't an unnecessary copy.			; Materialize into a mov. Make sure there isn't an unnecessary copy.
	; GCN-LABEL: {{^}}func_mov_fi_i32:			; GCN-LABEL: {{^}}func_mov_fi_i32:
	; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)

	; CI-NEXT: v_lshr_b32_e64 v0, s32, 6			; CI-NEXT: v_lshr_b32_e64 v0, s32, 6
	; GFX9-NEXT: v_lshrrev_b32_e64 v0, 6, s32			; GFX9-MUBUF-NEXT: v_lshrrev_b32_e64 v0, 6, s32

				; GFX9-FLATSCR: v_mov_b32_e32 v0, s32
				; GFX9-FLATSCR-NOT: v_lshrrev_b32_e64

				; MUBUF-NOT: v_mov

	; GCN-NOT: v_mov
	; GCN: ds_write_b32 v0, v0			; GCN: ds_write_b32 v0, v0
	define void @func_mov_fi_i32() #0 {			define void @func_mov_fi_i32() #0 {
	%alloca = alloca i32, addrspace(5)			%alloca = alloca i32, addrspace(5)
	store volatile i32 addrspace(5)* %alloca, i32 addrspace(5)* addrspace(3)* undef			store volatile i32 addrspace(5)* %alloca, i32 addrspace(5)* addrspace(3)* undef
	ret void			ret void
	}			}

	; Offset due to different objects			; Offset due to different objects
	; GCN-LABEL: {{^}}func_mov_fi_i32_offset:			; GCN-LABEL: {{^}}func_mov_fi_i32_offset:
	; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)

	; CI-DAG: v_lshr_b32_e64 v0, s32, 6			; CI-DAG: v_lshr_b32_e64 v0, s32, 6
	; CI-NOT: v_mov			; CI-NOT: v_mov
	; CI: ds_write_b32 v0, v0			; CI: ds_write_b32 v0, v0
	; CI-NEXT: v_lshr_b32_e64 [[SCALED:v[0-9]+]], s32, 6			; CI-NEXT: v_lshr_b32_e64 [[SCALED:v[0-9]+]], s32, 6
	; CI-NEXT: v_add_i32_e{{32\|64}} v0, {{s\[[0-9]+:[0-9]+\]\|vcc}}, 4, [[SCALED]]			; CI-NEXT: v_add_i32_e{{32\|64}} v0, {{s\[[0-9]+:[0-9]+\]\|vcc}}, 4, [[SCALED]]
	; CI-NEXT: ds_write_b32 v0, v0			; CI-NEXT: ds_write_b32 v0, v0

	; GFX9: v_lshrrev_b32_e64 v0, 6, s32			; GFX9-MUBUF-NEXT: v_lshrrev_b32_e64 v0, 6, s32
				; GFX9-FLATSCR: v_mov_b32_e32 v0, s32
				; GFX9-FLATSCR: s_add_u32 [[ADD:[^,]+]], s32, 4
	; GFX9-NEXT: ds_write_b32 v0, v0			; GFX9-NEXT: ds_write_b32 v0, v0
	; GFX9-NEXT: v_lshrrev_b32_e64 [[SCALED:v[0-9]+]], 6, s32			; GFX9-MUBUF-NEXT: v_lshrrev_b32_e64 [[SCALED:v[0-9]+]], 6, s32
	; GFX9-NEXT: v_add_u32_e32 v0, 4, [[SCALED]]			; GFX9-MUBUF-NEXT: v_add_u32_e32 v0, 4, [[SCALED]]
				; GFX9-FLATSCR-NEXT: v_mov_b32_e32 v0, [[ADD]]
	; GFX9-NEXT: ds_write_b32 v0, v0			; GFX9-NEXT: ds_write_b32 v0, v0
	define void @func_mov_fi_i32_offset() #0 {			define void @func_mov_fi_i32_offset() #0 {
	%alloca0 = alloca i32, addrspace(5)			%alloca0 = alloca i32, addrspace(5)
	%alloca1 = alloca i32, addrspace(5)			%alloca1 = alloca i32, addrspace(5)
	store volatile i32 addrspace(5)* %alloca0, i32 addrspace(5)* addrspace(3)* undef			store volatile i32 addrspace(5)* %alloca0, i32 addrspace(5)* addrspace(3)* undef
	store volatile i32 addrspace(5)* %alloca1, i32 addrspace(5)* addrspace(3)* undef			store volatile i32 addrspace(5)* %alloca1, i32 addrspace(5)* addrspace(3)* undef
	ret void			ret void
	}			}

	; Materialize into an add of a constant offset from the FI.			; Materialize into an add of a constant offset from the FI.
	; FIXME: Should be able to merge adds			; FIXME: Should be able to merge adds

	; GCN-LABEL: {{^}}func_add_constant_to_fi_i32:			; GCN-LABEL: {{^}}func_add_constant_to_fi_i32:
	; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)

	; CI: v_lshr_b32_e64 [[SCALED:v[0-9]+]], s32, 6			; CI: v_lshr_b32_e64 [[SCALED:v[0-9]+]], s32, 6
	; CI-NEXT: v_add_i32_e32 v0, vcc, 4, [[SCALED]]			; CI-NEXT: v_add_i32_e32 v0, vcc, 4, [[SCALED]]

	; GFX9: v_lshrrev_b32_e64 [[SCALED:v[0-9]+]], 6, s32			; GFX9-MUBUF: v_lshrrev_b32_e64 [[SCALED:v[0-9]+]], 6, s32
	; GFX9-NEXT: v_add_u32_e32 v0, 4, [[SCALED]]			; GFX9-MUBUF-NEXT: v_add_u32_e32 v0, 4, [[SCALED]]

				; GFX9-FLATSCR: v_mov_b32_e32 [[ADD:v[0-9]+]], s32
				; GFX9-FLATSCR-NEXT: v_add_u32_e32 v0, 4, [[ADD]]

	; GCN-NOT: v_mov			; GCN-NOT: v_mov
	; GCN: ds_write_b32 v0, v0			; GCN: ds_write_b32 v0, v0
	define void @func_add_constant_to_fi_i32() #0 {			define void @func_add_constant_to_fi_i32() #0 {
	%alloca = alloca [2 x i32], align 4, addrspace(5)			%alloca = alloca [2 x i32], align 4, addrspace(5)
	%gep0 = getelementptr inbounds [2 x i32], [2 x i32] addrspace(5)* %alloca, i32 0, i32 1			%gep0 = getelementptr inbounds [2 x i32], [2 x i32] addrspace(5)* %alloca, i32 0, i32 1
	store volatile i32 addrspace(5)* %gep0, i32 addrspace(5)* addrspace(3)* undef			store volatile i32 addrspace(5)* %gep0, i32 addrspace(5)* addrspace(3)* undef
	ret void			ret void
	}			}

	; A user the materialized frame index can't be meaningfully folded			; A user the materialized frame index can't be meaningfully folded
	; into.			; into.

	; GCN-LABEL: {{^}}func_other_fi_user_i32:			; GCN-LABEL: {{^}}func_other_fi_user_i32:

	; CI: v_lshr_b32_e64 v0, s32, 6			; CI: v_lshr_b32_e64 v0, s32, 6

	; GFX9: v_lshrrev_b32_e64 v0, 6, s32			; GFX9-MUBUF: v_lshrrev_b32_e64 v0, 6, s32
				; GFX9-FLATSCR: v_mov_b32_e32 v0, s32

	; GCN-NEXT: v_mul_u32_u24_e32 v0, 9, v0			; GCN-NEXT: v_mul_u32_u24_e32 v0, 9, v0
	; GCN-NOT: v_mov			; GCN-NOT: v_mov
	; GCN: ds_write_b32 v0, v0			; GCN: ds_write_b32 v0, v0
	define void @func_other_fi_user_i32() #0 {			define void @func_other_fi_user_i32() #0 {
	%alloca = alloca [2 x i32], align 4, addrspace(5)			%alloca = alloca [2 x i32], align 4, addrspace(5)
	%ptrtoint = ptrtoint [2 x i32] addrspace(5)* %alloca to i32			%ptrtoint = ptrtoint [2 x i32] addrspace(5)* %alloca to i32
	%mul = mul i32 %ptrtoint, 9			%mul = mul i32 %ptrtoint, 9
	store volatile i32 %mul, i32 addrspace(3)* undef			store volatile i32 %mul, i32 addrspace(3)* undef
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}func_store_private_arg_i32_ptr:			; GCN-LABEL: {{^}}func_store_private_arg_i32_ptr:
	; GCN: v_mov_b32_e32 v1, 15{{$}}			; GCN: v_mov_b32_e32 v1, 15{{$}}
	; GCN: buffer_store_dword v1, v0, s[0:3], 0 offen{{$}}			; MUBUF: buffer_store_dword v1, v0, s[0:3], 0 offen{{$}}
				; GFX9-FLATSCR: scratch_store_dword v0, v1, off{{$}}
	define void @func_store_private_arg_i32_ptr(i32 addrspace(5)* %ptr) #0 {			define void @func_store_private_arg_i32_ptr(i32 addrspace(5)* %ptr) #0 {
	store volatile i32 15, i32 addrspace(5)* %ptr			store volatile i32 15, i32 addrspace(5)* %ptr
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}func_load_private_arg_i32_ptr:			; GCN-LABEL: {{^}}func_load_private_arg_i32_ptr:
	; GCN: s_waitcnt			; GCN: s_waitcnt
	; GCN-NEXT: buffer_load_dword v0, v0, s[0:3], 0 offen{{$}}			; MUBUF-NEXT: buffer_load_dword v0, v0, s[0:3], 0 offen{{$}}
				; GFX9-FLATSCR-NEXT: scratch_load_dword v0, v0, off{{$}}
	define void @func_load_private_arg_i32_ptr(i32 addrspace(5)* %ptr) #0 {			define void @func_load_private_arg_i32_ptr(i32 addrspace(5)* %ptr) #0 {
	%val = load volatile i32, i32 addrspace(5)* %ptr			%val = load volatile i32, i32 addrspace(5)* %ptr
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}void_func_byval_struct_i8_i32_ptr:			; GCN-LABEL: {{^}}void_func_byval_struct_i8_i32_ptr:
	; GCN: s_waitcnt			; GCN: s_waitcnt

	; CI: v_lshr_b32_e64 [[SHIFT:v[0-9]+]], s32, 6			; CI: v_lshr_b32_e64 [[SHIFT:v[0-9]+]], s32, 6
	; CI-NEXT: v_or_b32_e32 v0, 4, [[SHIFT]]			; CI-NEXT: v_or_b32_e32 v0, 4, [[SHIFT]]

	; GFX9: v_lshrrev_b32_e64 [[SHIFT:v[0-9]+]], 6, s32			; GFX9-MUBUF: v_lshrrev_b32_e64 [[SHIFT:v[0-9]+]], 6, s32
	; GFX9-NEXT: v_or_b32_e32 v0, 4, [[SHIFT]]			; GFX9-MUBUF-NEXT: v_or_b32_e32 v0, 4, [[SHIFT]]

				; GFX9-FLATSCR: v_mov_b32_e32 [[SP:v[0-9]+]], s32
				; GFX9-FLATSCR-NEXT: v_or_b32_e32 v0, 4, [[SP]]

	; GCN-NOT: v_mov			; GCN-NOT: v_mov
	; GCN: ds_write_b32 v0, v0			; GCN: ds_write_b32 v0, v0
	define void @void_func_byval_struct_i8_i32_ptr({ i8, i32 } addrspace(5)* byval %arg0) #0 {			define void @void_func_byval_struct_i8_i32_ptr({ i8, i32 } addrspace(5)* byval %arg0) #0 {
	%gep0 = getelementptr inbounds { i8, i32 }, { i8, i32 } addrspace(5)* %arg0, i32 0, i32 0			%gep0 = getelementptr inbounds { i8, i32 }, { i8, i32 } addrspace(5)* %arg0, i32 0, i32 0
	%gep1 = getelementptr inbounds { i8, i32 }, { i8, i32 } addrspace(5)* %arg0, i32 0, i32 1			%gep1 = getelementptr inbounds { i8, i32 }, { i8, i32 } addrspace(5)* %arg0, i32 0, i32 1
	%load1 = load i32, i32 addrspace(5)* %gep1			%load1 = load i32, i32 addrspace(5)* %gep1
	store volatile i32 addrspace(5)* %gep1, i32 addrspace(5)* addrspace(3)* undef			store volatile i32 addrspace(5)* %gep1, i32 addrspace(5)* addrspace(3)* undef
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}void_func_byval_struct_i8_i32_ptr_value:			; GCN-LABEL: {{^}}void_func_byval_struct_i8_i32_ptr_value:
	; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: buffer_load_ubyte v0, off, s[0:3], s32			; MUBUF-NEXT: buffer_load_ubyte v0, off, s[0:3], s32
	; GCN_NEXT: buffer_load_dword v1, off, s[0:3], s32 offset:4			; MUBUF-NEXT: buffer_load_dword v1, off, s[0:3], s32 offset:4
				; GFX9-FLATSCR-NEXT: scratch_load_ubyte v0, off, s32
				; GFX9-FLATSCR-NEXT: scratch_load_dword v1, off, s32 offset:4
	define void @void_func_byval_struct_i8_i32_ptr_value({ i8, i32 } addrspace(5)* byval %arg0) #0 {			define void @void_func_byval_struct_i8_i32_ptr_value({ i8, i32 } addrspace(5)* byval %arg0) #0 {
	%gep0 = getelementptr inbounds { i8, i32 }, { i8, i32 } addrspace(5)* %arg0, i32 0, i32 0			%gep0 = getelementptr inbounds { i8, i32 }, { i8, i32 } addrspace(5)* %arg0, i32 0, i32 0
	%gep1 = getelementptr inbounds { i8, i32 }, { i8, i32 } addrspace(5)* %arg0, i32 0, i32 1			%gep1 = getelementptr inbounds { i8, i32 }, { i8, i32 } addrspace(5)* %arg0, i32 0, i32 1
	%load0 = load i8, i8 addrspace(5)* %gep0			%load0 = load i8, i8 addrspace(5)* %gep0
	%load1 = load i32, i32 addrspace(5)* %gep1			%load1 = load i32, i32 addrspace(5)* %gep1
	store volatile i8 %load0, i8 addrspace(3)* undef			store volatile i8 %load0, i8 addrspace(3)* undef
	store volatile i32 %load1, i32 addrspace(3)* undef			store volatile i32 %load1, i32 addrspace(3)* undef
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}void_func_byval_struct_i8_i32_ptr_nonentry_block:			; GCN-LABEL: {{^}}void_func_byval_struct_i8_i32_ptr_nonentry_block:

	; CI: v_lshr_b32_e64 [[SHIFT:v[0-9]+]], s32, 6			; CI: v_lshr_b32_e64 [[SHIFT:v[0-9]+]], s32, 6

	; GFX9: v_lshrrev_b32_e64 [[SHIFT:v[0-9]+]], 6, s32			; GFX9-MUBUF: v_lshrrev_b32_e64 [[SP:v[0-9]+]], 6, s32
				; GFX9-FLATSCR: v_mov_b32_e32 [[SP:v[0-9]+]], s32

	; GCN: s_and_saveexec_b64			; GCN: s_and_saveexec_b64

	; CI: v_add_i32_e32 [[GEP:v[0-9]+]], vcc, 4, [[SHIFT]]			; CI: v_add_i32_e32 [[GEP:v[0-9]+]], vcc, 4, [[SHIFT]]
	; CI: buffer_load_dword v{{[0-9]+}}, off, s[0:3], s32 offset:4{{$}}			; CI: buffer_load_dword v{{[0-9]+}}, off, s[0:3], s32 offset:4{{$}}

	; GFX9: v_add_u32_e32 [[GEP:v[0-9]+]], 4, [[SHIFT]]			; GFX9: v_add_u32_e32 [[GEP:v[0-9]+]], 4, [[SP]]
	; GFX9: buffer_load_dword v{{[0-9]+}}, off, s[0:3], s32 offset:4{{$}}			; GFX9-MUBUF: buffer_load_dword v{{[0-9]+}}, off, s[0:3], s32 offset:4{{$}}
				; GFX9-FLATSCR: scratch_load_dword v{{[0-9]+}}, [[SP]], off offset:4{{$}}

	; GCN: ds_write_b32 v{{[0-9]+}}, [[GEP]]			; GCN: ds_write_b32 v{{[0-9]+}}, [[GEP]]
	define void @void_func_byval_struct_i8_i32_ptr_nonentry_block({ i8, i32 } addrspace(5)* byval %arg0, i32 %arg2) #0 {			define void @void_func_byval_struct_i8_i32_ptr_nonentry_block({ i8, i32 } addrspace(5)* byval %arg0, i32 %arg2) #0 {
	%cmp = icmp eq i32 %arg2, 0			%cmp = icmp eq i32 %arg2, 0
	br i1 %cmp, label %bb, label %ret			br i1 %cmp, label %bb, label %ret

	bb:			bb:
	%gep0 = getelementptr inbounds { i8, i32 }, { i8, i32 } addrspace(5)* %arg0, i32 0, i32 0			%gep0 = getelementptr inbounds { i8, i32 }, { i8, i32 } addrspace(5)* %arg0, i32 0, i32 0
	%gep1 = getelementptr inbounds { i8, i32 }, { i8, i32 } addrspace(5)* %arg0, i32 0, i32 1			%gep1 = getelementptr inbounds { i8, i32 }, { i8, i32 } addrspace(5)* %arg0, i32 0, i32 1
	%load1 = load volatile i32, i32 addrspace(5)* %gep1			%load1 = load volatile i32, i32 addrspace(5)* %gep1
	store volatile i32 addrspace(5)* %gep1, i32 addrspace(5)* addrspace(3)* undef			store volatile i32 addrspace(5)* %gep1, i32 addrspace(5)* addrspace(3)* undef
	br label %ret			br label %ret

	ret:			ret:
	ret void			ret void
	}			}

	; Added offset can't be used with VOP3 add			; Added offset can't be used with VOP3 add
	; GCN-LABEL: {{^}}func_other_fi_user_non_inline_imm_offset_i32:			; GCN-LABEL: {{^}}func_other_fi_user_non_inline_imm_offset_i32:

	; CI-DAG: s_movk_i32 [[K:s[0-9]+\|vcc_lo\|vcc_hi]], 0x200			; CI-DAG: s_movk_i32 [[K:s[0-9]+\|vcc_lo\|vcc_hi]], 0x200
	; CI-DAG: v_lshr_b32_e64 [[SCALED:v[0-9]+]], s32, 6			; CI-DAG: v_lshr_b32_e64 [[SCALED:v[0-9]+]], s32, 6
	; CI: v_add_i32_e32 [[VZ:v[0-9]+]], vcc, [[K]], [[SCALED]]			; CI: v_add_i32_e32 [[VZ:v[0-9]+]], vcc, [[K]], [[SCALED]]

	; GFX9-DAG: v_lshrrev_b32_e64 [[SCALED:v[0-9]+]], 6, s32			; GFX9-MUBUF-DAG: v_lshrrev_b32_e64 [[SCALED:v[0-9]+]], 6, s32
	; GFX9: v_add_u32_e32 [[VZ:v[0-9]+]], 0x200, [[SCALED]]			; GFX9-MUBUF: v_add_u32_e32 [[VZ:v[0-9]+]], 0x200, [[SCALED]]

				; GFX9-FLATSCR-DAG: s_add_u32 [[SZ:[^,]+]], s32, 0x200
				; GFX9-FLATSCR: v_mov_b32_e32 [[VZ:v[0-9]+]], [[SZ]]

	; GCN: v_mul_u32_u24_e32 [[VZ]], 9, [[VZ]]			; GCN: v_mul_u32_u24_e32 [[VZ]], 9, [[VZ]]
	; GCN: ds_write_b32 v0, [[VZ]]			; GCN: ds_write_b32 v0, [[VZ]]
	define void @func_other_fi_user_non_inline_imm_offset_i32() #0 {			define void @func_other_fi_user_non_inline_imm_offset_i32() #0 {
	%alloca0 = alloca [128 x i32], align 4, addrspace(5)			%alloca0 = alloca [128 x i32], align 4, addrspace(5)
	%alloca1 = alloca [8 x i32], align 4, addrspace(5)			%alloca1 = alloca [8 x i32], align 4, addrspace(5)
	%gep0 = getelementptr inbounds [128 x i32], [128 x i32] addrspace(5)* %alloca0, i32 0, i32 65			%gep0 = getelementptr inbounds [128 x i32], [128 x i32] addrspace(5)* %alloca0, i32 0, i32 65
	%gep1 = getelementptr inbounds [8 x i32], [8 x i32] addrspace(5)* %alloca1, i32 0, i32 0			%gep1 = getelementptr inbounds [8 x i32], [8 x i32] addrspace(5)* %alloca1, i32 0, i32 0
	store volatile i32 7, i32 addrspace(5)* %gep0			store volatile i32 7, i32 addrspace(5)* %gep0
	%ptrtoint = ptrtoint i32 addrspace(5)* %gep1 to i32			%ptrtoint = ptrtoint i32 addrspace(5)* %gep1 to i32
	%mul = mul i32 %ptrtoint, 9			%mul = mul i32 %ptrtoint, 9
	store volatile i32 %mul, i32 addrspace(3)* undef			store volatile i32 %mul, i32 addrspace(3)* undef
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}func_other_fi_user_non_inline_imm_offset_i32_vcc_live:			; GCN-LABEL: {{^}}func_other_fi_user_non_inline_imm_offset_i32_vcc_live:

	; CI-DAG: s_movk_i32 [[OFFSET:s[0-9]+]], 0x200			; CI-DAG: s_movk_i32 [[OFFSET:s[0-9]+]], 0x200
	; CI-DAG: v_lshr_b32_e64 [[SCALED:v[0-9]+]], s32, 6			; CI-DAG: v_lshr_b32_e64 [[SCALED:v[0-9]+]], s32, 6
	; CI: v_add_i32_e64 [[VZ:v[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, [[OFFSET]], [[SCALED]]			; CI: v_add_i32_e64 [[VZ:v[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, [[OFFSET]], [[SCALED]]

	; GFX9-DAG: v_lshrrev_b32_e64 [[SCALED:v[0-9]+]], 6, s32			; GFX9-MUBUF-DAG: v_lshrrev_b32_e64 [[SCALED:v[0-9]+]], 6, s32
	; GFX9: v_add_u32_e32 [[VZ:v[0-9]+]], 0x200, [[SCALED]]			; GFX9-MUBUF: v_add_u32_e32 [[VZ:v[0-9]+]], 0x200, [[SCALED]]

				; GFX9-FLATSCR-DAG: s_add_u32 [[SZ:[^,]+]], s32, 0x200
				; GFX9-FLATSCR: v_mov_b32_e32 [[VZ:v[0-9]+]], [[SZ]]

	; GCN: v_mul_u32_u24_e32 [[VZ]], 9, [[VZ]]			; GCN: v_mul_u32_u24_e32 [[VZ]], 9, [[VZ]]
	; GCN: ds_write_b32 v0, [[VZ]]			; GCN: ds_write_b32 v0, [[VZ]]
	define void @func_other_fi_user_non_inline_imm_offset_i32_vcc_live() #0 {			define void @func_other_fi_user_non_inline_imm_offset_i32_vcc_live() #0 {
	%alloca0 = alloca [128 x i32], align 4, addrspace(5)			%alloca0 = alloca [128 x i32], align 4, addrspace(5)
	%alloca1 = alloca [8 x i32], align 4, addrspace(5)			%alloca1 = alloca [8 x i32], align 4, addrspace(5)
	%vcc = call i64 asm sideeffect "; def $0", "={vcc}"()			%vcc = call i64 asm sideeffect "; def $0", "={vcc}"()
	%gep0 = getelementptr inbounds [128 x i32], [128 x i32] addrspace(5)* %alloca0, i32 0, i32 65			%gep0 = getelementptr inbounds [128 x i32], [128 x i32] addrspace(5)* %alloca0, i32 0, i32 65
	%gep1 = getelementptr inbounds [8 x i32], [8 x i32] addrspace(5)* %alloca1, i32 0, i32 0			%gep1 = getelementptr inbounds [8 x i32], [8 x i32] addrspace(5)* %alloca1, i32 0, i32 0
	store volatile i32 7, i32 addrspace(5)* %gep0			store volatile i32 7, i32 addrspace(5)* %gep0
	call void asm sideeffect "; use $0", "{vcc}"(i64 %vcc)			call void asm sideeffect "; use $0", "{vcc}"(i64 %vcc)
	%ptrtoint = ptrtoint i32 addrspace(5)* %gep1 to i32			%ptrtoint = ptrtoint i32 addrspace(5)* %gep1 to i32
	%mul = mul i32 %ptrtoint, 9			%mul = mul i32 %ptrtoint, 9
	store volatile i32 %mul, i32 addrspace(3)* undef			store volatile i32 %mul, i32 addrspace(3)* undef
	ret void			ret void
	}			}

	declare void @func(<4 x float> addrspace(5)* nocapture) #0			declare void @func(<4 x float> addrspace(5)* nocapture) #0

	; undef flag not preserved in eliminateFrameIndex when handling the			; undef flag not preserved in eliminateFrameIndex when handling the
	; stores in the middle block.			; stores in the middle block.

	; GCN-LABEL: {{^}}undefined_stack_store_reg:			; GCN-LABEL: {{^}}undefined_stack_store_reg:
	; GCN: s_and_saveexec_b64			; GCN: s_and_saveexec_b64
	; GCN: buffer_store_dword v0, off, s[0:3], s33 offset:			; MUBUF: buffer_store_dword v0, off, s[0:3], s33 offset:
	; GCN: buffer_store_dword v0, off, s[0:3], s33 offset:			; MUBUF: buffer_store_dword v0, off, s[0:3], s33 offset:
	; GCN: buffer_store_dword v0, off, s[0:3], s33 offset:			; MUBUF: buffer_store_dword v0, off, s[0:3], s33 offset:
	; GCN: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s33 offset:			; MUBUF: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s33 offset:
				; FLATSCR: scratch_store_dword v0, off, s33 offset:
				; FLATSCR: scratch_store_dword v0, off, s33 offset:
				; FLATSCR: scratch_store_dword v0, off, s33 offset:
				; FLATSCR: scratch_store_dword v{{[0-9]+}}, off, s33 offset:
	define void @undefined_stack_store_reg(float %arg, i32 %arg1) #0 {			define void @undefined_stack_store_reg(float %arg, i32 %arg1) #0 {
	bb:			bb:
	%tmp = alloca <4 x float>, align 16, addrspace(5)			%tmp = alloca <4 x float>, align 16, addrspace(5)
	%tmp2 = insertelement <4 x float> undef, float %arg, i32 0			%tmp2 = insertelement <4 x float> undef, float %arg, i32 0
	store <4 x float> %tmp2, <4 x float> addrspace(5)* undef			store <4 x float> %tmp2, <4 x float> addrspace(5)* undef
	%tmp3 = icmp eq i32 %arg1, 0			%tmp3 = icmp eq i32 %arg1, 0
	br i1 %tmp3, label %bb4, label %bb5			br i1 %tmp3, label %bb4, label %bb5

	bb4:			bb4:
	call void @func(<4 x float> addrspace(5)* nonnull undef)			call void @func(<4 x float> addrspace(5)* nonnull undef)
	store <4 x float> %tmp2, <4 x float> addrspace(5)* %tmp, align 16			store <4 x float> %tmp2, <4 x float> addrspace(5)* %tmp, align 16
	call void @func(<4 x float> addrspace(5)* nonnull %tmp)			call void @func(<4 x float> addrspace(5)* nonnull %tmp)
	br label %bb5			br label %bb5

	bb5:			bb5:
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}alloca_ptr_nonentry_block:			; GCN-LABEL: {{^}}alloca_ptr_nonentry_block:
	; GCN: s_and_saveexec_b64			; GCN: s_and_saveexec_b64
	; GCN: buffer_load_dword v{{[0-9]+}}, off, s[0:3], s32 offset:4			; MUBUF: buffer_load_dword v{{[0-9]+}}, off, s[0:3], s32 offset:4
				; FLATSCR: scratch_load_dword v{{[0-9]+}}, off, s32 offset:4

	; CI: v_lshr_b32_e64 [[SHIFT:v[0-9]+]], s32, 6			; CI: v_lshr_b32_e64 [[SHIFT:v[0-9]+]], s32, 6
	; CI-NEXT: v_or_b32_e32 [[PTR:v[0-9]+]], 4, [[SHIFT]]			; CI-NEXT: v_or_b32_e32 [[PTR:v[0-9]+]], 4, [[SHIFT]]

	; GFX9: v_lshrrev_b32_e64 [[SHIFT:v[0-9]+]], 6, s32			; GFX9-MUBUF: v_lshrrev_b32_e64 [[SHIFT:v[0-9]+]], 6, s32
	; GFX9-NEXT: v_or_b32_e32 [[PTR:v[0-9]+]], 4, [[SHIFT]]			; GFX9-MUBUF-NEXT: v_or_b32_e32 [[PTR:v[0-9]+]], 4, [[SHIFT]]

				; GFX9-FLATSCR: v_mov_b32_e32 [[SP:v[0-9]+]], s32
				; GFX9-FLATSCR-NEXT: v_or_b32_e32 [[PTR:v[0-9]+]], 4, [[SP]]

	; GCN: ds_write_b32 v{{[0-9]+}}, [[PTR]]			; GCN: ds_write_b32 v{{[0-9]+}}, [[PTR]]
	define void @alloca_ptr_nonentry_block(i32 %arg0) #0 {			define void @alloca_ptr_nonentry_block(i32 %arg0) #0 {
	%alloca0 = alloca { i8, i32 }, align 4, addrspace(5)			%alloca0 = alloca { i8, i32 }, align 4, addrspace(5)
	%cmp = icmp eq i32 %arg0, 0			%cmp = icmp eq i32 %arg0, 0
	br i1 %cmp, label %bb, label %ret			br i1 %cmp, label %bb, label %ret

	bb:			bb:
	[11 lines not shown]
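
The MUBUF and flat scratch checks in this file differ mainly in how a frame index becomes a per-lane address. With MUBUF the SP in s32 is unswizzled, so the checks shift it right by 6 (`v_lshrrev_b32_e64 v0, 6, s32`) before adding the object offset. With flat scratch the SP is already swizzled and is copied as is (`v_mov_b32_e32 v0, s32`). The following is a minimal sketch of that arithmetic. The function names are hypothetical, and reading the shift of 6 as log2 of a 64-lane wave is an assumption rather than something the diff states.

```python
WAVE_SHIFT = 6  # shift amount from the MUBUF CHECK lines; assumed log2(64 lanes)


def fi_address_mubuf(sp: int, object_offset: int) -> int:
    """Unswizzled SP: scale down to a per-lane offset, then add the object offset."""
    return (sp >> WAVE_SHIFT) + object_offset


def fi_address_flat_scratch(sp: int, object_offset: int) -> int:
    """Swizzled SP: already a per-lane offset, so only the object offset is added."""
    return sp + object_offset


# The same per-lane stack position expressed in both conventions.
per_lane_sp = 16
mubuf_addr = fi_address_mubuf(per_lane_sp << WAVE_SHIFT, 4)
flat_addr = fi_address_flat_scratch(per_lane_sp, 4)
```

Both calls yield the same per-lane address, which is why the two SP conventions cannot be mixed within one stack.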

llvm/test/CodeGen/AMDGPU/load-hi16.ll

; RUN: llc -march=amdgcn -mcpu=gfx900 -amdgpu-sroa=0 -mattr=-promote-alloca -verify-machineinstrs < %s \| FileCheck -check-prefixes=GCN,GFX900 %s		; RUN: llc -march=amdgcn -mcpu=gfx900 -amdgpu-sroa=0 -mattr=-promote-alloca -verify-machineinstrs < %s \| FileCheck -check-prefixes=GCN,GFX900,GFX900-MUBUF %s
; RUN: llc -march=amdgcn -mcpu=gfx906 -amdgpu-sroa=0 -mattr=-promote-alloca -verify-machineinstrs < %s \| FileCheck -check-prefixes=GCN,GFX906,NO-D16-HI %s		; RUN: llc -march=amdgcn -mcpu=gfx906 -amdgpu-sroa=0 -mattr=-promote-alloca -verify-machineinstrs < %s \| FileCheck -check-prefixes=GCN,GFX906,NO-D16-HI %s
; RUN: llc -march=amdgcn -mcpu=fiji -amdgpu-sroa=0 -mattr=-promote-alloca -verify-machineinstrs < %s \| FileCheck -check-prefixes=GCN,GFX803,NO-D16-HI %s		; RUN: llc -march=amdgcn -mcpu=fiji -amdgpu-sroa=0 -mattr=-promote-alloca -verify-machineinstrs < %s \| FileCheck -check-prefixes=GCN,GFX803,NO-D16-HI %s
		; RUN: llc -march=amdgcn -mcpu=gfx900 -amdgpu-sroa=0 -mattr=-promote-alloca -amdgpu-enable-flat-scratch -verify-machineinstrs < %s \| FileCheck -check-prefixes=GCN,GFX900,GFX900-FLATSCR %s

; GCN-LABEL: {{^}}load_local_lo_hi_v2i16_multi_use_lo:		; GCN-LABEL: {{^}}load_local_lo_hi_v2i16_multi_use_lo:
; GFX900: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX900: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX900-NEXT: ds_read_u16 v2, v0		; GFX900-NEXT: ds_read_u16 v2, v0
; GFX900-DAG: v_mov_b32_e32 [[ZERO:v[0-9]+]], 0		; GFX900-DAG: v_mov_b32_e32 [[ZERO:v[0-9]+]], 0
; GFX900-DAG: s_waitcnt lgkmcnt(0)		; GFX900-DAG: s_waitcnt lgkmcnt(0)
; GFX900-DAG: v_mov_b32_e32 v1, v2		; GFX900-DAG: v_mov_b32_e32 v1, v2
; GFX900-DAG: ds_read_u16_d16_hi v1, v0 offset:16		; GFX900-DAG: ds_read_u16_d16_hi v1, v0 offset:16
[476 lines not shown]
entry:
%build0 = insertelement <2 x half> undef, half %reg, i32 0		%build0 = insertelement <2 x half> undef, half %reg, i32 0
%build1 = insertelement <2 x half> %build0, half %bitcast, i32 1		%build1 = insertelement <2 x half> %build0, half %bitcast, i32 1
store <2 x half> %build1, <2 x half> addrspace(1)* undef		store <2 x half> %build1, <2 x half> addrspace(1)* undef
ret void		ret void
}		}

; GCN-LABEL: {{^}}load_private_hi_v2i16_reglo_vreg:		; GCN-LABEL: {{^}}load_private_hi_v2i16_reglo_vreg:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GFX900: buffer_load_short_d16_hi v0, off, s[0:3], s32 offset:4094{{$}}		; GFX900-MUBUF: buffer_load_short_d16_hi v0, off, s[0:3], s32 offset:4094{{$}}
		; GFX900-FLATSCR: scratch_load_short_d16_hi v0, off, s32 offset:4094{{$}}
; GFX900-NEXT: s_waitcnt		; GFX900-NEXT: s_waitcnt
; GFX900-NEXT: global_store_dword v{{\[[0-9]+:[0-9]+\]}}, v0		; GFX900-NEXT: global_store_dword v{{\[[0-9]+:[0-9]+\]}}, v0
; GFX900-NEXT: s_waitcnt		; GFX900-NEXT: s_waitcnt
; GFX900-NEXT: s_setpc_b64		; GFX900-NEXT: s_setpc_b64

; NO-D16-HI: buffer_load_ushort v{{[0-9]+}}, off, s[0:3], s32 offset:4094{{$}}		; NO-D16-HI: buffer_load_ushort v{{[0-9]+}}, off, s[0:3], s32 offset:4094{{$}}
define void @load_private_hi_v2i16_reglo_vreg(i16 addrspace(5)* byval %in, i16 %reg) #0 {		define void @load_private_hi_v2i16_reglo_vreg(i16 addrspace(5)* byval %in, i16 %reg) #0 {
entry:		entry:
%gep = getelementptr inbounds i16, i16 addrspace(5)* %in, i64 2047		%gep = getelementptr inbounds i16, i16 addrspace(5)* %in, i64 2047
%load = load i16, i16 addrspace(5)* %gep		%load = load i16, i16 addrspace(5)* %gep
%build0 = insertelement <2 x i16> undef, i16 %reg, i32 0		%build0 = insertelement <2 x i16> undef, i16 %reg, i32 0
%build1 = insertelement <2 x i16> %build0, i16 %load, i32 1		%build1 = insertelement <2 x i16> %build0, i16 %load, i32 1
store <2 x i16> %build1, <2 x i16> addrspace(1)* undef		store <2 x i16> %build1, <2 x i16> addrspace(1)* undef
ret void		ret void
}		}

; GCN-LABEL: {{^}}load_private_hi_v2f16_reglo_vreg:		; GCN-LABEL: {{^}}load_private_hi_v2f16_reglo_vreg:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GFX900: buffer_load_short_d16_hi v0, off, s[0:3], s32 offset:4094{{$}}		; GFX900-MUBUF: buffer_load_short_d16_hi v0, off, s[0:3], s32 offset:4094{{$}}
		; GFX900-FLATSCR: scratch_load_short_d16_hi v0, off, s32 offset:4094{{$}}
; GFX900-NEXT: s_waitcnt		; GFX900-NEXT: s_waitcnt
; GFX900-NEXT: global_store_dword v{{\[[0-9]+:[0-9]+\]}}, v0		; GFX900-NEXT: global_store_dword v{{\[[0-9]+:[0-9]+\]}}, v0
; GFX900-NEXT: s_waitcnt		; GFX900-NEXT: s_waitcnt
; GFX900-NEXT: s_setpc_b64		; GFX900-NEXT: s_setpc_b64

; NO-D16-HI: buffer_load_ushort v{{[0-9]+}}, off, s[0:3], s32 offset:4094{{$}}		; NO-D16-HI: buffer_load_ushort v{{[0-9]+}}, off, s[0:3], s32 offset:4094{{$}}
define void @load_private_hi_v2f16_reglo_vreg(half addrspace(5)* byval %in, half %reg) #0 {		define void @load_private_hi_v2f16_reglo_vreg(half addrspace(5)* byval %in, half %reg) #0 {
entry:		entry:
%gep = getelementptr inbounds half, half addrspace(5)* %in, i64 2047		%gep = getelementptr inbounds half, half addrspace(5)* %in, i64 2047
%load = load half, half addrspace(5)* %gep		%load = load half, half addrspace(5)* %gep
%build0 = insertelement <2 x half> undef, half %reg, i32 0		%build0 = insertelement <2 x half> undef, half %reg, i32 0
%build1 = insertelement <2 x half> %build0, half %load, i32 1		%build1 = insertelement <2 x half> %build0, half %load, i32 1
store <2 x half> %build1, <2 x half> addrspace(1)* undef		store <2 x half> %build1, <2 x half> addrspace(1)* undef
ret void		ret void
}		}

; GCN-LABEL: {{^}}load_private_hi_v2i16_reglo_vreg_nooff:		; GCN-LABEL: {{^}}load_private_hi_v2i16_reglo_vreg_nooff:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GFX900: buffer_load_short_d16_hi v0, off, s[0:3], 0 offset:4094{{$}}		; GFX900-MUBUF: buffer_load_short_d16_hi v0, off, s[0:3], 0 offset:4094{{$}}
		; GFX900-FLATSCR: s_movk_i32 [[SOFF:[^,]+]], 0xffe
		; GFX900-FLATSCR: scratch_load_short_d16_hi v0, off, [[SOFF]]{{$}}
; GFX900: s_waitcnt		; GFX900: s_waitcnt
; GFX900-NEXT: global_store_dword v{{\[[0-9]+:[0-9]+\]}}, v0		; GFX900-NEXT: global_store_dword v{{\[[0-9]+:[0-9]+\]}}, v0
; GFX900-NEXT: s_waitcnt		; GFX900-NEXT: s_waitcnt
; GFX900-NEXT: s_setpc_b64		; GFX900-NEXT: s_setpc_b64

; NO-D16-HI: buffer_load_ushort v{{[0-9]+}}, off, s[0:3], 0 offset:4094{{$}}		; NO-D16-HI: buffer_load_ushort v{{[0-9]+}}, off, s[0:3], 0 offset:4094{{$}}
define void @load_private_hi_v2i16_reglo_vreg_nooff(i16 addrspace(5)* byval %in, i16 %reg) #0 {		define void @load_private_hi_v2i16_reglo_vreg_nooff(i16 addrspace(5)* byval %in, i16 %reg) #0 {
entry:		entry:
%load = load volatile i16, i16 addrspace(5)* inttoptr (i32 4094 to i16 addrspace(5)*)		%load = load volatile i16, i16 addrspace(5)* inttoptr (i32 4094 to i16 addrspace(5)*)
%build0 = insertelement <2 x i16> undef, i16 %reg, i32 0		%build0 = insertelement <2 x i16> undef, i16 %reg, i32 0
%build1 = insertelement <2 x i16> %build0, i16 %load, i32 1		%build1 = insertelement <2 x i16> %build0, i16 %load, i32 1
store <2 x i16> %build1, <2 x i16> addrspace(1)* undef		store <2 x i16> %build1, <2 x i16> addrspace(1)* undef
ret void		ret void
}		}

; GCN-LABEL: {{^}}load_private_hi_v2f16_reglo_vreg_nooff:		; GCN-LABEL: {{^}}load_private_hi_v2f16_reglo_vreg_nooff:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GFX900-NEXT: buffer_load_short_d16_hi v1, off, s[0:3], 0 offset:4094{{$}}		; GFX900-MUBUF-NEXT: buffer_load_short_d16_hi v1, off, s[0:3], 0 offset:4094{{$}}
		; GFX900-FLATSCR-NEXT: s_movk_i32 [[SOFF:[^,]+]], 0xffe
		; GFX900-FLATSCR-NEXT: scratch_load_short_d16_hi v1, off, [[SOFF]]{{$}}
; GFX900-NEXT: s_waitcnt		; GFX900-NEXT: s_waitcnt
; GFX900-NEXT: global_store_dword v{{\[[0-9]+:[0-9]+\]}}, v1		; GFX900-NEXT: global_store_dword v{{\[[0-9]+:[0-9]+\]}}, v1
; GFX900-NEXT: s_waitcnt		; GFX900-NEXT: s_waitcnt
; GFX900-NEXT: s_setpc_b64		; GFX900-NEXT: s_setpc_b64

; NO-D16-HI: buffer_load_ushort v{{[0-9]+}}, off, s[0:3], 0 offset:4094{{$}}		; NO-D16-HI: buffer_load_ushort v{{[0-9]+}}, off, s[0:3], 0 offset:4094{{$}}
define void @load_private_hi_v2f16_reglo_vreg_nooff(half addrspace(5)* %in, half %reg) #0 {		define void @load_private_hi_v2f16_reglo_vreg_nooff(half addrspace(5)* %in, half %reg) #0 {
entry:		entry:
%load = load volatile half, half addrspace(5)* inttoptr (i32 4094 to half addrspace(5)*)		%load = load volatile half, half addrspace(5)* inttoptr (i32 4094 to half addrspace(5)*)
%build0 = insertelement <2 x half> undef, half %reg, i32 0		%build0 = insertelement <2 x half> undef, half %reg, i32 0
%build1 = insertelement <2 x half> %build0, half %load, i32 1		%build1 = insertelement <2 x half> %build0, half %load, i32 1
store <2 x half> %build1, <2 x half> addrspace(1)* undef		store <2 x half> %build1, <2 x half> addrspace(1)* undef
ret void		ret void
}		}

; GCN-LABEL: {{^}}load_private_hi_v2i16_reglo_vreg_zexti8:		; GCN-LABEL: {{^}}load_private_hi_v2i16_reglo_vreg_zexti8:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GFX900: buffer_load_ubyte_d16_hi v0, off, s[0:3], s32 offset:4095{{$}}		; GFX900-MUBUF: buffer_load_ubyte_d16_hi v0, off, s[0:3], s32 offset:4095{{$}}
		; GFX900-FLATSCR: scratch_load_ubyte_d16_hi v0, off, s32 offset:4095{{$}}
; GFX900-NEXT: s_waitcnt		; GFX900-NEXT: s_waitcnt
; GFX900-NEXT: global_store_dword v{{\[[0-9]+:[0-9]+\]}}, v0		; GFX900-NEXT: global_store_dword v{{\[[0-9]+:[0-9]+\]}}, v0
; GFX900-NEXT: s_waitcnt		; GFX900-NEXT: s_waitcnt
; GFX900-NEXT: s_setpc_b64		; GFX900-NEXT: s_setpc_b64

; NO-D16-HI: buffer_load_ubyte v{{[0-9]+}}, off, s[0:3], s32 offset:4095{{$}}		; NO-D16-HI: buffer_load_ubyte v{{[0-9]+}}, off, s[0:3], s32 offset:4095{{$}}
define void @load_private_hi_v2i16_reglo_vreg_zexti8(i8 addrspace(5)* byval %in, i16 %reg) #0 {		define void @load_private_hi_v2i16_reglo_vreg_zexti8(i8 addrspace(5)* byval %in, i16 %reg) #0 {
entry:		entry:
%gep = getelementptr inbounds i8, i8 addrspace(5)* %in, i64 4095		%gep = getelementptr inbounds i8, i8 addrspace(5)* %in, i64 4095
%load = load i8, i8 addrspace(5)* %gep		%load = load i8, i8 addrspace(5)* %gep
%ext = zext i8 %load to i16		%ext = zext i8 %load to i16
%build0 = insertelement <2 x i16> undef, i16 %reg, i32 0		%build0 = insertelement <2 x i16> undef, i16 %reg, i32 0
%build1 = insertelement <2 x i16> %build0, i16 %ext, i32 1		%build1 = insertelement <2 x i16> %build0, i16 %ext, i32 1
store <2 x i16> %build1, <2 x i16> addrspace(1)* undef		store <2 x i16> %build1, <2 x i16> addrspace(1)* undef
ret void		ret void
}		}

; GCN-LABEL: {{^}}load_private_hi_v2f16_reglo_vreg_zexti8:		; GCN-LABEL: {{^}}load_private_hi_v2f16_reglo_vreg_zexti8:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GFX900: buffer_load_ubyte_d16_hi v0, off, s[0:3], s32 offset:4095{{$}}		; GFX900-MUBUF: buffer_load_ubyte_d16_hi v0, off, s[0:3], s32 offset:4095{{$}}
		; GFX900-FLATSCR: scratch_load_ubyte_d16_hi v0, off, s32 offset:4095{{$}}
; GFX900-NEXT: s_waitcnt		; GFX900-NEXT: s_waitcnt
; GFX900-NEXT: global_store_dword v{{\[[0-9]+:[0-9]+\]}}, v0		; GFX900-NEXT: global_store_dword v{{\[[0-9]+:[0-9]+\]}}, v0
; GFX900-NEXT: s_waitcnt		; GFX900-NEXT: s_waitcnt
; GFX900-NEXT: s_setpc_b64		; GFX900-NEXT: s_setpc_b64

; NO-D16-HI: buffer_load_ubyte v{{[0-9]+}}, off, s[0:3], s32 offset:4095{{$}}		; NO-D16-HI: buffer_load_ubyte v{{[0-9]+}}, off, s[0:3], s32 offset:4095{{$}}
define void @load_private_hi_v2f16_reglo_vreg_zexti8(i8 addrspace(5)* byval %in, half %reg) #0 {		define void @load_private_hi_v2f16_reglo_vreg_zexti8(i8 addrspace(5)* byval %in, half %reg) #0 {
entry:		entry:
%gep = getelementptr inbounds i8, i8 addrspace(5)* %in, i64 4095		%gep = getelementptr inbounds i8, i8 addrspace(5)* %in, i64 4095
%load = load i8, i8 addrspace(5)* %gep		%load = load i8, i8 addrspace(5)* %gep
%ext = zext i8 %load to i16		%ext = zext i8 %load to i16
%bitcast = bitcast i16 %ext to half		%bitcast = bitcast i16 %ext to half
%build0 = insertelement <2 x half> undef, half %reg, i32 0		%build0 = insertelement <2 x half> undef, half %reg, i32 0
%build1 = insertelement <2 x half> %build0, half %bitcast, i32 1		%build1 = insertelement <2 x half> %build0, half %bitcast, i32 1
store <2 x half> %build1, <2 x half> addrspace(1)* undef		store <2 x half> %build1, <2 x half> addrspace(1)* undef
ret void		ret void
}		}

; GCN-LABEL: {{^}}load_private_hi_v2f16_reglo_vreg_sexti8:		; GCN-LABEL: {{^}}load_private_hi_v2f16_reglo_vreg_sexti8:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GFX900: buffer_load_sbyte_d16_hi v0, off, s[0:3], s32 offset:4095{{$}}		; GFX900-MUBUF: buffer_load_sbyte_d16_hi v0, off, s[0:3], s32 offset:4095{{$}}
		; GFX900-FLATSCR: scratch_load_sbyte_d16_hi v0, off, s32 offset:4095{{$}}
; GFX900-NEXT: s_waitcnt		; GFX900-NEXT: s_waitcnt
; GFX900-NEXT: global_store_dword v{{\[[0-9]+:[0-9]+\]}}, v0		; GFX900-NEXT: global_store_dword v{{\[[0-9]+:[0-9]+\]}}, v0
; GFX900-NEXT: s_waitcnt		; GFX900-NEXT: s_waitcnt
; GFX900-NEXT: s_setpc_b64		; GFX900-NEXT: s_setpc_b64

; NO-D16-HI: buffer_load_sbyte v{{[0-9]+}}, off, s[0:3], s32 offset:4095{{$}}		; NO-D16-HI: buffer_load_sbyte v{{[0-9]+}}, off, s[0:3], s32 offset:4095{{$}}
define void @load_private_hi_v2f16_reglo_vreg_sexti8(i8 addrspace(5)* byval %in, half %reg) #0 {		define void @load_private_hi_v2f16_reglo_vreg_sexti8(i8 addrspace(5)* byval %in, half %reg) #0 {
entry:		entry:
%gep = getelementptr inbounds i8, i8 addrspace(5)* %in, i64 4095		%gep = getelementptr inbounds i8, i8 addrspace(5)* %in, i64 4095
%load = load i8, i8 addrspace(5)* %gep		%load = load i8, i8 addrspace(5)* %gep
%ext = sext i8 %load to i16		%ext = sext i8 %load to i16
%bitcast = bitcast i16 %ext to half		%bitcast = bitcast i16 %ext to half
%build0 = insertelement <2 x half> undef, half %reg, i32 0		%build0 = insertelement <2 x half> undef, half %reg, i32 0
%build1 = insertelement <2 x half> %build0, half %bitcast, i32 1		%build1 = insertelement <2 x half> %build0, half %bitcast, i32 1
store <2 x half> %build1, <2 x half> addrspace(1)* undef		store <2 x half> %build1, <2 x half> addrspace(1)* undef
ret void		ret void
}		}

; GCN-LABEL: {{^}}load_private_hi_v2i16_reglo_vreg_sexti8:		; GCN-LABEL: {{^}}load_private_hi_v2i16_reglo_vreg_sexti8:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GFX900: buffer_load_sbyte_d16_hi v0, off, s[0:3], s32 offset:4095{{$}}		; GFX900-MUBUF: buffer_load_sbyte_d16_hi v0, off, s[0:3], s32 offset:4095{{$}}
		; GFX900-FLATSCR: scratch_load_sbyte_d16_hi v0, off, s32 offset:4095{{$}}
; GFX900-NEXT: s_waitcnt		; GFX900-NEXT: s_waitcnt
; GFX900-NEXT: global_store_dword v{{\[[0-9]+:[0-9]+\]}}, v0		; GFX900-NEXT: global_store_dword v{{\[[0-9]+:[0-9]+\]}}, v0
; GFX900-NEXT: s_waitcnt		; GFX900-NEXT: s_waitcnt
; GFX900-NEXT: s_setpc_b64		; GFX900-NEXT: s_setpc_b64

; NO-D16-HI: buffer_load_sbyte v{{[0-9]+}}, off, s[0:3], s32 offset:4095{{$}}		; NO-D16-HI: buffer_load_sbyte v{{[0-9]+}}, off, s[0:3], s32 offset:4095{{$}}
define void @load_private_hi_v2i16_reglo_vreg_sexti8(i8 addrspace(5)* byval %in, i16 %reg) #0 {		define void @load_private_hi_v2i16_reglo_vreg_sexti8(i8 addrspace(5)* byval %in, i16 %reg) #0 {
entry:		entry:
%gep = getelementptr inbounds i8, i8 addrspace(5)* %in, i64 4095		%gep = getelementptr inbounds i8, i8 addrspace(5)* %in, i64 4095
%load = load i8, i8 addrspace(5)* %gep		%load = load i8, i8 addrspace(5)* %gep
%ext = sext i8 %load to i16		%ext = sext i8 %load to i16
%build0 = insertelement <2 x i16> undef, i16 %reg, i32 0		%build0 = insertelement <2 x i16> undef, i16 %reg, i32 0
%build1 = insertelement <2 x i16> %build0, i16 %ext, i32 1		%build1 = insertelement <2 x i16> %build0, i16 %ext, i32 1
store <2 x i16> %build1, <2 x i16> addrspace(1)* undef		store <2 x i16> %build1, <2 x i16> addrspace(1)* undef
ret void		ret void
}		}

; GCN-LABEL: {{^}}load_private_hi_v2i16_reglo_vreg_nooff_zexti8:		; GCN-LABEL: {{^}}load_private_hi_v2i16_reglo_vreg_nooff_zexti8:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GFX900-NEXT: buffer_load_ubyte_d16_hi v1, off, s[0:3], 0 offset:4094{{$}}		; GFX900-MUBUF-NEXT: buffer_load_ubyte_d16_hi v1, off, s[0:3], 0 offset:4094{{$}}
		; GFX900-FLATSCR-NEXT: s_movk_i32 [[SOFF:[^,]+]], 0xffe
		; GFX900-FLATSCR-NEXT: scratch_load_ubyte_d16_hi v1, off, [[SOFF]]{{$}}
; GFX900-NEXT: s_waitcnt		; GFX900-NEXT: s_waitcnt
; GFX900-NEXT: global_store_dword v{{\[[0-9]+:[0-9]+\]}}, v1		; GFX900-NEXT: global_store_dword v{{\[[0-9]+:[0-9]+\]}}, v1
; GFX900-NEXT: s_waitcnt		; GFX900-NEXT: s_waitcnt
; GFX900-NEXT: s_setpc_b64		; GFX900-NEXT: s_setpc_b64

; NO-D16-HI: buffer_load_ubyte v0, off, s[0:3], 0 offset:4094{{$}}		; NO-D16-HI: buffer_load_ubyte v0, off, s[0:3], 0 offset:4094{{$}}
define void @load_private_hi_v2i16_reglo_vreg_nooff_zexti8(i8 addrspace(5)* %in, i16 %reg) #0 {		define void @load_private_hi_v2i16_reglo_vreg_nooff_zexti8(i8 addrspace(5)* %in, i16 %reg) #0 {
entry:		entry:
%load = load volatile i8, i8 addrspace(5)* inttoptr (i32 4094 to i8 addrspace(5)*)		%load = load volatile i8, i8 addrspace(5)* inttoptr (i32 4094 to i8 addrspace(5)*)
%ext = zext i8 %load to i16		%ext = zext i8 %load to i16
%build0 = insertelement <2 x i16> undef, i16 %reg, i32 0		%build0 = insertelement <2 x i16> undef, i16 %reg, i32 0
%build1 = insertelement <2 x i16> %build0, i16 %ext, i32 1		%build1 = insertelement <2 x i16> %build0, i16 %ext, i32 1
store <2 x i16> %build1, <2 x i16> addrspace(1)* undef		store <2 x i16> %build1, <2 x i16> addrspace(1)* undef
ret void		ret void
}		}

; GCN-LABEL: {{^}}load_private_hi_v2i16_reglo_vreg_nooff_sexti8:		; GCN-LABEL: {{^}}load_private_hi_v2i16_reglo_vreg_nooff_sexti8:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GFX900-NEXT: buffer_load_sbyte_d16_hi v1, off, s[0:3], 0 offset:4094{{$}}		; GFX900-MUBUF-NEXT: buffer_load_sbyte_d16_hi v1, off, s[0:3], 0 offset:4094{{$}}
		; GFX900-FLATSCR-NEXT: s_movk_i32 [[SOFF:[^,]+]], 0xffe
		; GFX900-FLATSCR-NEXT: scratch_load_sbyte_d16_hi v1, off, [[SOFF]]{{$}}
; GFX900-NEXT: s_waitcnt		; GFX900-NEXT: s_waitcnt
; GFX900-NEXT: global_store_dword v{{\[[0-9]+:[0-9]+\]}}, v1		; GFX900-NEXT: global_store_dword v{{\[[0-9]+:[0-9]+\]}}, v1
; GFX900-NEXT: s_waitcnt		; GFX900-NEXT: s_waitcnt
; GFX900-NEXT: s_setpc_b64		; GFX900-NEXT: s_setpc_b64

; NO-D16-HI: buffer_load_sbyte v0, off, s[0:3], 0 offset:4094{{$}}		; NO-D16-HI: buffer_load_sbyte v0, off, s[0:3], 0 offset:4094{{$}}
define void @load_private_hi_v2i16_reglo_vreg_nooff_sexti8(i8 addrspace(5)* %in, i16 %reg) #0 {		define void @load_private_hi_v2i16_reglo_vreg_nooff_sexti8(i8 addrspace(5)* %in, i16 %reg) #0 {
entry:		entry:
%load = load volatile i8, i8 addrspace(5)* inttoptr (i32 4094 to i8 addrspace(5)*)		%load = load volatile i8, i8 addrspace(5)* inttoptr (i32 4094 to i8 addrspace(5)*)
%ext = sext i8 %load to i16		%ext = sext i8 %load to i16
%build0 = insertelement <2 x i16> undef, i16 %reg, i32 0		%build0 = insertelement <2 x i16> undef, i16 %reg, i32 0
%build1 = insertelement <2 x i16> %build0, i16 %ext, i32 1		%build1 = insertelement <2 x i16> %build0, i16 %ext, i32 1
store <2 x i16> %build1, <2 x i16> addrspace(1)* undef		store <2 x i16> %build1, <2 x i16> addrspace(1)* undef
ret void		ret void
}		}

; GCN-LABEL: {{^}}load_private_hi_v2f16_reglo_vreg_nooff_zexti8:		; GCN-LABEL: {{^}}load_private_hi_v2f16_reglo_vreg_nooff_zexti8:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GFX900-NEXT: buffer_load_ubyte_d16_hi v1, off, s[0:3], 0 offset:4094{{$}}		; GFX900-MUBUF-NEXT: buffer_load_ubyte_d16_hi v1, off, s[0:3], 0 offset:4094{{$}}
		; GFX900-FLATSCR-NEXT: s_movk_i32 [[SOFF:[^,]+]], 0xffe
		; GFX900-FLATSCR-NEXT: scratch_load_ubyte_d16_hi v1, off, [[SOFF]]{{$}}
; GFX900-NEXT: s_waitcnt		; GFX900-NEXT: s_waitcnt
; GFX900-NEXT: global_store_dword v{{\[[0-9]+:[0-9]+\]}}, v1		; GFX900-NEXT: global_store_dword v{{\[[0-9]+:[0-9]+\]}}, v1
; GFX900-NEXT: s_waitcnt		; GFX900-NEXT: s_waitcnt
; GFX900-NEXT: s_setpc_b64		; GFX900-NEXT: s_setpc_b64

; NO-D16-HI: buffer_load_ubyte v0, off, s[0:3], 0 offset:4094{{$}}		; NO-D16-HI: buffer_load_ubyte v0, off, s[0:3], 0 offset:4094{{$}}
define void @load_private_hi_v2f16_reglo_vreg_nooff_zexti8(i8 addrspace(5)* %in, half %reg) #0 {		define void @load_private_hi_v2f16_reglo_vreg_nooff_zexti8(i8 addrspace(5)* %in, half %reg) #0 {
entry:		entry:
[... 83 lines not shown ...]
store <2 x half> %build1, <2 x half> addrspace(1)* undef		store <2 x half> %build1, <2 x half> addrspace(1)* undef
ret void		ret void
}		}

; Local object gives known offset, so requires converting from offen		; Local object gives known offset, so requires converting from offen
; to offset variant.		; to offset variant.

; GCN-LABEL: {{^}}load_private_hi_v2i16_reglo_vreg_to_offset:		; GCN-LABEL: {{^}}load_private_hi_v2i16_reglo_vreg_to_offset:
; GFX900: buffer_store_dword		; GFX900-MUBUF: buffer_store_dword
; GFX900-NEXT: buffer_load_short_d16_hi v{{[0-9]+}}, off, s[0:3], s32 offset:4094		; GFX900-MUBUF-NEXT: buffer_load_short_d16_hi v{{[0-9]+}}, off, s[0:3], s32 offset:4094
		; GFX900-FLATSCR: scratch_store_dword
		; GFX900-FLATSCR-NEXT: scratch_load_short_d16_hi v{{[0-9]+}}, off, s32 offset:4094
define void @load_private_hi_v2i16_reglo_vreg_to_offset(i16 %reg) #0 {		define void @load_private_hi_v2i16_reglo_vreg_to_offset(i16 %reg) #0 {
entry:		entry:
%obj0 = alloca [10 x i32], align 4, addrspace(5)		%obj0 = alloca [10 x i32], align 4, addrspace(5)
%obj1 = alloca [4096 x i16], align 2, addrspace(5)		%obj1 = alloca [4096 x i16], align 2, addrspace(5)
%bc = bitcast [10 x i32] addrspace(5)* %obj0 to i32 addrspace(5)*		%bc = bitcast [10 x i32] addrspace(5)* %obj0 to i32 addrspace(5)*
store volatile i32 123, i32 addrspace(5)* %bc		store volatile i32 123, i32 addrspace(5)* %bc
%gep = getelementptr inbounds [4096 x i16], [4096 x i16] addrspace(5)* %obj1, i32 0, i32 2027		%gep = getelementptr inbounds [4096 x i16], [4096 x i16] addrspace(5)* %obj1, i32 0, i32 2027
%load = load i16, i16 addrspace(5)* %gep		%load = load i16, i16 addrspace(5)* %gep
%build0 = insertelement <2 x i16> undef, i16 %reg, i32 0		%build0 = insertelement <2 x i16> undef, i16 %reg, i32 0
%build1 = insertelement <2 x i16> %build0, i16 %load, i32 1		%build1 = insertelement <2 x i16> %build0, i16 %load, i32 1
store <2 x i16> %build1, <2 x i16> addrspace(1)* undef		store <2 x i16> %build1, <2 x i16> addrspace(1)* undef
ret void		ret void
}		}
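
The `offset:4094` expected above can be reproduced by hand from the two allocas. The following is a minimal sketch, not the backend's frame-layout code: `frame_offset` is a hypothetical helper, and it assumes the objects are packed back to back starting at offset 0 relative to `s32`, with no padding between them.

```python
# Hypothetical helper (not an LLVM API): byte offset of element `index` in the
# object at position `which`, assuming objects are packed back to back from 0.
def frame_offset(object_sizes, which, index, elem_size):
    base = sum(object_sizes[:which])   # bytes occupied by the earlier objects
    return base + index * elem_size

# obj0 = [10 x i32] (40 bytes), obj1 = [4096 x i16]; the GEP picks element 2027.
i16_case = frame_offset([10 * 4, 4096 * 2], which=1, index=2027, elem_size=2)

# The sexti8/zexti8 variants: obj1 = [4096 x i8]; the GEP picks element 4055.
i8_case = frame_offset([10 * 4, 4096 * 1], which=1, index=4055, elem_size=1)

print(i16_case, i8_case)
```

Under that layout assumption the i16 case lands on 4094 and the i8 cases on 4095, which matches the immediates in the `_to_offset` checks for both the MUBUF and the flat scratch prefixes.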

; GCN-LABEL: {{^}}load_private_hi_v2i16_reglo_vreg_sexti8_to_offset:		; GCN-LABEL: {{^}}load_private_hi_v2i16_reglo_vreg_sexti8_to_offset:
; GFX900: buffer_store_dword		; GFX900-MUBUF: buffer_store_dword
; GFX900-NEXT: buffer_load_sbyte_d16_hi v{{[0-9]+}}, off, s[0:3], s32 offset:4095		; GFX900-MUBUF-NEXT: buffer_load_sbyte_d16_hi v{{[0-9]+}}, off, s[0:3], s32 offset:4095
		; GFX900-FLATSCR: scratch_store_dword
		; GFX900-FLATSCR-NEXT: scratch_load_sbyte_d16_hi v{{[0-9]+}}, off, s32 offset:4095
define void @load_private_hi_v2i16_reglo_vreg_sexti8_to_offset(i16 %reg) #0 {		define void @load_private_hi_v2i16_reglo_vreg_sexti8_to_offset(i16 %reg) #0 {
entry:		entry:
%obj0 = alloca [10 x i32], align 4, addrspace(5)		%obj0 = alloca [10 x i32], align 4, addrspace(5)
%obj1 = alloca [4096 x i8], align 2, addrspace(5)		%obj1 = alloca [4096 x i8], align 2, addrspace(5)
%bc = bitcast [10 x i32] addrspace(5)* %obj0 to i32 addrspace(5)*		%bc = bitcast [10 x i32] addrspace(5)* %obj0 to i32 addrspace(5)*
store volatile i32 123, i32 addrspace(5)* %bc		store volatile i32 123, i32 addrspace(5)* %bc
%gep = getelementptr inbounds [4096 x i8], [4096 x i8] addrspace(5)* %obj1, i32 0, i32 4055		%gep = getelementptr inbounds [4096 x i8], [4096 x i8] addrspace(5)* %obj1, i32 0, i32 4055
%load = load i8, i8 addrspace(5)* %gep		%load = load i8, i8 addrspace(5)* %gep
%ext = sext i8 %load to i16		%ext = sext i8 %load to i16
%build0 = insertelement <2 x i16> undef, i16 %reg, i32 0		%build0 = insertelement <2 x i16> undef, i16 %reg, i32 0
%build1 = insertelement <2 x i16> %build0, i16 %ext, i32 1		%build1 = insertelement <2 x i16> %build0, i16 %ext, i32 1
store <2 x i16> %build1, <2 x i16> addrspace(1)* undef		store <2 x i16> %build1, <2 x i16> addrspace(1)* undef
ret void		ret void
}		}

; GCN-LABEL: {{^}}load_private_hi_v2i16_reglo_vreg_zexti8_to_offset:		; GCN-LABEL: {{^}}load_private_hi_v2i16_reglo_vreg_zexti8_to_offset:
; GFX900: buffer_store_dword		; GFX900-MUBUF: buffer_store_dword
; GFX900-NEXT: buffer_load_ubyte_d16_hi v{{[0-9]+}}, off, s[0:3], s32 offset:4095		; GFX900-MUBUF-NEXT: buffer_load_ubyte_d16_hi v{{[0-9]+}}, off, s[0:3], s32 offset:4095
		; GFX900-FLATSCR: scratch_store_dword
		; GFX900-FLATSCR-NEXT: scratch_load_ubyte_d16_hi v{{[0-9]+}}, off, s32 offset:4095
define void @load_private_hi_v2i16_reglo_vreg_zexti8_to_offset(i16 %reg) #0 {		define void @load_private_hi_v2i16_reglo_vreg_zexti8_to_offset(i16 %reg) #0 {
entry:		entry:
%obj0 = alloca [10 x i32], align 4, addrspace(5)		%obj0 = alloca [10 x i32], align 4, addrspace(5)
%obj1 = alloca [4096 x i8], align 2, addrspace(5)		%obj1 = alloca [4096 x i8], align 2, addrspace(5)
%bc = bitcast [10 x i32] addrspace(5)* %obj0 to i32 addrspace(5)*		%bc = bitcast [10 x i32] addrspace(5)* %obj0 to i32 addrspace(5)*
store volatile i32 123, i32 addrspace(5)* %bc		store volatile i32 123, i32 addrspace(5)* %bc
%gep = getelementptr inbounds [4096 x i8], [4096 x i8] addrspace(5)* %obj1, i32 0, i32 4055		%gep = getelementptr inbounds [4096 x i8], [4096 x i8] addrspace(5)* %obj1, i32 0, i32 4055
%load = load i8, i8 addrspace(5)* %gep		%load = load i8, i8 addrspace(5)* %gep
[... 134 lines not shown ...]
%build1 = insertelement <2 x i16> %build0, i16 %load1, i32 1		%build1 = insertelement <2 x i16> %build0, i16 %load1, i32 1
ret <2 x i16> %build1		ret <2 x i16> %build1
}		}

; FIXME: Remove m0 init and waitcnt between reads		; FIXME: Remove m0 init and waitcnt between reads
; FIXME: Is there a cost to using the extload over not?		; FIXME: Is there a cost to using the extload over not?
; GCN-LABEL: {{^}}load_private_v2i16_split:		; GCN-LABEL: {{^}}load_private_v2i16_split:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GFX900: buffer_load_ushort v0, off, s[0:3], s32{{$}}		; GFX900-MUBUF: buffer_load_ushort v0, off, s[0:3], s32{{$}}
		; GFX900-FLATSCR: scratch_load_ushort v0, off, s32{{$}}
; GFX900-NEXT: s_waitcnt		; GFX900-NEXT: s_waitcnt
; GFX900-NEXT: buffer_load_short_d16_hi v0, off, s[0:3], s32 offset:2		; GFX900-MUBUF-NEXT: buffer_load_short_d16_hi v0, off, s[0:3], s32 offset:2
		; GFX900-FLATSCR-NEXT: scratch_load_short_d16_hi v0, off, s32 offset:2
; GFX900-NEXT: s_waitcnt		; GFX900-NEXT: s_waitcnt
; GFX900-NEXT: s_setpc_b64		; GFX900-NEXT: s_setpc_b64
define <2 x i16> @load_private_v2i16_split(i16 addrspace(5)* byval %in) #0 {		define <2 x i16> @load_private_v2i16_split(i16 addrspace(5)* byval %in) #0 {
entry:		entry:
%gep = getelementptr inbounds i16, i16 addrspace(5)* %in, i32 1		%gep = getelementptr inbounds i16, i16 addrspace(5)* %in, i32 1
%load0 = load volatile i16, i16 addrspace(5)* %in		%load0 = load volatile i16, i16 addrspace(5)* %in
%load1 = load volatile i16, i16 addrspace(5)* %gep		%load1 = load volatile i16, i16 addrspace(5)* %gep
%build0 = insertelement <2 x i16> undef, i16 %load0, i32 0		%build0 = insertelement <2 x i16> undef, i16 %load0, i32 0
[... 25 lines not shown ...]

llvm/test/CodeGen/AMDGPU/load-lo16.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -march=amdgcn -mcpu=gfx900 -amdgpu-sroa=0 -mattr=-promote-alloca -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,GFX900 %s		; RUN: llc -march=amdgcn -mcpu=gfx900 -amdgpu-sroa=0 -mattr=-promote-alloca -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,GFX900,GFX900-MUBUF %s
; RUN: llc -march=amdgcn -mcpu=gfx906 -amdgpu-sroa=0 -mattr=-promote-alloca,+sram-ecc -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,GFX906,NO-D16-HI %s		; RUN: llc -march=amdgcn -mcpu=gfx906 -amdgpu-sroa=0 -mattr=-promote-alloca,+sram-ecc -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,GFX906,NO-D16-HI %s
; RUN: llc -march=amdgcn -mcpu=fiji -amdgpu-sroa=0 -mattr=-promote-alloca -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,GFX803,NO-D16-HI %s		; RUN: llc -march=amdgcn -mcpu=fiji -amdgpu-sroa=0 -mattr=-promote-alloca -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,GFX803,NO-D16-HI %s
		; RUN: llc -march=amdgcn -mcpu=gfx900 -amdgpu-sroa=0 -mattr=-promote-alloca -verify-machineinstrs --amdgpu-enable-flat-scratch < %s | FileCheck -check-prefixes=GCN,GFX900,GFX900-FLATSCR %s
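
The two gfx900 RUN lines share the `GCN` and `GFX900` prefixes and differ only in `GFX900-MUBUF` versus `GFX900-FLATSCR`, so common checks stay shared and only the scratch access lines are split. The sketch below illustrates that selection; it is not FileCheck itself. `check_prefix` and `active_lines` are hypothetical names, and only a subset of FileCheck's directive suffixes is handled.

```python
# Subset of FileCheck directive suffixes handled by this sketch (assumption:
# the real tool supports more, which are ignored here).
SUFFIXES = ("-NEXT", "-LABEL", "-NOT", "-DAG", "-SAME")

def check_prefix(line):
    """Return the prefix of a '; PREFIX[-SUFFIX]: ...' comment line, or None."""
    text = line.lstrip("; \t")
    head, sep, _ = text.partition(":")
    if not sep or " " in head:
        return None
    for suffix in SUFFIXES:
        if head.endswith(suffix):
            return head[: -len(suffix)]
    return head

def active_lines(lines, prefixes):
    """Keep only the check lines whose prefix is enabled for one RUN line."""
    return [ln for ln in lines if check_prefix(ln) in prefixes]

checks = [
    "; GFX900-MUBUF-NEXT: buffer_load_short_d16 v0, off, s[0:3], s32 offset:4094",
    "; GFX900-FLATSCR-NEXT: scratch_load_short_d16 v0, off, s32 offset:4094",
    "; GFX803-NEXT: flat_store_dword v[0:1], v0",
]
mubuf = active_lines(checks, {"GCN", "GFX900", "GFX900-MUBUF"})
flatscr = active_lines(checks, {"GCN", "GFX900", "GFX900-FLATSCR"})
print(len(mubuf), len(flatscr))
```

With the sample lines, each gfx900 configuration keeps exactly one of the two scratch checks and neither keeps the `GFX803` line.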

define <2 x i16> @load_local_lo_v2i16_undeflo(i16 addrspace(3)* %in) #0 {		define <2 x i16> @load_local_lo_v2i16_undeflo(i16 addrspace(3)* %in) #0 {
; GFX900-LABEL: load_local_lo_v2i16_undeflo:		; GFX900-LABEL: load_local_lo_v2i16_undeflo:
; GFX900: ; %bb.0: ; %entry		; GFX900: ; %bb.0: ; %entry
; GFX900-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX900-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX900-NEXT: ds_read_u16_d16 v0, v0		; GFX900-NEXT: ds_read_u16_d16 v0, v0
; GFX900-NEXT: s_waitcnt lgkmcnt(0)		; GFX900-NEXT: s_waitcnt lgkmcnt(0)
; GFX900-NEXT: s_setpc_b64 s[30:31]		; GFX900-NEXT: s_setpc_b64 s[30:31]
[... 1,159 lines not shown ...]
%ext = sext i8 %load to i16		%ext = sext i8 %load to i16
%bitcast = bitcast i16 %ext to half		%bitcast = bitcast i16 %ext to half
%build1 = insertelement <2 x half> %reg.bc, half %bitcast, i32 0		%build1 = insertelement <2 x half> %reg.bc, half %bitcast, i32 0
store <2 x half> %build1, <2 x half> addrspace(1)* undef		store <2 x half> %build1, <2 x half> addrspace(1)* undef
ret void		ret void
}		}

define void @load_private_lo_v2i16_reglo_vreg(i16 addrspace(5)* byval %in, i32 %reg) #0 {		define void @load_private_lo_v2i16_reglo_vreg(i16 addrspace(5)* byval %in, i32 %reg) #0 {
; GFX900-LABEL: load_private_lo_v2i16_reglo_vreg:		; GFX900-MUBUF-LABEL: load_private_lo_v2i16_reglo_vreg:
; GFX900: ; %bb.0: ; %entry		; GFX900-MUBUF: ; %bb.0: ; %entry
; GFX900-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX900-MUBUF-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX900-NEXT: buffer_load_short_d16 v0, off, s[0:3], s32 offset:4094		; GFX900-MUBUF-NEXT: buffer_load_short_d16 v0, off, s[0:3], s32 offset:4094
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-MUBUF-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: global_store_dword v[0:1], v0, off		; GFX900-MUBUF-NEXT: global_store_dword v[0:1], v0, off
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-MUBUF-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: s_setpc_b64 s[30:31]		; GFX900-MUBUF-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX906-LABEL: load_private_lo_v2i16_reglo_vreg:		; GFX906-LABEL: load_private_lo_v2i16_reglo_vreg:
; GFX906: ; %bb.0: ; %entry		; GFX906: ; %bb.0: ; %entry
; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX906-NEXT: buffer_load_ushort v1, off, s[0:3], s32 offset:4094		; GFX906-NEXT: buffer_load_ushort v1, off, s[0:3], s32 offset:4094
; GFX906-NEXT: v_mov_b32_e32 v2, 0xffff		; GFX906-NEXT: v_mov_b32_e32 v2, 0xffff
; GFX906-NEXT: s_waitcnt vmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0)
; GFX906-NEXT: v_bfi_b32 v0, v2, v1, v0		; GFX906-NEXT: v_bfi_b32 v0, v2, v1, v0
; GFX906-NEXT: global_store_dword v[0:1], v0, off		; GFX906-NEXT: global_store_dword v[0:1], v0, off
; GFX906-NEXT: s_waitcnt vmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0)
; GFX906-NEXT: s_setpc_b64 s[30:31]		; GFX906-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX803-LABEL: load_private_lo_v2i16_reglo_vreg:		; GFX803-LABEL: load_private_lo_v2i16_reglo_vreg:
; GFX803: ; %bb.0: ; %entry		; GFX803: ; %bb.0: ; %entry
; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX803-NEXT: buffer_load_ushort v1, off, s[0:3], s32 offset:4094		; GFX803-NEXT: buffer_load_ushort v1, off, s[0:3], s32 offset:4094
; GFX803-NEXT: v_and_b32_e32 v0, 0xffff0000, v0		; GFX803-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
; GFX803-NEXT: s_waitcnt vmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0)
; GFX803-NEXT: v_or_b32_e32 v0, v1, v0		; GFX803-NEXT: v_or_b32_e32 v0, v1, v0
; GFX803-NEXT: flat_store_dword v[0:1], v0		; GFX803-NEXT: flat_store_dword v[0:1], v0
; GFX803-NEXT: s_waitcnt vmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0)
; GFX803-NEXT: s_setpc_b64 s[30:31]		; GFX803-NEXT: s_setpc_b64 s[30:31]
		;
		; GFX900-FLATSCR-LABEL: load_private_lo_v2i16_reglo_vreg:
		; GFX900-FLATSCR: ; %bb.0: ; %entry
		; GFX900-FLATSCR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
		; GFX900-FLATSCR-NEXT: scratch_load_short_d16 v0, off, s32 offset:4094
		; GFX900-FLATSCR-NEXT: s_waitcnt vmcnt(0)
		; GFX900-FLATSCR-NEXT: global_store_dword v[0:1], v0, off
		; GFX900-FLATSCR-NEXT: s_waitcnt vmcnt(0)
		; GFX900-FLATSCR-NEXT: s_setpc_b64 s[30:31]
entry:		entry:
%reg.bc = bitcast i32 %reg to <2 x i16>		%reg.bc = bitcast i32 %reg to <2 x i16>
%gep = getelementptr inbounds i16, i16 addrspace(5)* %in, i64 2047		%gep = getelementptr inbounds i16, i16 addrspace(5)* %in, i64 2047
%load = load i16, i16 addrspace(5)* %gep		%load = load i16, i16 addrspace(5)* %gep
%build1 = insertelement <2 x i16> %reg.bc, i16 %load, i32 0		%build1 = insertelement <2 x i16> %reg.bc, i16 %load, i32 0
store <2 x i16> %build1, <2 x i16> addrspace(1)* undef		store <2 x i16> %build1, <2 x i16> addrspace(1)* undef
ret void		ret void
}		}

define void @load_private_lo_v2i16_reghi_vreg(i16 addrspace(5)* byval %in, i16 %reg) #0 {		define void @load_private_lo_v2i16_reghi_vreg(i16 addrspace(5)* byval %in, i16 %reg) #0 {
; GFX900-LABEL: load_private_lo_v2i16_reghi_vreg:		; GFX900-MUBUF-LABEL: load_private_lo_v2i16_reghi_vreg:
; GFX900: ; %bb.0: ; %entry		; GFX900-MUBUF: ; %bb.0: ; %entry
; GFX900-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX900-MUBUF-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX900-NEXT: buffer_load_ushort v1, off, s[0:3], s32 offset:4094		; GFX900-MUBUF-NEXT: buffer_load_ushort v1, off, s[0:3], s32 offset:4094
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-MUBUF-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: v_and_b32_e32 v1, 0xffff, v1		; GFX900-MUBUF-NEXT: v_and_b32_e32 v1, 0xffff, v1
; GFX900-NEXT: v_lshl_or_b32 v0, v0, 16, v1		; GFX900-MUBUF-NEXT: v_lshl_or_b32 v0, v0, 16, v1
; GFX900-NEXT: global_store_dword v[0:1], v0, off		; GFX900-MUBUF-NEXT: global_store_dword v[0:1], v0, off
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-MUBUF-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: s_setpc_b64 s[30:31]		; GFX900-MUBUF-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX906-LABEL: load_private_lo_v2i16_reghi_vreg:		; GFX906-LABEL: load_private_lo_v2i16_reghi_vreg:
; GFX906: ; %bb.0: ; %entry		; GFX906: ; %bb.0: ; %entry
; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX906-NEXT: buffer_load_ushort v1, off, s[0:3], s32 offset:4094		; GFX906-NEXT: buffer_load_ushort v1, off, s[0:3], s32 offset:4094
; GFX906-NEXT: s_waitcnt vmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0)
; GFX906-NEXT: v_and_b32_e32 v1, 0xffff, v1		; GFX906-NEXT: v_and_b32_e32 v1, 0xffff, v1
; GFX906-NEXT: v_lshl_or_b32 v0, v0, 16, v1		; GFX906-NEXT: v_lshl_or_b32 v0, v0, 16, v1
; GFX906-NEXT: global_store_dword v[0:1], v0, off		; GFX906-NEXT: global_store_dword v[0:1], v0, off
; GFX906-NEXT: s_waitcnt vmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0)
; GFX906-NEXT: s_setpc_b64 s[30:31]		; GFX906-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX803-LABEL: load_private_lo_v2i16_reghi_vreg:		; GFX803-LABEL: load_private_lo_v2i16_reghi_vreg:
; GFX803: ; %bb.0: ; %entry		; GFX803: ; %bb.0: ; %entry
; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX803-NEXT: buffer_load_ushort v1, off, s[0:3], s32 offset:4094		; GFX803-NEXT: buffer_load_ushort v1, off, s[0:3], s32 offset:4094
; GFX803-NEXT: v_lshlrev_b32_e32 v0, 16, v0		; GFX803-NEXT: v_lshlrev_b32_e32 v0, 16, v0
; GFX803-NEXT: s_waitcnt vmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0)
; GFX803-NEXT: v_or_b32_e32 v0, v1, v0		; GFX803-NEXT: v_or_b32_e32 v0, v1, v0
; GFX803-NEXT: flat_store_dword v[0:1], v0		; GFX803-NEXT: flat_store_dword v[0:1], v0
; GFX803-NEXT: s_waitcnt vmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0)
; GFX803-NEXT: s_setpc_b64 s[30:31]		; GFX803-NEXT: s_setpc_b64 s[30:31]
		;
		; GFX900-FLATSCR-LABEL: load_private_lo_v2i16_reghi_vreg:
		; GFX900-FLATSCR: ; %bb.0: ; %entry
		; GFX900-FLATSCR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
		; GFX900-FLATSCR-NEXT: scratch_load_ushort v1, off, s32 offset:4094
		; GFX900-FLATSCR-NEXT: s_waitcnt vmcnt(0)
		; GFX900-FLATSCR-NEXT: v_and_b32_e32 v1, 0xffff, v1
		; GFX900-FLATSCR-NEXT: v_lshl_or_b32 v0, v0, 16, v1
		; GFX900-FLATSCR-NEXT: global_store_dword v[0:1], v0, off
		; GFX900-FLATSCR-NEXT: s_waitcnt vmcnt(0)
		; GFX900-FLATSCR-NEXT: s_setpc_b64 s[30:31]
entry:		entry:
%gep = getelementptr inbounds i16, i16 addrspace(5)* %in, i64 2047		%gep = getelementptr inbounds i16, i16 addrspace(5)* %in, i64 2047
%load = load i16, i16 addrspace(5)* %gep		%load = load i16, i16 addrspace(5)* %gep
%build0 = insertelement <2 x i16> undef, i16 %reg, i32 1		%build0 = insertelement <2 x i16> undef, i16 %reg, i32 1
%build1 = insertelement <2 x i16> %build0, i16 %load, i32 0		%build1 = insertelement <2 x i16> %build0, i16 %load, i32 0
store <2 x i16> %build1, <2 x i16> addrspace(1)* undef		store <2 x i16> %build1, <2 x i16> addrspace(1)* undef
ret void		ret void
}		}

define void @load_private_lo_v2f16_reglo_vreg(half addrspace(5)* byval %in, i32 %reg) #0 {		define void @load_private_lo_v2f16_reglo_vreg(half addrspace(5)* byval %in, i32 %reg) #0 {
; GFX900-LABEL: load_private_lo_v2f16_reglo_vreg:		; GFX900-MUBUF-LABEL: load_private_lo_v2f16_reglo_vreg:
; GFX900: ; %bb.0: ; %entry		; GFX900-MUBUF: ; %bb.0: ; %entry
; GFX900-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX900-MUBUF-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX900-NEXT: buffer_load_short_d16 v0, off, s[0:3], s32 offset:4094		; GFX900-MUBUF-NEXT: buffer_load_short_d16 v0, off, s[0:3], s32 offset:4094
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-MUBUF-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: global_store_dword v[0:1], v0, off		; GFX900-MUBUF-NEXT: global_store_dword v[0:1], v0, off
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-MUBUF-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: s_setpc_b64 s[30:31]		; GFX900-MUBUF-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX906-LABEL: load_private_lo_v2f16_reglo_vreg:		; GFX906-LABEL: load_private_lo_v2f16_reglo_vreg:
; GFX906: ; %bb.0: ; %entry		; GFX906: ; %bb.0: ; %entry
; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX906-NEXT: buffer_load_ushort v1, off, s[0:3], s32 offset:4094		; GFX906-NEXT: buffer_load_ushort v1, off, s[0:3], s32 offset:4094
; GFX906-NEXT: v_lshrrev_b32_e32 v0, 16, v0		; GFX906-NEXT: v_lshrrev_b32_e32 v0, 16, v0
; GFX906-NEXT: s_waitcnt vmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0)
; GFX906-NEXT: v_and_b32_e32 v1, 0xffff, v1		; GFX906-NEXT: v_and_b32_e32 v1, 0xffff, v1
; GFX906-NEXT: v_lshl_or_b32 v0, v0, 16, v1		; GFX906-NEXT: v_lshl_or_b32 v0, v0, 16, v1
; GFX906-NEXT: global_store_dword v[0:1], v0, off		; GFX906-NEXT: global_store_dword v[0:1], v0, off
; GFX906-NEXT: s_waitcnt vmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0)
; GFX906-NEXT: s_setpc_b64 s[30:31]		; GFX906-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX803-LABEL: load_private_lo_v2f16_reglo_vreg:		; GFX803-LABEL: load_private_lo_v2f16_reglo_vreg:
; GFX803: ; %bb.0: ; %entry		; GFX803: ; %bb.0: ; %entry
; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX803-NEXT: buffer_load_ushort v1, off, s[0:3], s32 offset:4094		; GFX803-NEXT: buffer_load_ushort v1, off, s[0:3], s32 offset:4094
; GFX803-NEXT: v_and_b32_e32 v0, 0xffff0000, v0		; GFX803-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
; GFX803-NEXT: s_waitcnt vmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0)
; GFX803-NEXT: v_or_b32_e32 v0, v1, v0		; GFX803-NEXT: v_or_b32_e32 v0, v1, v0
; GFX803-NEXT: flat_store_dword v[0:1], v0		; GFX803-NEXT: flat_store_dword v[0:1], v0
; GFX803-NEXT: s_waitcnt vmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0)
; GFX803-NEXT: s_setpc_b64 s[30:31]		; GFX803-NEXT: s_setpc_b64 s[30:31]
		;
		; GFX900-FLATSCR-LABEL: load_private_lo_v2f16_reglo_vreg:
		; GFX900-FLATSCR: ; %bb.0: ; %entry
		; GFX900-FLATSCR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
		; GFX900-FLATSCR-NEXT: scratch_load_short_d16 v0, off, s32 offset:4094
		; GFX900-FLATSCR-NEXT: s_waitcnt vmcnt(0)
		; GFX900-FLATSCR-NEXT: global_store_dword v[0:1], v0, off
		; GFX900-FLATSCR-NEXT: s_waitcnt vmcnt(0)
		; GFX900-FLATSCR-NEXT: s_setpc_b64 s[30:31]
entry:		entry:
%reg.bc = bitcast i32 %reg to <2 x half>		%reg.bc = bitcast i32 %reg to <2 x half>
%gep = getelementptr inbounds half, half addrspace(5)* %in, i64 2047		%gep = getelementptr inbounds half, half addrspace(5)* %in, i64 2047
%load = load half, half addrspace(5)* %gep		%load = load half, half addrspace(5)* %gep
%build1 = insertelement <2 x half> %reg.bc, half %load, i32 0		%build1 = insertelement <2 x half> %reg.bc, half %load, i32 0
store <2 x half> %build1, <2 x half> addrspace(1)* undef		store <2 x half> %build1, <2 x half> addrspace(1)* undef
ret void		ret void
}		}

define void @load_private_lo_v2i16_reglo_vreg_nooff(i16 addrspace(5)* %in, i32 %reg) #0 {		define void @load_private_lo_v2i16_reglo_vreg_nooff(i16 addrspace(5)* %in, i32 %reg) #0 {
; GFX900-LABEL: load_private_lo_v2i16_reglo_vreg_nooff:		; GFX900-MUBUF-LABEL: load_private_lo_v2i16_reglo_vreg_nooff:
; GFX900: ; %bb.0: ; %entry		; GFX900-MUBUF: ; %bb.0: ; %entry
; GFX900-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX900-MUBUF-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX900-NEXT: buffer_load_short_d16 v1, off, s[0:3], 0 offset:4094		; GFX900-MUBUF-NEXT: buffer_load_short_d16 v1, off, s[0:3], 0 offset:4094
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-MUBUF-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: global_store_dword v[0:1], v1, off		; GFX900-MUBUF-NEXT: global_store_dword v[0:1], v1, off
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-MUBUF-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: s_setpc_b64 s[30:31]		; GFX900-MUBUF-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX906-LABEL: load_private_lo_v2i16_reglo_vreg_nooff:		; GFX906-LABEL: load_private_lo_v2i16_reglo_vreg_nooff:
; GFX906: ; %bb.0: ; %entry		; GFX906: ; %bb.0: ; %entry
; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX906-NEXT: buffer_load_ushort v0, off, s[0:3], 0 offset:4094		; GFX906-NEXT: buffer_load_ushort v0, off, s[0:3], 0 offset:4094
; GFX906-NEXT: v_mov_b32_e32 v2, 0xffff		; GFX906-NEXT: v_mov_b32_e32 v2, 0xffff
; GFX906-NEXT: s_waitcnt vmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0)
; GFX906-NEXT: v_bfi_b32 v0, v2, v0, v1		; GFX906-NEXT: v_bfi_b32 v0, v2, v0, v1
; GFX906-NEXT: global_store_dword v[0:1], v0, off		; GFX906-NEXT: global_store_dword v[0:1], v0, off
; GFX906-NEXT: s_waitcnt vmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0)
; GFX906-NEXT: s_setpc_b64 s[30:31]		; GFX906-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX803-LABEL: load_private_lo_v2i16_reglo_vreg_nooff:		; GFX803-LABEL: load_private_lo_v2i16_reglo_vreg_nooff:
; GFX803: ; %bb.0: ; %entry		; GFX803: ; %bb.0: ; %entry
; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX803-NEXT: buffer_load_ushort v0, off, s[0:3], 0 offset:4094		; GFX803-NEXT: buffer_load_ushort v0, off, s[0:3], 0 offset:4094
; GFX803-NEXT: v_and_b32_e32 v1, 0xffff0000, v1		; GFX803-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
; GFX803-NEXT: s_waitcnt vmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0)
; GFX803-NEXT: v_or_b32_e32 v0, v0, v1		; GFX803-NEXT: v_or_b32_e32 v0, v0, v1
; GFX803-NEXT: flat_store_dword v[0:1], v0		; GFX803-NEXT: flat_store_dword v[0:1], v0
; GFX803-NEXT: s_waitcnt vmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0)
; GFX803-NEXT: s_setpc_b64 s[30:31]		; GFX803-NEXT: s_setpc_b64 s[30:31]
		;
		; GFX900-FLATSCR-LABEL: load_private_lo_v2i16_reglo_vreg_nooff:
		; GFX900-FLATSCR: ; %bb.0: ; %entry
		; GFX900-FLATSCR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
		; GFX900-FLATSCR-NEXT: s_movk_i32 s4, 0xffe
		; GFX900-FLATSCR-NEXT: scratch_load_short_d16 v1, off, s4
		; GFX900-FLATSCR-NEXT: s_waitcnt vmcnt(0)
		; GFX900-FLATSCR-NEXT: global_store_dword v[0:1], v1, off
		; GFX900-FLATSCR-NEXT: s_waitcnt vmcnt(0)
		; GFX900-FLATSCR-NEXT: s_setpc_b64 s[30:31]
entry:		entry:
%reg.bc = bitcast i32 %reg to <2 x i16>		%reg.bc = bitcast i32 %reg to <2 x i16>
%load = load volatile i16, i16 addrspace(5)* inttoptr (i32 4094 to i16 addrspace(5)*)		%load = load volatile i16, i16 addrspace(5)* inttoptr (i32 4094 to i16 addrspace(5)*)
%build1 = insertelement <2 x i16> %reg.bc, i16 %load, i32 0		%build1 = insertelement <2 x i16> %reg.bc, i16 %load, i32 0
store <2 x i16> %build1, <2 x i16> addrspace(1)* undef		store <2 x i16> %build1, <2 x i16> addrspace(1)* undef
ret void		ret void
}		}

define void @load_private_lo_v2i16_reghi_vreg_nooff(i16 addrspace(5)* %in, i32 %reg) #0 {		define void @load_private_lo_v2i16_reghi_vreg_nooff(i16 addrspace(5)* %in, i32 %reg) #0 {
; GFX900-LABEL: load_private_lo_v2i16_reghi_vreg_nooff:		; GFX900-MUBUF-LABEL: load_private_lo_v2i16_reghi_vreg_nooff:
; GFX900: ; %bb.0: ; %entry		; GFX900-MUBUF: ; %bb.0: ; %entry
; GFX900-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX900-MUBUF-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX900-NEXT: buffer_load_short_d16 v1, off, s[0:3], 0 offset:4094		; GFX900-MUBUF-NEXT: buffer_load_short_d16 v1, off, s[0:3], 0 offset:4094
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-MUBUF-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: global_store_dword v[0:1], v1, off		; GFX900-MUBUF-NEXT: global_store_dword v[0:1], v1, off
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-MUBUF-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: s_setpc_b64 s[30:31]		; GFX900-MUBUF-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX906-LABEL: load_private_lo_v2i16_reghi_vreg_nooff:		; GFX906-LABEL: load_private_lo_v2i16_reghi_vreg_nooff:
; GFX906: ; %bb.0: ; %entry		; GFX906: ; %bb.0: ; %entry
; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX906-NEXT: buffer_load_ushort v0, off, s[0:3], 0 offset:4094		; GFX906-NEXT: buffer_load_ushort v0, off, s[0:3], 0 offset:4094
; GFX906-NEXT: v_mov_b32_e32 v2, 0xffff		; GFX906-NEXT: v_mov_b32_e32 v2, 0xffff
; GFX906-NEXT: s_waitcnt vmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0)
; GFX906-NEXT: v_bfi_b32 v0, v2, v0, v1		; GFX906-NEXT: v_bfi_b32 v0, v2, v0, v1
; GFX906-NEXT: global_store_dword v[0:1], v0, off		; GFX906-NEXT: global_store_dword v[0:1], v0, off
; GFX906-NEXT: s_waitcnt vmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0)
; GFX906-NEXT: s_setpc_b64 s[30:31]		; GFX906-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX803-LABEL: load_private_lo_v2i16_reghi_vreg_nooff:		; GFX803-LABEL: load_private_lo_v2i16_reghi_vreg_nooff:
; GFX803: ; %bb.0: ; %entry		; GFX803: ; %bb.0: ; %entry
; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX803-NEXT: buffer_load_ushort v0, off, s[0:3], 0 offset:4094		; GFX803-NEXT: buffer_load_ushort v0, off, s[0:3], 0 offset:4094
; GFX803-NEXT: v_and_b32_e32 v1, 0xffff0000, v1		; GFX803-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
; GFX803-NEXT: s_waitcnt vmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0)
; GFX803-NEXT: v_or_b32_e32 v0, v0, v1		; GFX803-NEXT: v_or_b32_e32 v0, v0, v1
; GFX803-NEXT: flat_store_dword v[0:1], v0		; GFX803-NEXT: flat_store_dword v[0:1], v0
; GFX803-NEXT: s_waitcnt vmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0)
; GFX803-NEXT: s_setpc_b64 s[30:31]		; GFX803-NEXT: s_setpc_b64 s[30:31]
		;
		; GFX900-FLATSCR-LABEL: load_private_lo_v2i16_reghi_vreg_nooff:
		; GFX900-FLATSCR: ; %bb.0: ; %entry
		; GFX900-FLATSCR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
		; GFX900-FLATSCR-NEXT: s_movk_i32 s4, 0xffe
		; GFX900-FLATSCR-NEXT: scratch_load_short_d16 v1, off, s4
		; GFX900-FLATSCR-NEXT: s_waitcnt vmcnt(0)
		; GFX900-FLATSCR-NEXT: global_store_dword v[0:1], v1, off
		; GFX900-FLATSCR-NEXT: s_waitcnt vmcnt(0)
		; GFX900-FLATSCR-NEXT: s_setpc_b64 s[30:31]
entry:		entry:
%reg.bc = bitcast i32 %reg to <2 x i16>		%reg.bc = bitcast i32 %reg to <2 x i16>
%load = load volatile i16, i16 addrspace(5)* inttoptr (i32 4094 to i16 addrspace(5)*)		%load = load volatile i16, i16 addrspace(5)* inttoptr (i32 4094 to i16 addrspace(5)*)
%build1 = insertelement <2 x i16> %reg.bc, i16 %load, i32 0		%build1 = insertelement <2 x i16> %reg.bc, i16 %load, i32 0
store <2 x i16> %build1, <2 x i16> addrspace(1)* undef		store <2 x i16> %build1, <2 x i16> addrspace(1)* undef
ret void		ret void
}		}

define void @load_private_lo_v2f16_reglo_vreg_nooff(half addrspace(5)* %in, i32 %reg) #0 {		define void @load_private_lo_v2f16_reglo_vreg_nooff(half addrspace(5)* %in, i32 %reg) #0 {
; GFX900-LABEL: load_private_lo_v2f16_reglo_vreg_nooff:		; GFX900-MUBUF-LABEL: load_private_lo_v2f16_reglo_vreg_nooff:
; GFX900: ; %bb.0: ; %entry		; GFX900-MUBUF: ; %bb.0: ; %entry
; GFX900-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX900-MUBUF-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX900-NEXT: buffer_load_short_d16 v1, off, s[0:3], 0 offset:4094		; GFX900-MUBUF-NEXT: buffer_load_short_d16 v1, off, s[0:3], 0 offset:4094
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-MUBUF-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: global_store_dword v[0:1], v1, off		; GFX900-MUBUF-NEXT: global_store_dword v[0:1], v1, off
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-MUBUF-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: s_setpc_b64 s[30:31]		; GFX900-MUBUF-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX906-LABEL: load_private_lo_v2f16_reglo_vreg_nooff:		; GFX906-LABEL: load_private_lo_v2f16_reglo_vreg_nooff:
; GFX906: ; %bb.0: ; %entry		; GFX906: ; %bb.0: ; %entry
; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX906-NEXT: buffer_load_ushort v0, off, s[0:3], 0 offset:4094		; GFX906-NEXT: buffer_load_ushort v0, off, s[0:3], 0 offset:4094
; GFX906-NEXT: v_lshrrev_b32_e32 v1, 16, v1		; GFX906-NEXT: v_lshrrev_b32_e32 v1, 16, v1
; GFX906-NEXT: s_waitcnt vmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0)
; GFX906-NEXT: v_and_b32_e32 v0, 0xffff, v0		; GFX906-NEXT: v_and_b32_e32 v0, 0xffff, v0
; GFX906-NEXT: v_lshl_or_b32 v0, v1, 16, v0		; GFX906-NEXT: v_lshl_or_b32 v0, v1, 16, v0
; GFX906-NEXT: global_store_dword v[0:1], v0, off		; GFX906-NEXT: global_store_dword v[0:1], v0, off
; GFX906-NEXT: s_waitcnt vmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0)
; GFX906-NEXT: s_setpc_b64 s[30:31]		; GFX906-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX803-LABEL: load_private_lo_v2f16_reglo_vreg_nooff:		; GFX803-LABEL: load_private_lo_v2f16_reglo_vreg_nooff:
; GFX803: ; %bb.0: ; %entry		; GFX803: ; %bb.0: ; %entry
; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX803-NEXT: buffer_load_ushort v0, off, s[0:3], 0 offset:4094		; GFX803-NEXT: buffer_load_ushort v0, off, s[0:3], 0 offset:4094
; GFX803-NEXT: v_and_b32_e32 v1, 0xffff0000, v1		; GFX803-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
; GFX803-NEXT: s_waitcnt vmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0)
; GFX803-NEXT: v_or_b32_e32 v0, v0, v1		; GFX803-NEXT: v_or_b32_e32 v0, v0, v1
; GFX803-NEXT: flat_store_dword v[0:1], v0		; GFX803-NEXT: flat_store_dword v[0:1], v0
; GFX803-NEXT: s_waitcnt vmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0)
; GFX803-NEXT: s_setpc_b64 s[30:31]		; GFX803-NEXT: s_setpc_b64 s[30:31]
		;
		; GFX900-FLATSCR-LABEL: load_private_lo_v2f16_reglo_vreg_nooff:
		; GFX900-FLATSCR: ; %bb.0: ; %entry
		; GFX900-FLATSCR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
		; GFX900-FLATSCR-NEXT: s_movk_i32 s4, 0xffe
		; GFX900-FLATSCR-NEXT: scratch_load_short_d16 v1, off, s4
		; GFX900-FLATSCR-NEXT: s_waitcnt vmcnt(0)
		; GFX900-FLATSCR-NEXT: global_store_dword v[0:1], v1, off
		; GFX900-FLATSCR-NEXT: s_waitcnt vmcnt(0)
		; GFX900-FLATSCR-NEXT: s_setpc_b64 s[30:31]
entry:		entry:
%reg.bc = bitcast i32 %reg to <2 x half>		%reg.bc = bitcast i32 %reg to <2 x half>
%load = load volatile half, half addrspace(5)* inttoptr (i32 4094 to half addrspace(5)*)		%load = load volatile half, half addrspace(5)* inttoptr (i32 4094 to half addrspace(5)*)
%build1 = insertelement <2 x half> %reg.bc, half %load, i32 0		%build1 = insertelement <2 x half> %reg.bc, half %load, i32 0
store <2 x half> %build1, <2 x half> addrspace(1)* undef		store <2 x half> %build1, <2 x half> addrspace(1)* undef
ret void		ret void
}		}

define void @load_private_lo_v2i16_reglo_vreg_zexti8(i8 addrspace(5)* byval %in, i32 %reg) #0 {		define void @load_private_lo_v2i16_reglo_vreg_zexti8(i8 addrspace(5)* byval %in, i32 %reg) #0 {
; GFX900-LABEL: load_private_lo_v2i16_reglo_vreg_zexti8:		; GFX900-MUBUF-LABEL: load_private_lo_v2i16_reglo_vreg_zexti8:
; GFX900: ; %bb.0: ; %entry		; GFX900-MUBUF: ; %bb.0: ; %entry
; GFX900-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX900-MUBUF-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX900-NEXT: buffer_load_ubyte_d16 v0, off, s[0:3], s32 offset:4095		; GFX900-MUBUF-NEXT: buffer_load_ubyte_d16 v0, off, s[0:3], s32 offset:4095
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-MUBUF-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: global_store_dword v[0:1], v0, off		; GFX900-MUBUF-NEXT: global_store_dword v[0:1], v0, off
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-MUBUF-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: s_setpc_b64 s[30:31]		; GFX900-MUBUF-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX906-LABEL: load_private_lo_v2i16_reglo_vreg_zexti8:		; GFX906-LABEL: load_private_lo_v2i16_reglo_vreg_zexti8:
; GFX906: ; %bb.0: ; %entry		; GFX906: ; %bb.0: ; %entry
; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX906-NEXT: buffer_load_ubyte v1, off, s[0:3], s32 offset:4095		; GFX906-NEXT: buffer_load_ubyte v1, off, s[0:3], s32 offset:4095
; GFX906-NEXT: v_mov_b32_e32 v2, 0xffff		; GFX906-NEXT: v_mov_b32_e32 v2, 0xffff
; GFX906-NEXT: s_waitcnt vmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0)
; GFX906-NEXT: v_bfi_b32 v0, v2, v1, v0		; GFX906-NEXT: v_bfi_b32 v0, v2, v1, v0
; GFX906-NEXT: global_store_dword v[0:1], v0, off		; GFX906-NEXT: global_store_dword v[0:1], v0, off
; GFX906-NEXT: s_waitcnt vmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0)
; GFX906-NEXT: s_setpc_b64 s[30:31]		; GFX906-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX803-LABEL: load_private_lo_v2i16_reglo_vreg_zexti8:		; GFX803-LABEL: load_private_lo_v2i16_reglo_vreg_zexti8:
; GFX803: ; %bb.0: ; %entry		; GFX803: ; %bb.0: ; %entry
; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX803-NEXT: buffer_load_ubyte v1, off, s[0:3], s32 offset:4095		; GFX803-NEXT: buffer_load_ubyte v1, off, s[0:3], s32 offset:4095
; GFX803-NEXT: v_lshrrev_b32_e32 v0, 16, v0		; GFX803-NEXT: v_lshrrev_b32_e32 v0, 16, v0
; GFX803-NEXT: s_mov_b32 s4, 0x5040c00		; GFX803-NEXT: s_mov_b32 s4, 0x5040c00
; GFX803-NEXT: s_waitcnt vmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0)
; GFX803-NEXT: v_perm_b32 v0, v0, v1, s4		; GFX803-NEXT: v_perm_b32 v0, v0, v1, s4
; GFX803-NEXT: flat_store_dword v[0:1], v0		; GFX803-NEXT: flat_store_dword v[0:1], v0
; GFX803-NEXT: s_waitcnt vmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0)
; GFX803-NEXT: s_setpc_b64 s[30:31]		; GFX803-NEXT: s_setpc_b64 s[30:31]
		;
		; GFX900-FLATSCR-LABEL: load_private_lo_v2i16_reglo_vreg_zexti8:
		; GFX900-FLATSCR: ; %bb.0: ; %entry
		; GFX900-FLATSCR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
		; GFX900-FLATSCR-NEXT: scratch_load_ubyte_d16 v0, off, s32 offset:4095
		; GFX900-FLATSCR-NEXT: s_waitcnt vmcnt(0)
		; GFX900-FLATSCR-NEXT: global_store_dword v[0:1], v0, off
		; GFX900-FLATSCR-NEXT: s_waitcnt vmcnt(0)
		; GFX900-FLATSCR-NEXT: s_setpc_b64 s[30:31]
entry:		entry:
%reg.bc = bitcast i32 %reg to <2 x i16>		%reg.bc = bitcast i32 %reg to <2 x i16>
%gep = getelementptr inbounds i8, i8 addrspace(5)* %in, i64 4095		%gep = getelementptr inbounds i8, i8 addrspace(5)* %in, i64 4095
%load = load i8, i8 addrspace(5)* %gep		%load = load i8, i8 addrspace(5)* %gep
%ext = zext i8 %load to i16		%ext = zext i8 %load to i16
%build1 = insertelement <2 x i16> %reg.bc, i16 %ext, i32 0		%build1 = insertelement <2 x i16> %reg.bc, i16 %ext, i32 0
store <2 x i16> %build1, <2 x i16> addrspace(1)* undef		store <2 x i16> %build1, <2 x i16> addrspace(1)* undef
ret void		ret void
}		}

define void @load_private_lo_v2i16_reglo_vreg_sexti8(i8 addrspace(5)* byval %in, i32 %reg) #0 {		define void @load_private_lo_v2i16_reglo_vreg_sexti8(i8 addrspace(5)* byval %in, i32 %reg) #0 {
; GFX900-LABEL: load_private_lo_v2i16_reglo_vreg_sexti8:		; GFX900-MUBUF-LABEL: load_private_lo_v2i16_reglo_vreg_sexti8:
; GFX900: ; %bb.0: ; %entry		; GFX900-MUBUF: ; %bb.0: ; %entry
; GFX900-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX900-MUBUF-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX900-NEXT: buffer_load_sbyte_d16 v0, off, s[0:3], s32 offset:4095		; GFX900-MUBUF-NEXT: buffer_load_sbyte_d16 v0, off, s[0:3], s32 offset:4095
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-MUBUF-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: global_store_dword v[0:1], v0, off		; GFX900-MUBUF-NEXT: global_store_dword v[0:1], v0, off
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-MUBUF-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: s_setpc_b64 s[30:31]		; GFX900-MUBUF-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX906-LABEL: load_private_lo_v2i16_reglo_vreg_sexti8:		; GFX906-LABEL: load_private_lo_v2i16_reglo_vreg_sexti8:
; GFX906: ; %bb.0: ; %entry		; GFX906: ; %bb.0: ; %entry
; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX906-NEXT: buffer_load_sbyte v1, off, s[0:3], s32 offset:4095		; GFX906-NEXT: buffer_load_sbyte v1, off, s[0:3], s32 offset:4095
; GFX906-NEXT: v_mov_b32_e32 v2, 0xffff		; GFX906-NEXT: v_mov_b32_e32 v2, 0xffff
; GFX906-NEXT: s_waitcnt vmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0)
; GFX906-NEXT: v_bfi_b32 v0, v2, v1, v0		; GFX906-NEXT: v_bfi_b32 v0, v2, v1, v0
; GFX906-NEXT: global_store_dword v[0:1], v0, off		; GFX906-NEXT: global_store_dword v[0:1], v0, off
; GFX906-NEXT: s_waitcnt vmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0)
; GFX906-NEXT: s_setpc_b64 s[30:31]		; GFX906-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX803-LABEL: load_private_lo_v2i16_reglo_vreg_sexti8:		; GFX803-LABEL: load_private_lo_v2i16_reglo_vreg_sexti8:
; GFX803: ; %bb.0: ; %entry		; GFX803: ; %bb.0: ; %entry
; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX803-NEXT: buffer_load_sbyte v1, off, s[0:3], s32 offset:4095		; GFX803-NEXT: buffer_load_sbyte v1, off, s[0:3], s32 offset:4095
; GFX803-NEXT: v_and_b32_e32 v0, 0xffff0000, v0		; GFX803-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
; GFX803-NEXT: s_waitcnt vmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0)
; GFX803-NEXT: v_or_b32_sdwa v0, v1, v0 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:DWORD		; GFX803-NEXT: v_or_b32_sdwa v0, v1, v0 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:DWORD
; GFX803-NEXT: flat_store_dword v[0:1], v0		; GFX803-NEXT: flat_store_dword v[0:1], v0
; GFX803-NEXT: s_waitcnt vmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0)
; GFX803-NEXT: s_setpc_b64 s[30:31]		; GFX803-NEXT: s_setpc_b64 s[30:31]
		;
		; GFX900-FLATSCR-LABEL: load_private_lo_v2i16_reglo_vreg_sexti8:
		; GFX900-FLATSCR: ; %bb.0: ; %entry
		; GFX900-FLATSCR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
		; GFX900-FLATSCR-NEXT: scratch_load_sbyte_d16 v0, off, s32 offset:4095
		; GFX900-FLATSCR-NEXT: s_waitcnt vmcnt(0)
		; GFX900-FLATSCR-NEXT: global_store_dword v[0:1], v0, off
		; GFX900-FLATSCR-NEXT: s_waitcnt vmcnt(0)
		; GFX900-FLATSCR-NEXT: s_setpc_b64 s[30:31]
entry:		entry:
%reg.bc = bitcast i32 %reg to <2 x i16>		%reg.bc = bitcast i32 %reg to <2 x i16>
%gep = getelementptr inbounds i8, i8 addrspace(5)* %in, i64 4095		%gep = getelementptr inbounds i8, i8 addrspace(5)* %in, i64 4095
%load = load i8, i8 addrspace(5)* %gep		%load = load i8, i8 addrspace(5)* %gep
%ext = sext i8 %load to i16		%ext = sext i8 %load to i16
%build1 = insertelement <2 x i16> %reg.bc, i16 %ext, i32 0		%build1 = insertelement <2 x i16> %reg.bc, i16 %ext, i32 0
store <2 x i16> %build1, <2 x i16> addrspace(1)* undef		store <2 x i16> %build1, <2 x i16> addrspace(1)* undef
ret void		ret void
}		}

define void @load_private_lo_v2i16_reglo_vreg_nooff_zexti8(i8 addrspace(5)* %in, i32 %reg) #0 {		define void @load_private_lo_v2i16_reglo_vreg_nooff_zexti8(i8 addrspace(5)* %in, i32 %reg) #0 {
; GFX900-LABEL: load_private_lo_v2i16_reglo_vreg_nooff_zexti8:		; GFX900-MUBUF-LABEL: load_private_lo_v2i16_reglo_vreg_nooff_zexti8:
; GFX900: ; %bb.0: ; %entry		; GFX900-MUBUF: ; %bb.0: ; %entry
; GFX900-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX900-MUBUF-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX900-NEXT: buffer_load_ubyte_d16 v1, off, s[0:3], 0 offset:4094		; GFX900-MUBUF-NEXT: buffer_load_ubyte_d16 v1, off, s[0:3], 0 offset:4094
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-MUBUF-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: global_store_dword v[0:1], v1, off		; GFX900-MUBUF-NEXT: global_store_dword v[0:1], v1, off
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-MUBUF-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: s_setpc_b64 s[30:31]		; GFX900-MUBUF-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX906-LABEL: load_private_lo_v2i16_reglo_vreg_nooff_zexti8:		; GFX906-LABEL: load_private_lo_v2i16_reglo_vreg_nooff_zexti8:
; GFX906: ; %bb.0: ; %entry		; GFX906: ; %bb.0: ; %entry
; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX906-NEXT: buffer_load_ubyte v0, off, s[0:3], 0 offset:4094		; GFX906-NEXT: buffer_load_ubyte v0, off, s[0:3], 0 offset:4094
; GFX906-NEXT: v_mov_b32_e32 v2, 0xffff		; GFX906-NEXT: v_mov_b32_e32 v2, 0xffff
; GFX906-NEXT: s_waitcnt vmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0)
; GFX906-NEXT: v_bfi_b32 v0, v2, v0, v1		; GFX906-NEXT: v_bfi_b32 v0, v2, v0, v1
; GFX906-NEXT: global_store_dword v[0:1], v0, off		; GFX906-NEXT: global_store_dword v[0:1], v0, off
; GFX906-NEXT: s_waitcnt vmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0)
; GFX906-NEXT: s_setpc_b64 s[30:31]		; GFX906-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX803-LABEL: load_private_lo_v2i16_reglo_vreg_nooff_zexti8:		; GFX803-LABEL: load_private_lo_v2i16_reglo_vreg_nooff_zexti8:
; GFX803: ; %bb.0: ; %entry		; GFX803: ; %bb.0: ; %entry
; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX803-NEXT: v_lshrrev_b32_e32 v0, 16, v1		; GFX803-NEXT: v_lshrrev_b32_e32 v0, 16, v1
; GFX803-NEXT: buffer_load_ubyte v1, off, s[0:3], 0 offset:4094		; GFX803-NEXT: buffer_load_ubyte v1, off, s[0:3], 0 offset:4094
; GFX803-NEXT: s_mov_b32 s4, 0x5040c00		; GFX803-NEXT: s_mov_b32 s4, 0x5040c00
; GFX803-NEXT: s_waitcnt vmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0)
; GFX803-NEXT: v_perm_b32 v0, v0, v1, s4		; GFX803-NEXT: v_perm_b32 v0, v0, v1, s4
; GFX803-NEXT: flat_store_dword v[0:1], v0		; GFX803-NEXT: flat_store_dword v[0:1], v0
; GFX803-NEXT: s_waitcnt vmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0)
; GFX803-NEXT: s_setpc_b64 s[30:31]		; GFX803-NEXT: s_setpc_b64 s[30:31]
		;
		; GFX900-FLATSCR-LABEL: load_private_lo_v2i16_reglo_vreg_nooff_zexti8:
		; GFX900-FLATSCR: ; %bb.0: ; %entry
		; GFX900-FLATSCR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
		; GFX900-FLATSCR-NEXT: s_movk_i32 s4, 0xffe
		; GFX900-FLATSCR-NEXT: scratch_load_ubyte_d16 v1, off, s4
		; GFX900-FLATSCR-NEXT: s_waitcnt vmcnt(0)
		; GFX900-FLATSCR-NEXT: global_store_dword v[0:1], v1, off
		; GFX900-FLATSCR-NEXT: s_waitcnt vmcnt(0)
		; GFX900-FLATSCR-NEXT: s_setpc_b64 s[30:31]
entry:		entry:
%reg.bc = bitcast i32 %reg to <2 x i16>		%reg.bc = bitcast i32 %reg to <2 x i16>
%load = load volatile i8, i8 addrspace(5)* inttoptr (i32 4094 to i8 addrspace(5)*)		%load = load volatile i8, i8 addrspace(5)* inttoptr (i32 4094 to i8 addrspace(5)*)
%ext = zext i8 %load to i16		%ext = zext i8 %load to i16
%build1 = insertelement <2 x i16> %reg.bc, i16 %ext, i32 0		%build1 = insertelement <2 x i16> %reg.bc, i16 %ext, i32 0
store <2 x i16> %build1, <2 x i16> addrspace(1)* undef		store <2 x i16> %build1, <2 x i16> addrspace(1)* undef
ret void		ret void
}		}

define void @load_private_lo_v2i16_reglo_vreg_nooff_sexti8(i8 addrspace(5)* %in, i32 %reg) #0 {		define void @load_private_lo_v2i16_reglo_vreg_nooff_sexti8(i8 addrspace(5)* %in, i32 %reg) #0 {
; GFX900-LABEL: load_private_lo_v2i16_reglo_vreg_nooff_sexti8:		; GFX900-MUBUF-LABEL: load_private_lo_v2i16_reglo_vreg_nooff_sexti8:
; GFX900: ; %bb.0: ; %entry		; GFX900-MUBUF: ; %bb.0: ; %entry
; GFX900-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX900-MUBUF-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX900-NEXT: buffer_load_sbyte_d16 v1, off, s[0:3], 0 offset:4094		; GFX900-MUBUF-NEXT: buffer_load_sbyte_d16 v1, off, s[0:3], 0 offset:4094
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-MUBUF-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: global_store_dword v[0:1], v1, off		; GFX900-MUBUF-NEXT: global_store_dword v[0:1], v1, off
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-MUBUF-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: s_setpc_b64 s[30:31]		; GFX900-MUBUF-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX906-LABEL: load_private_lo_v2i16_reglo_vreg_nooff_sexti8:		; GFX906-LABEL: load_private_lo_v2i16_reglo_vreg_nooff_sexti8:
; GFX906: ; %bb.0: ; %entry		; GFX906: ; %bb.0: ; %entry
; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX906-NEXT: buffer_load_sbyte v0, off, s[0:3], 0 offset:4094		; GFX906-NEXT: buffer_load_sbyte v0, off, s[0:3], 0 offset:4094
; GFX906-NEXT: v_mov_b32_e32 v2, 0xffff		; GFX906-NEXT: v_mov_b32_e32 v2, 0xffff
; GFX906-NEXT: s_waitcnt vmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0)
; GFX906-NEXT: v_bfi_b32 v0, v2, v0, v1		; GFX906-NEXT: v_bfi_b32 v0, v2, v0, v1
; GFX906-NEXT: global_store_dword v[0:1], v0, off		; GFX906-NEXT: global_store_dword v[0:1], v0, off
; GFX906-NEXT: s_waitcnt vmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0)
; GFX906-NEXT: s_setpc_b64 s[30:31]		; GFX906-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX803-LABEL: load_private_lo_v2i16_reglo_vreg_nooff_sexti8:		; GFX803-LABEL: load_private_lo_v2i16_reglo_vreg_nooff_sexti8:
; GFX803: ; %bb.0: ; %entry		; GFX803: ; %bb.0: ; %entry
; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX803-NEXT: buffer_load_sbyte v0, off, s[0:3], 0 offset:4094		; GFX803-NEXT: buffer_load_sbyte v0, off, s[0:3], 0 offset:4094
; GFX803-NEXT: v_and_b32_e32 v1, 0xffff0000, v1		; GFX803-NEXT: v_and_b32_e32 v1, 0xffff0000, v1
; GFX803-NEXT: s_waitcnt vmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0)
; GFX803-NEXT: v_or_b32_sdwa v0, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:DWORD		; GFX803-NEXT: v_or_b32_sdwa v0, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:DWORD
; GFX803-NEXT: flat_store_dword v[0:1], v0		; GFX803-NEXT: flat_store_dword v[0:1], v0
; GFX803-NEXT: s_waitcnt vmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0)
; GFX803-NEXT: s_setpc_b64 s[30:31]		; GFX803-NEXT: s_setpc_b64 s[30:31]
		;
		; GFX900-FLATSCR-LABEL: load_private_lo_v2i16_reglo_vreg_nooff_sexti8:
		; GFX900-FLATSCR: ; %bb.0: ; %entry
		; GFX900-FLATSCR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
		; GFX900-FLATSCR-NEXT: s_movk_i32 s4, 0xffe
		; GFX900-FLATSCR-NEXT: scratch_load_sbyte_d16 v1, off, s4
		; GFX900-FLATSCR-NEXT: s_waitcnt vmcnt(0)
		; GFX900-FLATSCR-NEXT: global_store_dword v[0:1], v1, off
		; GFX900-FLATSCR-NEXT: s_waitcnt vmcnt(0)
		; GFX900-FLATSCR-NEXT: s_setpc_b64 s[30:31]
entry:		entry:
%reg.bc = bitcast i32 %reg to <2 x i16>		%reg.bc = bitcast i32 %reg to <2 x i16>
%load = load volatile i8, i8 addrspace(5)* inttoptr (i32 4094 to i8 addrspace(5)*)		%load = load volatile i8, i8 addrspace(5)* inttoptr (i32 4094 to i8 addrspace(5)*)
%ext = sext i8 %load to i16		%ext = sext i8 %load to i16
%build1 = insertelement <2 x i16> %reg.bc, i16 %ext, i32 0		%build1 = insertelement <2 x i16> %reg.bc, i16 %ext, i32 0
store <2 x i16> %build1, <2 x i16> addrspace(1)* undef		store <2 x i16> %build1, <2 x i16> addrspace(1)* undef
ret void		ret void
}		}

define void @load_private_lo_v2f16_reglo_vreg_nooff_zexti8(i8 addrspace(5)* %in, i32 %reg) #0 {		define void @load_private_lo_v2f16_reglo_vreg_nooff_zexti8(i8 addrspace(5)* %in, i32 %reg) #0 {
; GFX900-LABEL: load_private_lo_v2f16_reglo_vreg_nooff_zexti8:		; GFX900-MUBUF-LABEL: load_private_lo_v2f16_reglo_vreg_nooff_zexti8:
; GFX900: ; %bb.0: ; %entry		; GFX900-MUBUF: ; %bb.0: ; %entry
; GFX900-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX900-MUBUF-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX900-NEXT: buffer_load_ubyte_d16 v1, off, s[0:3], 0 offset:4094		; GFX900-MUBUF-NEXT: buffer_load_ubyte_d16 v1, off, s[0:3], 0 offset:4094
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-MUBUF-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: global_store_dword v[0:1], v1, off		; GFX900-MUBUF-NEXT: global_store_dword v[0:1], v1, off
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-MUBUF-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: s_setpc_b64 s[30:31]		; GFX900-MUBUF-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX906-LABEL: load_private_lo_v2f16_reglo_vreg_nooff_zexti8:		; GFX906-LABEL: load_private_lo_v2f16_reglo_vreg_nooff_zexti8:
; GFX906: ; %bb.0: ; %entry		; GFX906: ; %bb.0: ; %entry
; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX906-NEXT: buffer_load_ubyte v0, off, s[0:3], 0 offset:4094		; GFX906-NEXT: buffer_load_ubyte v0, off, s[0:3], 0 offset:4094
; GFX906-NEXT: v_lshrrev_b32_e32 v1, 16, v1		; GFX906-NEXT: v_lshrrev_b32_e32 v1, 16, v1
; GFX906-NEXT: s_waitcnt vmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0)
; GFX906-NEXT: v_and_b32_e32 v0, 0xffff, v0		; GFX906-NEXT: v_and_b32_e32 v0, 0xffff, v0
; GFX906-NEXT: v_lshl_or_b32 v0, v1, 16, v0		; GFX906-NEXT: v_lshl_or_b32 v0, v1, 16, v0
; GFX906-NEXT: global_store_dword v[0:1], v0, off		; GFX906-NEXT: global_store_dword v[0:1], v0, off
; GFX906-NEXT: s_waitcnt vmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0)
; GFX906-NEXT: s_setpc_b64 s[30:31]		; GFX906-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX803-LABEL: load_private_lo_v2f16_reglo_vreg_nooff_zexti8:		; GFX803-LABEL: load_private_lo_v2f16_reglo_vreg_nooff_zexti8:
; GFX803: ; %bb.0: ; %entry		; GFX803: ; %bb.0: ; %entry
; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX803-NEXT: v_lshrrev_b32_e32 v0, 16, v1		; GFX803-NEXT: v_lshrrev_b32_e32 v0, 16, v1
; GFX803-NEXT: buffer_load_ubyte v1, off, s[0:3], 0 offset:4094		; GFX803-NEXT: buffer_load_ubyte v1, off, s[0:3], 0 offset:4094
; GFX803-NEXT: s_mov_b32 s4, 0x5040c00		; GFX803-NEXT: s_mov_b32 s4, 0x5040c00
; GFX803-NEXT: s_waitcnt vmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0)
; GFX803-NEXT: v_perm_b32 v0, v0, v1, s4		; GFX803-NEXT: v_perm_b32 v0, v0, v1, s4
; GFX803-NEXT: flat_store_dword v[0:1], v0		; GFX803-NEXT: flat_store_dword v[0:1], v0
; GFX803-NEXT: s_waitcnt vmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0)
; GFX803-NEXT: s_setpc_b64 s[30:31]		; GFX803-NEXT: s_setpc_b64 s[30:31]
		;
		; GFX900-FLATSCR-LABEL: load_private_lo_v2f16_reglo_vreg_nooff_zexti8:
		; GFX900-FLATSCR: ; %bb.0: ; %entry
		; GFX900-FLATSCR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
		; GFX900-FLATSCR-NEXT: s_movk_i32 s4, 0xffe
		; GFX900-FLATSCR-NEXT: scratch_load_ubyte_d16 v1, off, s4
		; GFX900-FLATSCR-NEXT: s_waitcnt vmcnt(0)
		; GFX900-FLATSCR-NEXT: global_store_dword v[0:1], v1, off
		; GFX900-FLATSCR-NEXT: s_waitcnt vmcnt(0)
		; GFX900-FLATSCR-NEXT: s_setpc_b64 s[30:31]
entry:		entry:
%reg.bc = bitcast i32 %reg to <2 x half>		%reg.bc = bitcast i32 %reg to <2 x half>
%load = load volatile i8, i8 addrspace(5)* inttoptr (i32 4094 to i8 addrspace(5)*)		%load = load volatile i8, i8 addrspace(5)* inttoptr (i32 4094 to i8 addrspace(5)*)
%ext = zext i8 %load to i16		%ext = zext i8 %load to i16
%bc.ext = bitcast i16 %ext to half		%bc.ext = bitcast i16 %ext to half
%build1 = insertelement <2 x half> %reg.bc, half %bc.ext, i32 0		%build1 = insertelement <2 x half> %reg.bc, half %bc.ext, i32 0
store <2 x half> %build1, <2 x half> addrspace(1)* undef		store <2 x half> %build1, <2 x half> addrspace(1)* undef
ret void		ret void
%ext = sext i8 %load to i16		%ext = sext i8 %load to i16
%bitcast = bitcast i16 %ext to half		%bitcast = bitcast i16 %ext to half
%build1 = insertelement <2 x half> %reg.bc, half %bitcast, i32 0		%build1 = insertelement <2 x half> %reg.bc, half %bitcast, i32 0
store <2 x half> %build1, <2 x half> addrspace(1)* undef		store <2 x half> %build1, <2 x half> addrspace(1)* undef
ret void		ret void
}		}

define void @load_private_lo_v2i16_reglo_vreg_to_offset(i32 %reg) #0 {		define void @load_private_lo_v2i16_reglo_vreg_to_offset(i32 %reg) #0 {
; GFX900-LABEL: load_private_lo_v2i16_reglo_vreg_to_offset:		; GFX900-MUBUF-LABEL: load_private_lo_v2i16_reglo_vreg_to_offset:
; GFX900: ; %bb.0: ; %entry		; GFX900-MUBUF: ; %bb.0: ; %entry
; GFX900-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX900-MUBUF-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX900-NEXT: v_mov_b32_e32 v1, 0x7b		; GFX900-MUBUF-NEXT: v_mov_b32_e32 v1, 0x7b
; GFX900-NEXT: buffer_store_dword v1, off, s[0:3], s32		; GFX900-MUBUF-NEXT: buffer_store_dword v1, off, s[0:3], s32
; GFX900-NEXT: buffer_load_short_d16 v0, off, s[0:3], s32 offset:4094		; GFX900-MUBUF-NEXT: buffer_load_short_d16 v0, off, s[0:3], s32 offset:4094
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-MUBUF-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: global_store_dword v[0:1], v0, off		; GFX900-MUBUF-NEXT: global_store_dword v[0:1], v0, off
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-MUBUF-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: s_setpc_b64 s[30:31]		; GFX900-MUBUF-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX906-LABEL: load_private_lo_v2i16_reglo_vreg_to_offset:		; GFX906-LABEL: load_private_lo_v2i16_reglo_vreg_to_offset:
; GFX906: ; %bb.0: ; %entry		; GFX906: ; %bb.0: ; %entry
; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX906-NEXT: v_mov_b32_e32 v1, 0x7b		; GFX906-NEXT: v_mov_b32_e32 v1, 0x7b
; GFX906-NEXT: buffer_store_dword v1, off, s[0:3], s32		; GFX906-NEXT: buffer_store_dword v1, off, s[0:3], s32
; GFX906-NEXT: buffer_load_ushort v1, off, s[0:3], s32 offset:4094		; GFX906-NEXT: buffer_load_ushort v1, off, s[0:3], s32 offset:4094
; GFX906-NEXT: v_mov_b32_e32 v2, 0xffff		; GFX906-NEXT: v_mov_b32_e32 v2, 0xffff
; GFX803-NEXT: buffer_store_dword v1, off, s[0:3], s32		; GFX803-NEXT: buffer_store_dword v1, off, s[0:3], s32
; GFX803-NEXT: buffer_load_ushort v1, off, s[0:3], s32 offset:4094		; GFX803-NEXT: buffer_load_ushort v1, off, s[0:3], s32 offset:4094
; GFX803-NEXT: v_and_b32_e32 v0, 0xffff0000, v0		; GFX803-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
; GFX803-NEXT: s_waitcnt vmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0)
; GFX803-NEXT: v_or_b32_e32 v0, v1, v0		; GFX803-NEXT: v_or_b32_e32 v0, v1, v0
; GFX803-NEXT: flat_store_dword v[0:1], v0		; GFX803-NEXT: flat_store_dword v[0:1], v0
; GFX803-NEXT: s_waitcnt vmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0)
; GFX803-NEXT: s_setpc_b64 s[30:31]		; GFX803-NEXT: s_setpc_b64 s[30:31]
		;
		; GFX900-FLATSCR-LABEL: load_private_lo_v2i16_reglo_vreg_to_offset:
		; GFX900-FLATSCR: ; %bb.0: ; %entry
		; GFX900-FLATSCR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
		; GFX900-FLATSCR-NEXT: v_mov_b32_e32 v1, 0x7b
		; GFX900-FLATSCR-NEXT: scratch_store_dword off, v1, s32
		; GFX900-FLATSCR-NEXT: scratch_load_short_d16 v0, off, s32 offset:4094
		; GFX900-FLATSCR-NEXT: s_waitcnt vmcnt(0)
		; GFX900-FLATSCR-NEXT: global_store_dword v[0:1], v0, off
		; GFX900-FLATSCR-NEXT: s_waitcnt vmcnt(0)
		; GFX900-FLATSCR-NEXT: s_setpc_b64 s[30:31]
entry:		entry:
%obj0 = alloca [10 x i32], align 4, addrspace(5)		%obj0 = alloca [10 x i32], align 4, addrspace(5)
%obj1 = alloca [4096 x i16], align 2, addrspace(5)		%obj1 = alloca [4096 x i16], align 2, addrspace(5)
%reg.bc = bitcast i32 %reg to <2 x i16>		%reg.bc = bitcast i32 %reg to <2 x i16>
%bc = bitcast [10 x i32] addrspace(5)* %obj0 to i32 addrspace(5)*		%bc = bitcast [10 x i32] addrspace(5)* %obj0 to i32 addrspace(5)*
store volatile i32 123, i32 addrspace(5)* %bc		store volatile i32 123, i32 addrspace(5)* %bc
%gep = getelementptr inbounds [4096 x i16], [4096 x i16] addrspace(5)* %obj1, i32 0, i32 2027		%gep = getelementptr inbounds [4096 x i16], [4096 x i16] addrspace(5)* %obj1, i32 0, i32 2027
%load = load volatile i16, i16 addrspace(5)* %gep		%load = load volatile i16, i16 addrspace(5)* %gep
%build1 = insertelement <2 x i16> %reg.bc, i16 %load, i32 0		%build1 = insertelement <2 x i16> %reg.bc, i16 %load, i32 0
store <2 x i16> %build1, <2 x i16> addrspace(1)* undef		store <2 x i16> %build1, <2 x i16> addrspace(1)* undef
ret void		ret void
}		}

define void @load_private_lo_v2i16_reglo_vreg_sexti8_to_offset(i32 %reg) #0 {		define void @load_private_lo_v2i16_reglo_vreg_sexti8_to_offset(i32 %reg) #0 {
; GFX900-LABEL: load_private_lo_v2i16_reglo_vreg_sexti8_to_offset:		; GFX900-MUBUF-LABEL: load_private_lo_v2i16_reglo_vreg_sexti8_to_offset:
; GFX900: ; %bb.0: ; %entry		; GFX900-MUBUF: ; %bb.0: ; %entry
; GFX900-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX900-MUBUF-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX900-NEXT: v_mov_b32_e32 v1, 0x7b		; GFX900-MUBUF-NEXT: v_mov_b32_e32 v1, 0x7b
; GFX900-NEXT: buffer_store_dword v1, off, s[0:3], s32		; GFX900-MUBUF-NEXT: buffer_store_dword v1, off, s[0:3], s32
; GFX900-NEXT: buffer_load_sbyte_d16 v0, off, s[0:3], s32 offset:4095		; GFX900-MUBUF-NEXT: buffer_load_sbyte_d16 v0, off, s[0:3], s32 offset:4095
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-MUBUF-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: global_store_dword v[0:1], v0, off		; GFX900-MUBUF-NEXT: global_store_dword v[0:1], v0, off
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-MUBUF-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: s_setpc_b64 s[30:31]		; GFX900-MUBUF-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX906-LABEL: load_private_lo_v2i16_reglo_vreg_sexti8_to_offset:		; GFX906-LABEL: load_private_lo_v2i16_reglo_vreg_sexti8_to_offset:
; GFX906: ; %bb.0: ; %entry		; GFX906: ; %bb.0: ; %entry
; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX906-NEXT: v_mov_b32_e32 v1, 0x7b		; GFX906-NEXT: v_mov_b32_e32 v1, 0x7b
; GFX906-NEXT: buffer_store_dword v1, off, s[0:3], s32		; GFX906-NEXT: buffer_store_dword v1, off, s[0:3], s32
; GFX906-NEXT: buffer_load_sbyte v1, off, s[0:3], s32 offset:4095		; GFX906-NEXT: buffer_load_sbyte v1, off, s[0:3], s32 offset:4095
; GFX906-NEXT: v_mov_b32_e32 v2, 0xffff		; GFX906-NEXT: v_mov_b32_e32 v2, 0xffff
; GFX803-NEXT: buffer_store_dword v1, off, s[0:3], s32		; GFX803-NEXT: buffer_store_dword v1, off, s[0:3], s32
; GFX803-NEXT: buffer_load_sbyte v1, off, s[0:3], s32 offset:4095		; GFX803-NEXT: buffer_load_sbyte v1, off, s[0:3], s32 offset:4095
; GFX803-NEXT: v_and_b32_e32 v0, 0xffff0000, v0		; GFX803-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
; GFX803-NEXT: s_waitcnt vmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0)
; GFX803-NEXT: v_or_b32_sdwa v0, v1, v0 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:DWORD		; GFX803-NEXT: v_or_b32_sdwa v0, v1, v0 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:DWORD
; GFX803-NEXT: flat_store_dword v[0:1], v0		; GFX803-NEXT: flat_store_dword v[0:1], v0
; GFX803-NEXT: s_waitcnt vmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0)
; GFX803-NEXT: s_setpc_b64 s[30:31]		; GFX803-NEXT: s_setpc_b64 s[30:31]
		;
		; GFX900-FLATSCR-LABEL: load_private_lo_v2i16_reglo_vreg_sexti8_to_offset:
		; GFX900-FLATSCR: ; %bb.0: ; %entry
		; GFX900-FLATSCR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
		; GFX900-FLATSCR-NEXT: v_mov_b32_e32 v1, 0x7b
		; GFX900-FLATSCR-NEXT: scratch_store_dword off, v1, s32
		; GFX900-FLATSCR-NEXT: scratch_load_sbyte_d16 v0, off, s32 offset:4095
		; GFX900-FLATSCR-NEXT: s_waitcnt vmcnt(0)
		; GFX900-FLATSCR-NEXT: global_store_dword v[0:1], v0, off
		; GFX900-FLATSCR-NEXT: s_waitcnt vmcnt(0)
		; GFX900-FLATSCR-NEXT: s_setpc_b64 s[30:31]
entry:		entry:
%obj0 = alloca [10 x i32], align 4, addrspace(5)		%obj0 = alloca [10 x i32], align 4, addrspace(5)
%obj1 = alloca [4096 x i8], align 2, addrspace(5)		%obj1 = alloca [4096 x i8], align 2, addrspace(5)
%reg.bc = bitcast i32 %reg to <2 x i16>		%reg.bc = bitcast i32 %reg to <2 x i16>
%bc = bitcast [10 x i32] addrspace(5)* %obj0 to i32 addrspace(5)*		%bc = bitcast [10 x i32] addrspace(5)* %obj0 to i32 addrspace(5)*
store volatile i32 123, i32 addrspace(5)* %bc		store volatile i32 123, i32 addrspace(5)* %bc
%gep = getelementptr inbounds [4096 x i8], [4096 x i8] addrspace(5)* %obj1, i32 0, i32 4055		%gep = getelementptr inbounds [4096 x i8], [4096 x i8] addrspace(5)* %obj1, i32 0, i32 4055
%load = load volatile i8, i8 addrspace(5)* %gep		%load = load volatile i8, i8 addrspace(5)* %gep
%load.ext = sext i8 %load to i16		%load.ext = sext i8 %load to i16
%build1 = insertelement <2 x i16> %reg.bc, i16 %load.ext, i32 0		%build1 = insertelement <2 x i16> %reg.bc, i16 %load.ext, i32 0
store <2 x i16> %build1, <2 x i16> addrspace(1)* undef		store <2 x i16> %build1, <2 x i16> addrspace(1)* undef
ret void		ret void
}		}

define void @load_private_lo_v2i16_reglo_vreg_zexti8_to_offset(i32 %reg) #0 {		define void @load_private_lo_v2i16_reglo_vreg_zexti8_to_offset(i32 %reg) #0 {
; GFX900-LABEL: load_private_lo_v2i16_reglo_vreg_zexti8_to_offset:		; GFX900-MUBUF-LABEL: load_private_lo_v2i16_reglo_vreg_zexti8_to_offset:
; GFX900: ; %bb.0: ; %entry		; GFX900-MUBUF: ; %bb.0: ; %entry
; GFX900-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX900-MUBUF-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX900-NEXT: v_mov_b32_e32 v1, 0x7b		; GFX900-MUBUF-NEXT: v_mov_b32_e32 v1, 0x7b
; GFX900-NEXT: buffer_store_dword v1, off, s[0:3], s32		; GFX900-MUBUF-NEXT: buffer_store_dword v1, off, s[0:3], s32
; GFX900-NEXT: buffer_load_ubyte_d16 v0, off, s[0:3], s32 offset:4095		; GFX900-MUBUF-NEXT: buffer_load_ubyte_d16 v0, off, s[0:3], s32 offset:4095
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-MUBUF-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: global_store_dword v[0:1], v0, off		; GFX900-MUBUF-NEXT: global_store_dword v[0:1], v0, off
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-MUBUF-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: s_setpc_b64 s[30:31]		; GFX900-MUBUF-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX906-LABEL: load_private_lo_v2i16_reglo_vreg_zexti8_to_offset:		; GFX906-LABEL: load_private_lo_v2i16_reglo_vreg_zexti8_to_offset:
; GFX906: ; %bb.0: ; %entry		; GFX906: ; %bb.0: ; %entry
; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX906-NEXT: v_mov_b32_e32 v1, 0x7b		; GFX906-NEXT: v_mov_b32_e32 v1, 0x7b
; GFX906-NEXT: buffer_store_dword v1, off, s[0:3], s32		; GFX906-NEXT: buffer_store_dword v1, off, s[0:3], s32
; GFX906-NEXT: buffer_load_ubyte v1, off, s[0:3], s32 offset:4095		; GFX906-NEXT: buffer_load_ubyte v1, off, s[0:3], s32 offset:4095
; GFX906-NEXT: v_mov_b32_e32 v2, 0xffff		; GFX906-NEXT: v_mov_b32_e32 v2, 0xffff
; GFX803-NEXT: buffer_load_ubyte v1, off, s[0:3], s32 offset:4095		; GFX803-NEXT: buffer_load_ubyte v1, off, s[0:3], s32 offset:4095
; GFX803-NEXT: v_lshrrev_b32_e32 v0, 16, v0		; GFX803-NEXT: v_lshrrev_b32_e32 v0, 16, v0
; GFX803-NEXT: s_mov_b32 s4, 0x5040c00		; GFX803-NEXT: s_mov_b32 s4, 0x5040c00
; GFX803-NEXT: s_waitcnt vmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0)
; GFX803-NEXT: v_perm_b32 v0, v0, v1, s4		; GFX803-NEXT: v_perm_b32 v0, v0, v1, s4
; GFX803-NEXT: flat_store_dword v[0:1], v0		; GFX803-NEXT: flat_store_dword v[0:1], v0
; GFX803-NEXT: s_waitcnt vmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0)
; GFX803-NEXT: s_setpc_b64 s[30:31]		; GFX803-NEXT: s_setpc_b64 s[30:31]
		;
		; GFX900-FLATSCR-LABEL: load_private_lo_v2i16_reglo_vreg_zexti8_to_offset:
		; GFX900-FLATSCR: ; %bb.0: ; %entry
		; GFX900-FLATSCR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
		; GFX900-FLATSCR-NEXT: v_mov_b32_e32 v1, 0x7b
		; GFX900-FLATSCR-NEXT: scratch_store_dword off, v1, s32
		; GFX900-FLATSCR-NEXT: scratch_load_ubyte_d16 v0, off, s32 offset:4095
		; GFX900-FLATSCR-NEXT: s_waitcnt vmcnt(0)
		; GFX900-FLATSCR-NEXT: global_store_dword v[0:1], v0, off
		; GFX900-FLATSCR-NEXT: s_waitcnt vmcnt(0)
		; GFX900-FLATSCR-NEXT: s_setpc_b64 s[30:31]
entry:		entry:
%obj0 = alloca [10 x i32], align 4, addrspace(5)		%obj0 = alloca [10 x i32], align 4, addrspace(5)
%obj1 = alloca [4096 x i8], align 2, addrspace(5)		%obj1 = alloca [4096 x i8], align 2, addrspace(5)
%reg.bc = bitcast i32 %reg to <2 x i16>		%reg.bc = bitcast i32 %reg to <2 x i16>
%bc = bitcast [10 x i32] addrspace(5)* %obj0 to i32 addrspace(5)*		%bc = bitcast [10 x i32] addrspace(5)* %obj0 to i32 addrspace(5)*
store volatile i32 123, i32 addrspace(5)* %bc		store volatile i32 123, i32 addrspace(5)* %bc
%gep = getelementptr inbounds [4096 x i8], [4096 x i8] addrspace(5)* %obj1, i32 0, i32 4055		%gep = getelementptr inbounds [4096 x i8], [4096 x i8] addrspace(5)* %obj1, i32 0, i32 4055
%load = load volatile i8, i8 addrspace(5)* %gep		%load = load volatile i8, i8 addrspace(5)* %gep
%load.ext = zext i8 %load to i16		%load.ext = zext i8 %load to i16
%build1 = insertelement <2 x i16> %reg.bc, i16 %load.ext, i32 0		%build1 = insertelement <2 x i16> %reg.bc, i16 %load.ext, i32 0
store <2 x i16> %build1, <2 x i16> addrspace(1)* undef		store <2 x i16> %build1, <2 x i16> addrspace(1)* undef
ret void		ret void
}		}

define void @load_private_lo_v2f16_reglo_vreg_sexti8_to_offset(i32 %reg) #0 {		define void @load_private_lo_v2f16_reglo_vreg_sexti8_to_offset(i32 %reg) #0 {
; GFX900-LABEL: load_private_lo_v2f16_reglo_vreg_sexti8_to_offset:		; GFX900-MUBUF-LABEL: load_private_lo_v2f16_reglo_vreg_sexti8_to_offset:
; GFX900: ; %bb.0: ; %entry		; GFX900-MUBUF: ; %bb.0: ; %entry
; GFX900-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX900-MUBUF-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX900-NEXT: v_mov_b32_e32 v1, 0x7b		; GFX900-MUBUF-NEXT: v_mov_b32_e32 v1, 0x7b
; GFX900-NEXT: buffer_store_dword v1, off, s[0:3], s32		; GFX900-MUBUF-NEXT: buffer_store_dword v1, off, s[0:3], s32
; GFX900-NEXT: buffer_load_sbyte_d16 v0, off, s[0:3], s32 offset:4095		; GFX900-MUBUF-NEXT: buffer_load_sbyte_d16 v0, off, s[0:3], s32 offset:4095
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-MUBUF-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: global_store_dword v[0:1], v0, off		; GFX900-MUBUF-NEXT: global_store_dword v[0:1], v0, off
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-MUBUF-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: s_setpc_b64 s[30:31]		; GFX900-MUBUF-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX906-LABEL: load_private_lo_v2f16_reglo_vreg_sexti8_to_offset:		; GFX906-LABEL: load_private_lo_v2f16_reglo_vreg_sexti8_to_offset:
; GFX906: ; %bb.0: ; %entry		; GFX906: ; %bb.0: ; %entry
; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX906-NEXT: v_mov_b32_e32 v1, 0x7b		; GFX906-NEXT: v_mov_b32_e32 v1, 0x7b
; GFX906-NEXT: buffer_store_dword v1, off, s[0:3], s32		; GFX906-NEXT: buffer_store_dword v1, off, s[0:3], s32
; GFX906-NEXT: buffer_load_sbyte v1, off, s[0:3], s32 offset:4095		; GFX906-NEXT: buffer_load_sbyte v1, off, s[0:3], s32 offset:4095
; GFX906-NEXT: v_lshrrev_b32_e32 v0, 16, v0		; GFX906-NEXT: v_lshrrev_b32_e32 v0, 16, v0
; GFX803-NEXT: buffer_store_dword v1, off, s[0:3], s32		; GFX803-NEXT: buffer_store_dword v1, off, s[0:3], s32
; GFX803-NEXT: buffer_load_sbyte v1, off, s[0:3], s32 offset:4095		; GFX803-NEXT: buffer_load_sbyte v1, off, s[0:3], s32 offset:4095
; GFX803-NEXT: v_and_b32_e32 v0, 0xffff0000, v0		; GFX803-NEXT: v_and_b32_e32 v0, 0xffff0000, v0
; GFX803-NEXT: s_waitcnt vmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0)
; GFX803-NEXT: v_or_b32_sdwa v0, v1, v0 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:DWORD		; GFX803-NEXT: v_or_b32_sdwa v0, v1, v0 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:DWORD
; GFX803-NEXT: flat_store_dword v[0:1], v0		; GFX803-NEXT: flat_store_dword v[0:1], v0
; GFX803-NEXT: s_waitcnt vmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0)
; GFX803-NEXT: s_setpc_b64 s[30:31]		; GFX803-NEXT: s_setpc_b64 s[30:31]
		;
		; GFX900-FLATSCR-LABEL: load_private_lo_v2f16_reglo_vreg_sexti8_to_offset:
		; GFX900-FLATSCR: ; %bb.0: ; %entry
		; GFX900-FLATSCR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
		; GFX900-FLATSCR-NEXT: v_mov_b32_e32 v1, 0x7b
		; GFX900-FLATSCR-NEXT: scratch_store_dword off, v1, s32
		; GFX900-FLATSCR-NEXT: scratch_load_sbyte_d16 v0, off, s32 offset:4095
		; GFX900-FLATSCR-NEXT: s_waitcnt vmcnt(0)
		; GFX900-FLATSCR-NEXT: global_store_dword v[0:1], v0, off
		; GFX900-FLATSCR-NEXT: s_waitcnt vmcnt(0)
		; GFX900-FLATSCR-NEXT: s_setpc_b64 s[30:31]
entry:		entry:
%obj0 = alloca [10 x i32], align 4, addrspace(5)		%obj0 = alloca [10 x i32], align 4, addrspace(5)
%obj1 = alloca [4096 x i8], align 2, addrspace(5)		%obj1 = alloca [4096 x i8], align 2, addrspace(5)
%reg.bc = bitcast i32 %reg to <2 x half>		%reg.bc = bitcast i32 %reg to <2 x half>
%bc = bitcast [10 x i32] addrspace(5)* %obj0 to i32 addrspace(5)*		%bc = bitcast [10 x i32] addrspace(5)* %obj0 to i32 addrspace(5)*
store volatile i32 123, i32 addrspace(5)* %bc		store volatile i32 123, i32 addrspace(5)* %bc
%gep = getelementptr inbounds [4096 x i8], [4096 x i8] addrspace(5)* %obj1, i32 0, i32 4055		%gep = getelementptr inbounds [4096 x i8], [4096 x i8] addrspace(5)* %obj1, i32 0, i32 4055
%load = load volatile i8, i8 addrspace(5)* %gep		%load = load volatile i8, i8 addrspace(5)* %gep
%load.ext = sext i8 %load to i16		%load.ext = sext i8 %load to i16
%bitcast = bitcast i16 %load.ext to half		%bitcast = bitcast i16 %load.ext to half
%build1 = insertelement <2 x half> %reg.bc, half %bitcast, i32 0		%build1 = insertelement <2 x half> %reg.bc, half %bitcast, i32 0
store <2 x half> %build1, <2 x half> addrspace(1)* undef		store <2 x half> %build1, <2 x half> addrspace(1)* undef
ret void		ret void
}		}

define void @load_private_lo_v2f16_reglo_vreg_zexti8_to_offset(i32 %reg) #0 {		define void @load_private_lo_v2f16_reglo_vreg_zexti8_to_offset(i32 %reg) #0 {
; GFX900-LABEL: load_private_lo_v2f16_reglo_vreg_zexti8_to_offset:		; GFX900-MUBUF-LABEL: load_private_lo_v2f16_reglo_vreg_zexti8_to_offset:
; GFX900: ; %bb.0: ; %entry		; GFX900-MUBUF: ; %bb.0: ; %entry
; GFX900-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX900-MUBUF-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX900-NEXT: v_mov_b32_e32 v1, 0x7b		; GFX900-MUBUF-NEXT: v_mov_b32_e32 v1, 0x7b
; GFX900-NEXT: buffer_store_dword v1, off, s[0:3], s32		; GFX900-MUBUF-NEXT: buffer_store_dword v1, off, s[0:3], s32
; GFX900-NEXT: buffer_load_ubyte_d16 v0, off, s[0:3], s32 offset:4095		; GFX900-MUBUF-NEXT: buffer_load_ubyte_d16 v0, off, s[0:3], s32 offset:4095
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-MUBUF-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: global_store_dword v[0:1], v0, off		; GFX900-MUBUF-NEXT: global_store_dword v[0:1], v0, off
; GFX900-NEXT: s_waitcnt vmcnt(0)		; GFX900-MUBUF-NEXT: s_waitcnt vmcnt(0)
; GFX900-NEXT: s_setpc_b64 s[30:31]		; GFX900-MUBUF-NEXT: s_setpc_b64 s[30:31]
;		;
; GFX906-LABEL: load_private_lo_v2f16_reglo_vreg_zexti8_to_offset:		; GFX906-LABEL: load_private_lo_v2f16_reglo_vreg_zexti8_to_offset:
; GFX906: ; %bb.0: ; %entry		; GFX906: ; %bb.0: ; %entry
; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; GFX906-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX906-NEXT: v_mov_b32_e32 v1, 0x7b		; GFX906-NEXT: v_mov_b32_e32 v1, 0x7b
; GFX906-NEXT: buffer_store_dword v1, off, s[0:3], s32		; GFX906-NEXT: buffer_store_dword v1, off, s[0:3], s32
; GFX906-NEXT: buffer_load_ubyte v1, off, s[0:3], s32 offset:4095		; GFX906-NEXT: buffer_load_ubyte v1, off, s[0:3], s32 offset:4095
; GFX906-NEXT: v_lshrrev_b32_e32 v0, 16, v0		; GFX906-NEXT: v_lshrrev_b32_e32 v0, 16, v0
; GFX803-NEXT: buffer_load_ubyte v1, off, s[0:3], s32 offset:4095		; GFX803-NEXT: buffer_load_ubyte v1, off, s[0:3], s32 offset:4095
; GFX803-NEXT: v_lshrrev_b32_e32 v0, 16, v0		; GFX803-NEXT: v_lshrrev_b32_e32 v0, 16, v0
; GFX803-NEXT: s_mov_b32 s4, 0x5040c00		; GFX803-NEXT: s_mov_b32 s4, 0x5040c00
; GFX803-NEXT: s_waitcnt vmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0)
; GFX803-NEXT: v_perm_b32 v0, v0, v1, s4		; GFX803-NEXT: v_perm_b32 v0, v0, v1, s4
; GFX803-NEXT: flat_store_dword v[0:1], v0		; GFX803-NEXT: flat_store_dword v[0:1], v0
; GFX803-NEXT: s_waitcnt vmcnt(0)		; GFX803-NEXT: s_waitcnt vmcnt(0)
; GFX803-NEXT: s_setpc_b64 s[30:31]		; GFX803-NEXT: s_setpc_b64 s[30:31]
		;
		; GFX900-FLATSCR-LABEL: load_private_lo_v2f16_reglo_vreg_zexti8_to_offset:
		; GFX900-FLATSCR: ; %bb.0: ; %entry
		; GFX900-FLATSCR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
		; GFX900-FLATSCR-NEXT: v_mov_b32_e32 v1, 0x7b
		; GFX900-FLATSCR-NEXT: scratch_store_dword off, v1, s32
		; GFX900-FLATSCR-NEXT: scratch_load_ubyte_d16 v0, off, s32 offset:4095
		; GFX900-FLATSCR-NEXT: s_waitcnt vmcnt(0)
		; GFX900-FLATSCR-NEXT: global_store_dword v[0:1], v0, off
		; GFX900-FLATSCR-NEXT: s_waitcnt vmcnt(0)
		; GFX900-FLATSCR-NEXT: s_setpc_b64 s[30:31]
entry:		entry:
%obj0 = alloca [10 x i32], align 4, addrspace(5)		%obj0 = alloca [10 x i32], align 4, addrspace(5)
%obj1 = alloca [4096 x i8], align 2, addrspace(5)		%obj1 = alloca [4096 x i8], align 2, addrspace(5)
%reg.bc = bitcast i32 %reg to <2 x half>		%reg.bc = bitcast i32 %reg to <2 x half>
%bc = bitcast [10 x i32] addrspace(5)* %obj0 to i32 addrspace(5)*		%bc = bitcast [10 x i32] addrspace(5)* %obj0 to i32 addrspace(5)*
store volatile i32 123, i32 addrspace(5)* %bc		store volatile i32 123, i32 addrspace(5)* %bc
%gep = getelementptr inbounds [4096 x i8], [4096 x i8] addrspace(5)* %obj1, i32 0, i32 4055		%gep = getelementptr inbounds [4096 x i8], [4096 x i8] addrspace(5)* %obj1, i32 0, i32 4055
%load = load volatile i8, i8 addrspace(5)* %gep		%load = load volatile i8, i8 addrspace(5)* %gep
%load.ext = zext i8 %load to i16		%load.ext = zext i8 %load to i16
%bitcast = bitcast i16 %load.ext to half		%bitcast = bitcast i16 %load.ext to half
%build1 = insertelement <2 x half> %reg.bc, half %bitcast, i32 0		%build1 = insertelement <2 x half> %reg.bc, half %bitcast, i32 0
store <2 x half> %build1, <2 x half> addrspace(1)* undef		store <2 x half> %build1, <2 x half> addrspace(1)* undef
ret void		ret void
}		}

attributes #0 = { nounwind }		attributes #0 = { nounwind }

llvm/test/CodeGen/AMDGPU/local-stack-alloc-block-sp-reference.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx906 < %s | FileCheck -check-prefixes=GCN,GFX9 %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx906 < %s | FileCheck -check-prefixes=GCN,GFX9,MUBUF %s
				; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx906 --amdgpu-enable-flat-scratch < %s | FileCheck -check-prefixes=GCN,GFX9,FLATSCR %s

	; Make sure the correct frame offset is used with the local			; Make sure the correct frame offset is used with the local
	; frame area.			; frame area.
	;			;
	; %pin.low is allocated to offset 0.			; %pin.low is allocated to offset 0.
	;			;
	; %local.area is assigned to the local frame offset by the			; %local.area is assigned to the local frame offset by the
	; LocalStackSlotAllocation pass at offset 4096.			; LocalStackSlotAllocation pass at offset 4096.
	;			;
	; The %load1 access to %gep.large.offset initially used the stack			; The %load1 access to %gep.large.offset initially used the stack
	; pointer register and directly referenced the frame index. After			; pointer register and directly referenced the frame index. After
	; LocalStackSlotAllocation, it would no longer refer to a frame index			; LocalStackSlotAllocation, it would no longer refer to a frame index
	; so eliminateFrameIndex would not adjust the access to use the			; so eliminateFrameIndex would not adjust the access to use the
	; correct FP offset.			; correct FP offset.

	define amdgpu_kernel void @local_stack_offset_uses_sp(i64 addrspace(1)* %out, i8 addrspace(1)* %in) {			define amdgpu_kernel void @local_stack_offset_uses_sp(i64 addrspace(1)* %out, i8 addrspace(1)* %in) {
	; GCN-LABEL: local_stack_offset_uses_sp:			; MUBUF-LABEL: local_stack_offset_uses_sp:
	; GCN: ; %bb.0: ; %entry			; MUBUF: ; %bb.0: ; %entry
	; GCN-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0			; MUBUF-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
	; GCN-NEXT: s_add_u32 flat_scratch_lo, s6, s9			; MUBUF-NEXT: s_add_u32 flat_scratch_lo, s6, s9
	; GCN-NEXT: s_addc_u32 flat_scratch_hi, s7, 0			; MUBUF-NEXT: s_addc_u32 flat_scratch_hi, s7, 0
	; GCN-NEXT: s_add_u32 s0, s0, s9			; MUBUF-NEXT: s_add_u32 s0, s0, s9
	; GCN-NEXT: v_mov_b32_e32 v1, 0x3000			; MUBUF-NEXT: v_mov_b32_e32 v1, 0x3000
	; GCN-NEXT: s_addc_u32 s1, s1, 0			; MUBUF-NEXT: s_addc_u32 s1, s1, 0
	; GCN-NEXT: v_add_u32_e32 v0, 64, v1			; MUBUF-NEXT: v_add_u32_e32 v0, 64, v1
	; GCN-NEXT: v_mov_b32_e32 v2, 0			; MUBUF-NEXT: v_mov_b32_e32 v2, 0
	; GCN-NEXT: v_mov_b32_e32 v3, 0x2000			; MUBUF-NEXT: v_mov_b32_e32 v3, 0x2000
	; GCN-NEXT: s_mov_b32 s6, 0			; MUBUF-NEXT: s_mov_b32 s6, 0
	; GCN-NEXT: buffer_store_dword v2, v3, s[0:3], 0 offen			; MUBUF-NEXT: buffer_store_dword v2, v3, s[0:3], 0 offen
	; GCN-NEXT: BB0_1: ; %loadstoreloop			; MUBUF-NEXT: BB0_1: ; %loadstoreloop
	; GCN-NEXT: ; =>This Inner Loop Header: Depth=1			; MUBUF-NEXT: ; =>This Inner Loop Header: Depth=1
	; GCN-NEXT: v_add_u32_e32 v3, s6, v1			; MUBUF-NEXT: v_add_u32_e32 v3, s6, v1
	; GCN-NEXT: s_add_i32 s6, s6, 1			; MUBUF-NEXT: s_add_i32 s6, s6, 1
	; GCN-NEXT: s_cmpk_lt_u32 s6, 0x2120			; MUBUF-NEXT: s_cmpk_lt_u32 s6, 0x2120
	; GCN-NEXT: buffer_store_byte v2, v3, s[0:3], 0 offen			; MUBUF-NEXT: buffer_store_byte v2, v3, s[0:3], 0 offen
	; GCN-NEXT: s_cbranch_scc1 BB0_1			; MUBUF-NEXT: s_cbranch_scc1 BB0_1
	; GCN-NEXT: ; %bb.2: ; %split			; MUBUF-NEXT: ; %bb.2: ; %split
	; GCN-NEXT: v_mov_b32_e32 v1, 0x3000			; MUBUF-NEXT: v_mov_b32_e32 v1, 0x3000
	; GCN-NEXT: v_add_u32_e32 v1, 0x20d0, v1			; MUBUF-NEXT: v_add_u32_e32 v1, 0x20d0, v1
	; GCN-NEXT: buffer_load_dword v2, v1, s[0:3], 0 offen			; MUBUF-NEXT: buffer_load_dword v2, v1, s[0:3], 0 offen
	; GCN-NEXT: buffer_load_dword v1, v1, s[0:3], 0 offen offset:4			; MUBUF-NEXT: buffer_load_dword v1, v1, s[0:3], 0 offen offset:4
	; GCN-NEXT: buffer_load_dword v3, v0, s[0:3], 0 offen			; MUBUF-NEXT: buffer_load_dword v3, v0, s[0:3], 0 offen
	; GCN-NEXT: buffer_load_dword v4, v0, s[0:3], 0 offen offset:4			; MUBUF-NEXT: buffer_load_dword v4, v0, s[0:3], 0 offen offset:4
	; GCN-NEXT: s_waitcnt vmcnt(1)			; MUBUF-NEXT: s_waitcnt vmcnt(1)
	; GCN-NEXT: v_add_co_u32_e32 v0, vcc, v2, v3			; MUBUF-NEXT: v_add_co_u32_e32 v0, vcc, v2, v3
	; GCN-NEXT: s_waitcnt lgkmcnt(0)			; MUBUF-NEXT: s_waitcnt lgkmcnt(0)
	; GCN-NEXT: v_mov_b32_e32 v2, s4			; MUBUF-NEXT: v_mov_b32_e32 v2, s4
	; GCN-NEXT: s_waitcnt vmcnt(0)			; MUBUF-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: v_addc_co_u32_e32 v1, vcc, v1, v4, vcc			; MUBUF-NEXT: v_addc_co_u32_e32 v1, vcc, v1, v4, vcc
	; GCN-NEXT: v_mov_b32_e32 v3, s5			; MUBUF-NEXT: v_mov_b32_e32 v3, s5
	; GCN-NEXT: global_store_dwordx2 v[2:3], v[0:1], off			; MUBUF-NEXT: global_store_dwordx2 v[2:3], v[0:1], off
	; GCN-NEXT: s_endpgm			; MUBUF-NEXT: s_endpgm
				;
				; FLATSCR-LABEL: local_stack_offset_uses_sp:
				; FLATSCR: ; %bb.0: ; %entry
				; FLATSCR-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
				; FLATSCR-NEXT: s_add_u32 flat_scratch_lo, s6, s9
				; FLATSCR-NEXT: s_addc_u32 flat_scratch_hi, s7, 0
				; FLATSCR-NEXT: v_mov_b32_e32 v0, 0
				; FLATSCR-NEXT: s_movk_i32 vcc_hi, 0x2000
				; FLATSCR-NEXT: s_mov_b32 s6, 0
				; FLATSCR-NEXT: scratch_store_dword off, v0, vcc_hi
				; FLATSCR-NEXT: BB0_1: ; %loadstoreloop
				; FLATSCR-NEXT: ; =>This Inner Loop Header: Depth=1
				; FLATSCR-NEXT: s_add_u32 s7, 0x3000, s6
				; FLATSCR-NEXT: s_add_i32 s6, s6, 1
				; FLATSCR-NEXT: s_cmpk_lt_u32 s6, 0x2120
				; FLATSCR-NEXT: scratch_store_byte off, v0, s7
				; FLATSCR-NEXT: s_cbranch_scc1 BB0_1
				; FLATSCR-NEXT: ; %bb.2: ; %split
				; FLATSCR-NEXT: s_movk_i32 s6, 0x20d0
				; FLATSCR-NEXT: s_add_u32 s6, 0x3000, s6
				; FLATSCR-NEXT: scratch_load_dword v1, off, s6 offset:4
				; FLATSCR-NEXT: s_movk_i32 s6, 0x2000
				; FLATSCR-NEXT: s_add_u32 s6, 0x3000, s6
				; FLATSCR-NEXT: scratch_load_dword v0, off, s6 offset:208
				; FLATSCR-NEXT: s_movk_i32 s6, 0x3000
				; FLATSCR-NEXT: scratch_load_dword v2, off, s6 offset:68
				; FLATSCR-NEXT: s_movk_i32 s6, 0x3000
				; FLATSCR-NEXT: scratch_load_dword v3, off, s6 offset:64
				; FLATSCR-NEXT: s_waitcnt vmcnt(0)
				; FLATSCR-NEXT: v_add_co_u32_e32 v0, vcc, v0, v3
				; FLATSCR-NEXT: v_addc_co_u32_e32 v1, vcc, v1, v2, vcc
				; FLATSCR-NEXT: s_waitcnt lgkmcnt(0)
				; FLATSCR-NEXT: v_mov_b32_e32 v2, s4
				; FLATSCR-NEXT: v_mov_b32_e32 v3, s5
				; FLATSCR-NEXT: global_store_dwordx2 v[2:3], v[0:1], off
				; FLATSCR-NEXT: s_endpgm
	entry:			entry:
	%pin.low = alloca i32, align 8192, addrspace(5)			%pin.low = alloca i32, align 8192, addrspace(5)
	%local.area = alloca [1060 x i64], align 4096, addrspace(5)			%local.area = alloca [1060 x i64], align 4096, addrspace(5)
	store volatile i32 0, i32 addrspace(5)* %pin.low			store volatile i32 0, i32 addrspace(5)* %pin.low
	%local.area.cast = bitcast [1060 x i64] addrspace(5)* %local.area to i8 addrspace(5)*			%local.area.cast = bitcast [1060 x i64] addrspace(5)* %local.area to i8 addrspace(5)*
	call void @llvm.memset.p5i8.i32(i8 addrspace(5)* align 4 %local.area.cast, i8 0, i32 8480, i1 true)			call void @llvm.memset.p5i8.i32(i8 addrspace(5)* align 4 %local.area.cast, i8 0, i32 8480, i1 true)
	%gep.large.offset = getelementptr inbounds [1060 x i64], [1060 x i64] addrspace(5)* %local.area, i64 0, i64 1050			%gep.large.offset = getelementptr inbounds [1060 x i64], [1060 x i64] addrspace(5)* %local.area, i64 0, i64 1050
	%gep.small.offset = getelementptr inbounds [1060 x i64], [1060 x i64] addrspace(5)* %local.area, i64 0, i64 8			%gep.small.offset = getelementptr inbounds [1060 x i64], [1060 x i64] addrspace(5)* %local.area, i64 0, i64 8
	%load0 = load volatile i64, i64 addrspace(5)* %gep.large.offset			%load0 = load volatile i64, i64 addrspace(5)* %gep.large.offset
	%load1 = load volatile i64, i64 addrspace(5)* %gep.small.offset			%load1 = load volatile i64, i64 addrspace(5)* %gep.small.offset
	%add0 = add i64 %load0, %load1			%add0 = add i64 %load0, %load1
	store volatile i64 %add0, i64 addrspace(1)* %out			store volatile i64 %add0, i64 addrspace(1)* %out
	ret void			ret void
	}			}

	define void @func_local_stack_offset_uses_sp(i64 addrspace(1)* %out, i8 addrspace(1)* %in) {			define void @func_local_stack_offset_uses_sp(i64 addrspace(1)* %out, i8 addrspace(1)* %in) {
	; GCN-LABEL: func_local_stack_offset_uses_sp:			; MUBUF-LABEL: func_local_stack_offset_uses_sp:
	; GCN: ; %bb.0: ; %entry			; MUBUF: ; %bb.0: ; %entry
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; MUBUF-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: s_add_u32 s4, s32, 0x7ffc0			; MUBUF-NEXT: s_add_u32 s4, s32, 0x7ffc0
	; GCN-NEXT: s_mov_b32 s5, s33			; MUBUF-NEXT: s_mov_b32 s5, s33
	; GCN-NEXT: s_and_b32 s33, s4, 0xfff80000			; MUBUF-NEXT: s_and_b32 s33, s4, 0xfff80000
	; GCN-NEXT: v_lshrrev_b32_e64 v3, 6, s33			; MUBUF-NEXT: v_lshrrev_b32_e64 v3, 6, s33
	; GCN-NEXT: v_add_u32_e32 v3, 0x1000, v3			; MUBUF-NEXT: v_add_u32_e32 v3, 0x1000, v3
	; GCN-NEXT: v_mov_b32_e32 v4, 0			; MUBUF-NEXT: v_mov_b32_e32 v4, 0
	; GCN-NEXT: v_add_u32_e32 v2, 64, v3			; MUBUF-NEXT: v_add_u32_e32 v2, 64, v3
	; GCN-NEXT: s_mov_b32 s4, 0			; MUBUF-NEXT: s_mov_b32 s4, 0
	; GCN-NEXT: s_add_u32 s32, s32, 0x180000			; MUBUF-NEXT: s_add_u32 s32, s32, 0x180000
	; GCN-NEXT: buffer_store_dword v4, off, s[0:3], s33			; MUBUF-NEXT: buffer_store_dword v4, off, s[0:3], s33
	; GCN-NEXT: BB1_1: ; %loadstoreloop			; MUBUF-NEXT: BB1_1: ; %loadstoreloop
	; GCN-NEXT: ; =>This Inner Loop Header: Depth=1			; MUBUF-NEXT: ; =>This Inner Loop Header: Depth=1
	; GCN-NEXT: v_add_u32_e32 v5, s4, v3			; MUBUF-NEXT: v_add_u32_e32 v5, s4, v3
	; GCN-NEXT: s_add_i32 s4, s4, 1			; MUBUF-NEXT: s_add_i32 s4, s4, 1
	; GCN-NEXT: s_cmpk_lt_u32 s4, 0x2120			; MUBUF-NEXT: s_cmpk_lt_u32 s4, 0x2120
	; GCN-NEXT: buffer_store_byte v4, v5, s[0:3], 0 offen			; MUBUF-NEXT: buffer_store_byte v4, v5, s[0:3], 0 offen
	; GCN-NEXT: s_cbranch_scc1 BB1_1			; MUBUF-NEXT: s_cbranch_scc1 BB1_1
	; GCN-NEXT: ; %bb.2: ; %split			; MUBUF-NEXT: ; %bb.2: ; %split
	; GCN-NEXT: v_lshrrev_b32_e64 v3, 6, s33			; MUBUF-NEXT: v_lshrrev_b32_e64 v3, 6, s33
	; GCN-NEXT: v_add_u32_e32 v3, 0x1000, v3			; MUBUF-NEXT: v_add_u32_e32 v3, 0x1000, v3
	; GCN-NEXT: v_add_u32_e32 v3, 0x20d0, v3			; MUBUF-NEXT: v_add_u32_e32 v3, 0x20d0, v3
	; GCN-NEXT: buffer_load_dword v4, v3, s[0:3], 0 offen			; MUBUF-NEXT: buffer_load_dword v4, v3, s[0:3], 0 offen
	; GCN-NEXT: buffer_load_dword v3, v3, s[0:3], 0 offen offset:4			; MUBUF-NEXT: buffer_load_dword v3, v3, s[0:3], 0 offen offset:4
	; GCN-NEXT: buffer_load_dword v5, v2, s[0:3], 0 offen			; MUBUF-NEXT: buffer_load_dword v5, v2, s[0:3], 0 offen
	; GCN-NEXT: buffer_load_dword v6, v2, s[0:3], 0 offen offset:4			; MUBUF-NEXT: buffer_load_dword v6, v2, s[0:3], 0 offen offset:4
	; GCN-NEXT: s_sub_u32 s32, s32, 0x180000			; MUBUF-NEXT: s_sub_u32 s32, s32, 0x180000
	; GCN-NEXT: s_mov_b32 s33, s5			; MUBUF-NEXT: s_mov_b32 s33, s5
	; GCN-NEXT: s_waitcnt vmcnt(1)			; MUBUF-NEXT: s_waitcnt vmcnt(1)
	; GCN-NEXT: v_add_co_u32_e32 v2, vcc, v4, v5			; MUBUF-NEXT: v_add_co_u32_e32 v2, vcc, v4, v5
	; GCN-NEXT: s_waitcnt vmcnt(0)			; MUBUF-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: v_addc_co_u32_e32 v3, vcc, v3, v6, vcc			; MUBUF-NEXT: v_addc_co_u32_e32 v3, vcc, v3, v6, vcc
	; GCN-NEXT: global_store_dwordx2 v[0:1], v[2:3], off			; MUBUF-NEXT: global_store_dwordx2 v[0:1], v[2:3], off
	; GCN-NEXT: s_waitcnt vmcnt(0)			; MUBUF-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: s_setpc_b64 s[30:31]			; MUBUF-NEXT: s_setpc_b64 s[30:31]
				;
				; FLATSCR-LABEL: func_local_stack_offset_uses_sp:
				; FLATSCR: ; %bb.0: ; %entry
				; FLATSCR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; FLATSCR-NEXT: s_add_u32 s4, s32, 0x1fff
				; FLATSCR-NEXT: s_mov_b32 s6, s33
				; FLATSCR-NEXT: s_and_b32 s33, s4, 0xffffe000
				; FLATSCR-NEXT: v_mov_b32_e32 v2, 0
				; FLATSCR-NEXT: s_mov_b32 s4, 0
				; FLATSCR-NEXT: s_add_u32 s32, s32, 0x6000
				; FLATSCR-NEXT: scratch_store_dword off, v2, s33
				; FLATSCR-NEXT: BB1_1: ; %loadstoreloop
				; FLATSCR-NEXT: ; =>This Inner Loop Header: Depth=1
				; FLATSCR-NEXT: s_add_u32 vcc_hi, s33, 0x1000
				; FLATSCR-NEXT: s_add_u32 s5, vcc_hi, s4
				; FLATSCR-NEXT: s_add_i32 s4, s4, 1
				; FLATSCR-NEXT: s_cmpk_lt_u32 s4, 0x2120
				; FLATSCR-NEXT: scratch_store_byte off, v2, s5
				; FLATSCR-NEXT: s_cbranch_scc1 BB1_1
				; FLATSCR-NEXT: ; %bb.2: ; %split
				; FLATSCR-NEXT: s_movk_i32 s4, 0x20d0
				; FLATSCR-NEXT: s_add_u32 s5, s33, 0x1000
				; FLATSCR-NEXT: s_add_u32 s4, s5, s4
				; FLATSCR-NEXT: scratch_load_dword v3, off, s4 offset:4
				; FLATSCR-NEXT: s_movk_i32 s4, 0x2000
				; FLATSCR-NEXT: s_add_u32 s5, s33, 0x1000
				; FLATSCR-NEXT: s_add_u32 s4, s5, s4
				; FLATSCR-NEXT: scratch_load_dword v2, off, s4 offset:208
				; FLATSCR-NEXT: s_add_u32 s4, s33, 0x1000
				; FLATSCR-NEXT: scratch_load_dword v4, off, s4 offset:68
				; FLATSCR-NEXT: s_add_u32 s4, s33, 0x1000
				; FLATSCR-NEXT: scratch_load_dword v5, off, s4 offset:64
				; FLATSCR-NEXT: s_sub_u32 s32, s32, 0x6000
				; FLATSCR-NEXT: s_mov_b32 s33, s6
				; FLATSCR-NEXT: s_waitcnt vmcnt(0)
				; FLATSCR-NEXT: v_add_co_u32_e32 v2, vcc, v2, v5
				; FLATSCR-NEXT: v_addc_co_u32_e32 v3, vcc, v3, v4, vcc
				; FLATSCR-NEXT: global_store_dwordx2 v[0:1], v[2:3], off
				; FLATSCR-NEXT: s_waitcnt vmcnt(0)
				; FLATSCR-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	%pin.low = alloca i32, align 8192, addrspace(5)			%pin.low = alloca i32, align 8192, addrspace(5)
	%local.area = alloca [1060 x i64], align 4096, addrspace(5)			%local.area = alloca [1060 x i64], align 4096, addrspace(5)
	store volatile i32 0, i32 addrspace(5)* %pin.low			store volatile i32 0, i32 addrspace(5)* %pin.low
	%local.area.cast = bitcast [1060 x i64] addrspace(5)* %local.area to i8 addrspace(5)*			%local.area.cast = bitcast [1060 x i64] addrspace(5)* %local.area to i8 addrspace(5)*
	call void @llvm.memset.p5i8.i32(i8 addrspace(5)* align 4 %local.area.cast, i8 0, i32 8480, i1 true)			call void @llvm.memset.p5i8.i32(i8 addrspace(5)* align 4 %local.area.cast, i8 0, i32 8480, i1 true)
	%gep.large.offset = getelementptr inbounds [1060 x i64], [1060 x i64] addrspace(5)* %local.area, i64 0, i64 1050			%gep.large.offset = getelementptr inbounds [1060 x i64], [1060 x i64] addrspace(5)* %local.area, i64 0, i64 1050
	%gep.small.offset = getelementptr inbounds [1060 x i64], [1060 x i64] addrspace(5)* %local.area, i64 0, i64 8			%gep.small.offset = getelementptr inbounds [1060 x i64], [1060 x i64] addrspace(5)* %local.area, i64 0, i64 8
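
Note on the constants in func_local_stack_offset_uses_sp: the MUBUF and FLATSCR prologues describe the same frame with different immediates because, per the summary, SP changes from unswizzled to swizzled. Below is a minimal Python sketch of that relationship. It assumes a wavefront size of 64 on gfx900, which is consistent with the shift by 6 in v_lshrrev_b32_e64 but is not stated in the diff. The helper name to_unswizzled is hypothetical.

```python
# Sketch only: relates the FLATSCR (swizzled, per-lane bytes) immediates to the
# MUBUF (unswizzled, per-wave bytes) immediates seen in the checks above.
WAVE_SIZE = 64  # assumption: wave64, matching the "lshr by 6" in the MUBUF checks


def to_unswizzled(swizzled_bytes: int) -> int:
    """Scale a per-lane byte count to a per-wave byte count (hypothetical helper)."""
    return swizzled_bytes * WAVE_SIZE


def align_mask(alignment: int) -> int:
    """32-bit mask that rounds an address down to a power-of-two alignment."""
    return ~(alignment - 1) & 0xFFFFFFFF


MAX_ALIGN = 8192      # from "%pin.low = alloca i32, align 8192"
FRAME_BYTES = 0x6000  # from "s_add_u32 s32, s32, 0x6000" in the FLATSCR checks

# Round-up addend, alignment mask, and frame size for each addressing mode.
flatscr = (MAX_ALIGN - 1, align_mask(MAX_ALIGN), FRAME_BYTES)
mubuf = (
    to_unswizzled(MAX_ALIGN - 1),
    align_mask(to_unswizzled(MAX_ALIGN)),
    to_unswizzled(FRAME_BYTES),
)

print([hex(v) for v in flatscr])
print([hex(v) for v in mubuf])
```

Under that assumption the two tuples reproduce the immediates in the FLATSCR and MUBUF check lines respectively.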

llvm/test/CodeGen/AMDGPU/memcpy-fixed-align.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s | FileCheck %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s | FileCheck %s -check-prefix=MUBUF
				; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-enable-flat-scratch < %s | FileCheck %s -check-prefix=FLATSCR

	; Make sure there's no assertion from passing a 0 alignment value			; Make sure there's no assertion from passing a 0 alignment value
	define void @memcpy_fixed_align(i8 addrspace(5)* %dst, i8 addrspace(1)* %src) {			define void @memcpy_fixed_align(i8 addrspace(5)* %dst, i8 addrspace(1)* %src) {
	; CHECK-LABEL: memcpy_fixed_align:			; MUBUF-LABEL: memcpy_fixed_align:
	; CHECK: ; %bb.0:			; MUBUF: ; %bb.0:
	; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; MUBUF-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; CHECK-NEXT: global_load_dword v0, v[1:2], off offset:36			; MUBUF-NEXT: global_load_dword v0, v[1:2], off offset:36
	; CHECK-NEXT: s_waitcnt vmcnt(0)			; MUBUF-NEXT: s_waitcnt vmcnt(0)
	; CHECK-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:36			; MUBUF-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:36
	; CHECK-NEXT: global_load_dword v0, v[1:2], off offset:32			; MUBUF-NEXT: global_load_dword v0, v[1:2], off offset:32
	; CHECK-NEXT: s_waitcnt vmcnt(0)			; MUBUF-NEXT: s_waitcnt vmcnt(0)
	; CHECK-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:32			; MUBUF-NEXT: buffer_store_dword v0, off, s[0:3], s32 offset:32
	; CHECK-NEXT: global_load_dwordx4 v[3:6], v[1:2], off offset:16			; MUBUF-NEXT: global_load_dwordx4 v[3:6], v[1:2], off offset:16
	; CHECK-NEXT: s_waitcnt vmcnt(0)			; MUBUF-NEXT: s_waitcnt vmcnt(0)
	; CHECK-NEXT: buffer_store_dword v6, off, s[0:3], s32 offset:28			; MUBUF-NEXT: buffer_store_dword v6, off, s[0:3], s32 offset:28
	; CHECK-NEXT: buffer_store_dword v5, off, s[0:3], s32 offset:24			; MUBUF-NEXT: buffer_store_dword v5, off, s[0:3], s32 offset:24
	; CHECK-NEXT: buffer_store_dword v4, off, s[0:3], s32 offset:20			; MUBUF-NEXT: buffer_store_dword v4, off, s[0:3], s32 offset:20
	; CHECK-NEXT: buffer_store_dword v3, off, s[0:3], s32 offset:16			; MUBUF-NEXT: buffer_store_dword v3, off, s[0:3], s32 offset:16
	; CHECK-NEXT: global_load_dwordx4 v[0:3], v[1:2], off			; MUBUF-NEXT: global_load_dwordx4 v[0:3], v[1:2], off
	; CHECK-NEXT: s_waitcnt vmcnt(0)			; MUBUF-NEXT: s_waitcnt vmcnt(0)
	; CHECK-NEXT: buffer_store_dword v3, off, s[0:3], s32 offset:12			; MUBUF-NEXT: buffer_store_dword v3, off, s[0:3], s32 offset:12
	; CHECK-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:8			; MUBUF-NEXT: buffer_store_dword v2, off, s[0:3], s32 offset:8
	; CHECK-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:4			; MUBUF-NEXT: buffer_store_dword v1, off, s[0:3], s32 offset:4
	; CHECK-NEXT: buffer_store_dword v0, off, s[0:3], s32			; MUBUF-NEXT: buffer_store_dword v0, off, s[0:3], s32
	; CHECK-NEXT: s_waitcnt vmcnt(0)			; MUBUF-NEXT: s_waitcnt vmcnt(0)
	; CHECK-NEXT: s_setpc_b64 s[30:31]			; MUBUF-NEXT: s_setpc_b64 s[30:31]
				;
				; FLATSCR-LABEL: memcpy_fixed_align:
				; FLATSCR: ; %bb.0:
				; FLATSCR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; FLATSCR-NEXT: global_load_dword v0, v[1:2], off offset:36
				; FLATSCR-NEXT: s_waitcnt vmcnt(0)
				; FLATSCR-NEXT: scratch_store_dword off, v0, s32 offset:36
				; FLATSCR-NEXT: global_load_dword v0, v[1:2], off offset:32
				; FLATSCR-NEXT: s_waitcnt vmcnt(0)
				; FLATSCR-NEXT: scratch_store_dword off, v0, s32 offset:32
				; FLATSCR-NEXT: global_load_dwordx4 v[3:6], v[1:2], off offset:16
				; FLATSCR-NEXT: s_waitcnt vmcnt(0)
				; FLATSCR-NEXT: scratch_store_dword off, v6, s32 offset:28
				; FLATSCR-NEXT: scratch_store_dword off, v5, s32 offset:24
				; FLATSCR-NEXT: scratch_store_dword off, v4, s32 offset:20
				; FLATSCR-NEXT: scratch_store_dword off, v3, s32 offset:16
				; FLATSCR-NEXT: global_load_dwordx4 v[0:3], v[1:2], off
				; FLATSCR-NEXT: s_waitcnt vmcnt(0)
				; FLATSCR-NEXT: scratch_store_dword off, v3, s32 offset:12
				; FLATSCR-NEXT: scratch_store_dword off, v2, s32 offset:8
				; FLATSCR-NEXT: scratch_store_dword off, v1, s32 offset:4
				; FLATSCR-NEXT: scratch_store_dword off, v0, s32
				; FLATSCR-NEXT: s_waitcnt vmcnt(0)
				; FLATSCR-NEXT: s_setpc_b64 s[30:31]
	%alloca = alloca [40 x i8], addrspace(5)			%alloca = alloca [40 x i8], addrspace(5)
	%cast = bitcast [40 x i8] addrspace(5)* %alloca to i8 addrspace(5)*			%cast = bitcast [40 x i8] addrspace(5)* %alloca to i8 addrspace(5)*
	call void @llvm.memcpy.p5i8.p1i8.i64(i8 addrspace(5)* align 4 dereferenceable(40) %cast, i8 addrspace(1)* align 4 dereferenceable(40) %src, i64 40, i1 false)			call void @llvm.memcpy.p5i8.p1i8.i64(i8 addrspace(5)* align 4 dereferenceable(40) %cast, i8 addrspace(1)* align 4 dereferenceable(40) %src, i64 40, i1 false)
	ret void			ret void
	}			}

	declare void @llvm.memcpy.p5i8.p1i8.i64(i8 addrspace(5)* noalias nocapture writeonly, i8 addrspace(1)* noalias nocapture readonly, i64, i1 immarg) #0			declare void @llvm.memcpy.p5i8.p1i8.i64(i8 addrspace(5)* noalias nocapture writeonly, i8 addrspace(1)* noalias nocapture readonly, i64, i1 immarg) #0

	attributes #0 = { argmemonly nounwind willreturn }			attributes #0 = { argmemonly nounwind willreturn }

llvm/test/CodeGen/AMDGPU/multi-dword-vgpr-spill.ll

	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -enable-misched=0 -post-RA-scheduler=0 -stress-regalloc=8 < %s | FileCheck %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -enable-misched=0 -post-RA-scheduler=0 -stress-regalloc=8 < %s | FileCheck %s -check-prefixes=GCN,MUBUF
				; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -enable-misched=0 -post-RA-scheduler=0 -stress-regalloc=8 -amdgpu-enable-flat-scratch < %s | FileCheck %s -check-prefixes=GCN,FLATSCR

	; CHECK-LABEL: spill_v2i32:			; GCN-LABEL: spill_v2i32:
	; CHECK-DAG: buffer_store_dword v{{.*}} offset:16 ; 4-byte Folded Spill			; MUBUF-DAG: buffer_store_dword v{{.*}} offset:16 ; 4-byte Folded Spill
	; CHECK-DAG: buffer_store_dword v{{.*}} offset:20 ; 4-byte Folded Spill			; MUBUF-DAG: buffer_store_dword v{{.*}} offset:20 ; 4-byte Folded Spill
	; CHECK: ;;#ASMSTART			; FLATSCR-DAG: scratch_store_dword off, v{{.*}} offset:16 ; 4-byte Folded Spill
	; CHECK-NEXT: ;;#ASMEND			; FLATSCR-DAG: scratch_store_dword off, v{{.*}} offset:20 ; 4-byte Folded Spill
	; CHECK-DAG: buffer_load_dword v{{.*}} offset:16 ; 4-byte Folded Reload			; GCN: ;;#ASMSTART
	; CHECK-DAG: buffer_load_dword v{{.*}} offset:20 ; 4-byte Folded Reload			; GCN-NEXT: ;;#ASMEND
				; MUBUF-DAG: buffer_load_dword v{{.*}} offset:16 ; 4-byte Folded Reload
				; MUBUF-DAG: buffer_load_dword v{{.*}} offset:20 ; 4-byte Folded Reload
				; FLATSCR-DAG: scratch_load_dword v{{.*}} offset:16 ; 4-byte Folded Reload
				; FLATSCR-DAG: scratch_load_dword v{{.*}} offset:20 ; 4-byte Folded Reload

	define void @spill_v2i32() {			define void @spill_v2i32() {
	entry:			entry:
	%alloca = alloca <2 x i32>, i32 2, align 4, addrspace(5)			%alloca = alloca <2 x i32>, i32 2, align 4, addrspace(5)

	%aptr = getelementptr <2 x i32>, <2 x i32> addrspace(5)* %alloca, i32 1			%aptr = getelementptr <2 x i32>, <2 x i32> addrspace(5)* %alloca, i32 1
	%a = load volatile <2 x i32>, <2 x i32> addrspace(5)* %aptr			%a = load volatile <2 x i32>, <2 x i32> addrspace(5)* %aptr

	; Force %a to spill.			; Force %a to spill.
	call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}" ()			call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}" ()

	%outptr = getelementptr <2 x i32>, <2 x i32> addrspace(5)* %alloca, i32 1			%outptr = getelementptr <2 x i32>, <2 x i32> addrspace(5)* %alloca, i32 1
	store volatile <2 x i32> %a, <2 x i32> addrspace(5)* %outptr			store volatile <2 x i32> %a, <2 x i32> addrspace(5)* %outptr

	ret void			ret void
	}			}

	; CHECK-LABEL: spill_v2f32:			; GCN-LABEL: spill_v2f32:
	; CHECK-DAG: buffer_store_dword v{{.*}} offset:16 ; 4-byte Folded Spill			; MUBUF-DAG: buffer_store_dword v{{.*}} offset:16 ; 4-byte Folded Spill
	; CHECK-DAG: buffer_store_dword v{{.*}} offset:20 ; 4-byte Folded Spill			; MUBUF-DAG: buffer_store_dword v{{.*}} offset:20 ; 4-byte Folded Spill
	; CHECK: ;;#ASMSTART			; FLATSCR-DAG: scratch_store_dword off, v{{.*}} offset:16 ; 4-byte Folded Spill
	; CHECK-NEXT: ;;#ASMEND			; FLATSCR-DAG: scratch_store_dword off, v{{.*}} offset:20 ; 4-byte Folded Spill
	; CHECK-DAG: buffer_load_dword v{{.*}} offset:16 ; 4-byte Folded Reload			; GCN: ;;#ASMSTART
	; CHECK-DAG: buffer_load_dword v{{.*}} offset:20 ; 4-byte Folded Reload			; GCN-NEXT: ;;#ASMEND
				; MUBUF-DAG: buffer_load_dword v{{.*}} offset:16 ; 4-byte Folded Reload
				; MUBUF-DAG: buffer_load_dword v{{.*}} offset:20 ; 4-byte Folded Reload
				; FLATSCR-DAG: scratch_load_dword v{{.*}} offset:16 ; 4-byte Folded Reload
				; FLATSCR-DAG: scratch_load_dword v{{.*}} offset:20 ; 4-byte Folded Reload

	define void @spill_v2f32() {			define void @spill_v2f32() {
	entry:			entry:
	%alloca = alloca <2 x i32>, i32 2, align 4, addrspace(5)			%alloca = alloca <2 x i32>, i32 2, align 4, addrspace(5)

	%aptr = getelementptr <2 x i32>, <2 x i32> addrspace(5)* %alloca, i32 1			%aptr = getelementptr <2 x i32>, <2 x i32> addrspace(5)* %alloca, i32 1
	%a = load volatile <2 x i32>, <2 x i32> addrspace(5)* %aptr			%a = load volatile <2 x i32>, <2 x i32> addrspace(5)* %aptr

	; Force %a to spill.			; Force %a to spill.
	call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}" ()			call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}" ()

	%outptr = getelementptr <2 x i32>, <2 x i32> addrspace(5)* %alloca, i32 1			%outptr = getelementptr <2 x i32>, <2 x i32> addrspace(5)* %alloca, i32 1
	store volatile <2 x i32> %a, <2 x i32> addrspace(5)* %outptr			store volatile <2 x i32> %a, <2 x i32> addrspace(5)* %outptr

	ret void			ret void
	}			}

	; CHECK-LABEL: spill_v3i32:			; GCN-LABEL: spill_v3i32:
	; CHECK-DAG: buffer_store_dword v{{.*}} offset:32 ; 4-byte Folded Spill			; MUBUF-DAG: buffer_store_dword v{{.*}} offset:32 ; 4-byte Folded Spill
	; CHECK-DAG: buffer_store_dword v{{.*}} offset:36 ; 4-byte Folded Spill			; MUBUF-DAG: buffer_store_dword v{{.*}} offset:36 ; 4-byte Folded Spill
	; CHECK-DAG: buffer_store_dword v{{.*}} offset:40 ; 4-byte Folded Spill			; MUBUF-DAG: buffer_store_dword v{{.*}} offset:40 ; 4-byte Folded Spill
	; CHECK: ;;#ASMSTART			; FLATSCR-DAG: scratch_store_dword off, v{{.*}} offset:32 ; 4-byte Folded Spill
	; CHECK-NEXT: ;;#ASMEND			; FLATSCR-DAG: scratch_store_dword off, v{{.*}} offset:36 ; 4-byte Folded Spill
	; CHECK-DAG: buffer_load_dword v{{.*}} offset:32 ; 4-byte Folded Reload			; FLATSCR-DAG: scratch_store_dword off, v{{.*}} offset:40 ; 4-byte Folded Spill
	; CHECK-DAG: buffer_load_dword v{{.*}} offset:36 ; 4-byte Folded Reload			; GCN: ;;#ASMSTART
	; CHECK-DAG: buffer_load_dword v{{.*}} offset:40 ; 4-byte Folded Reload			; GCN-NEXT: ;;#ASMEND
				; MUBUF-DAG: buffer_load_dword v{{.*}} offset:32 ; 4-byte Folded Reload
				; MUBUF-DAG: buffer_load_dword v{{.*}} offset:36 ; 4-byte Folded Reload
				; MUBUF-DAG: buffer_load_dword v{{.*}} offset:40 ; 4-byte Folded Reload
				; FLATSCR-DAG: scratch_load_dword v{{.*}} offset:32 ; 4-byte Folded Reload
				; FLATSCR-DAG: scratch_load_dword v{{.*}} offset:36 ; 4-byte Folded Reload
				; FLATSCR-DAG: scratch_load_dword v{{.*}} offset:40 ; 4-byte Folded Reload

	define void @spill_v3i32() {			define void @spill_v3i32() {
	entry:			entry:
	%alloca = alloca <3 x i32>, i32 2, align 4, addrspace(5)			%alloca = alloca <3 x i32>, i32 2, align 4, addrspace(5)

	%aptr = getelementptr <3 x i32>, <3 x i32> addrspace(5)* %alloca, i32 1			%aptr = getelementptr <3 x i32>, <3 x i32> addrspace(5)* %alloca, i32 1
	%a = load volatile <3 x i32>, <3 x i32> addrspace(5)* %aptr			%a = load volatile <3 x i32>, <3 x i32> addrspace(5)* %aptr

	; Force %a to spill.			; Force %a to spill.
	call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}" ()			call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}" ()

	%outptr = getelementptr <3 x i32>, <3 x i32> addrspace(5)* %alloca, i32 1			%outptr = getelementptr <3 x i32>, <3 x i32> addrspace(5)* %alloca, i32 1
	store volatile <3 x i32> %a, <3 x i32> addrspace(5)* %outptr			store volatile <3 x i32> %a, <3 x i32> addrspace(5)* %outptr

	ret void			ret void
	}			}

	; CHECK-LABEL: spill_v3f32:			; GCN-LABEL: spill_v3f32:
	; CHECK-DAG: buffer_store_dword v{{.*}} offset:32 ; 4-byte Folded Spill			; MUBUF-DAG: buffer_store_dword v{{.*}} offset:32 ; 4-byte Folded Spill
	; CHECK-DAG: buffer_store_dword v{{.*}} offset:36 ; 4-byte Folded Spill			; MUBUF-DAG: buffer_store_dword v{{.*}} offset:36 ; 4-byte Folded Spill
	; CHECK-DAG: buffer_store_dword v{{.*}} offset:40 ; 4-byte Folded Spill			; MUBUF-DAG: buffer_store_dword v{{.*}} offset:40 ; 4-byte Folded Spill
	; CHECK: ;;#ASMSTART			; FLATSCR-DAG: scratch_store_dword off, v{{.*}} offset:32 ; 4-byte Folded Spill
	; CHECK-NEXT: ;;#ASMEND			; FLATSCR-DAG: scratch_store_dword off, v{{.*}} offset:36 ; 4-byte Folded Spill
	; CHECK-DAG: buffer_load_dword v{{.*}} offset:32 ; 4-byte Folded Reload			; FLATSCR-DAG: scratch_store_dword off, v{{.*}} offset:40 ; 4-byte Folded Spill
	; CHECK-DAG: buffer_load_dword v{{.*}} offset:36 ; 4-byte Folded Reload			; GCN: ;;#ASMSTART
	; CHECK-DAG: buffer_load_dword v{{.*}} offset:40 ; 4-byte Folded Reload			; GCN-NEXT: ;;#ASMEND
				; MUBUF-DAG: buffer_load_dword v{{.*}} offset:32 ; 4-byte Folded Reload
				; MUBUF-DAG: buffer_load_dword v{{.*}} offset:36 ; 4-byte Folded Reload
				; MUBUF-DAG: buffer_load_dword v{{.*}} offset:40 ; 4-byte Folded Reload
				; FLATSCR-DAG: scratch_load_dword v{{.*}} offset:32 ; 4-byte Folded Reload
				; FLATSCR-DAG: scratch_load_dword v{{.*}} offset:36 ; 4-byte Folded Reload
				; FLATSCR-DAG: scratch_load_dword v{{.*}} offset:40 ; 4-byte Folded Reload

	define void @spill_v3f32() {			define void @spill_v3f32() {
	entry:			entry:
	%alloca = alloca <3 x i32>, i32 2, align 4, addrspace(5)			%alloca = alloca <3 x i32>, i32 2, align 4, addrspace(5)

	%aptr = getelementptr <3 x i32>, <3 x i32> addrspace(5)* %alloca, i32 1			%aptr = getelementptr <3 x i32>, <3 x i32> addrspace(5)* %alloca, i32 1
	%a = load volatile <3 x i32>, <3 x i32> addrspace(5)* %aptr			%a = load volatile <3 x i32>, <3 x i32> addrspace(5)* %aptr

	; Force %a to spill.			; Force %a to spill.
	call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}" ()			call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}" ()

	%outptr = getelementptr <3 x i32>, <3 x i32> addrspace(5)* %alloca, i32 1			%outptr = getelementptr <3 x i32>, <3 x i32> addrspace(5)* %alloca, i32 1
	store volatile <3 x i32> %a, <3 x i32> addrspace(5)* %outptr			store volatile <3 x i32> %a, <3 x i32> addrspace(5)* %outptr

	ret void			ret void
	}			}

	; CHECK-LABEL: spill_v4i32:			; GCN-LABEL: spill_v4i32:
	; CHECK-DAG: buffer_store_dword v{{.*}} offset:32 ; 4-byte Folded Spill			; MUBUF-DAG: buffer_store_dword v{{.*}} offset:32 ; 4-byte Folded Spill
	; CHECK-DAG: buffer_store_dword v{{.*}} offset:36 ; 4-byte Folded Spill			; MUBUF-DAG: buffer_store_dword v{{.*}} offset:36 ; 4-byte Folded Spill
	; CHECK-DAG: buffer_store_dword v{{.*}} offset:40 ; 4-byte Folded Spill			; MUBUF-DAG: buffer_store_dword v{{.*}} offset:40 ; 4-byte Folded Spill
	; CHECK-DAG: buffer_store_dword v{{.*}} offset:44 ; 4-byte Folded Spill			; MUBUF-DAG: buffer_store_dword v{{.*}} offset:44 ; 4-byte Folded Spill
	; CHECK: ;;#ASMSTART			; FLATSCR-DAG: scratch_store_dword off, v{{.*}} offset:32 ; 4-byte Folded Spill
	; CHECK-NEXT: ;;#ASMEND			; FLATSCR-DAG: scratch_store_dword off, v{{.*}} offset:36 ; 4-byte Folded Spill
	; CHECK-DAG: buffer_load_dword v{{.*}} offset:32 ; 4-byte Folded Reload			; FLATSCR-DAG: scratch_store_dword off, v{{.*}} offset:40 ; 4-byte Folded Spill
	; CHECK-DAG: buffer_load_dword v{{.*}} offset:36 ; 4-byte Folded Reload			; FLATSCR-DAG: scratch_store_dword off, v{{.*}} offset:44 ; 4-byte Folded Spill
	; CHECK-DAG: buffer_load_dword v{{.*}} offset:40 ; 4-byte Folded Reload			; GCN: ;;#ASMSTART
	; CHECK-DAG: buffer_load_dword v{{.*}} offset:44 ; 4-byte Folded Reload			; GCN-NEXT: ;;#ASMEND
				; MUBUF-DAG: buffer_load_dword v{{.*}} offset:32 ; 4-byte Folded Reload
				; MUBUF-DAG: buffer_load_dword v{{.*}} offset:36 ; 4-byte Folded Reload
				; MUBUF-DAG: buffer_load_dword v{{.*}} offset:40 ; 4-byte Folded Reload
				; MUBUF-DAG: buffer_load_dword v{{.*}} offset:44 ; 4-byte Folded Reload
				; FLATSCR-DAG: scratch_load_dword v{{.*}} offset:32 ; 4-byte Folded Reload
				; FLATSCR-DAG: scratch_load_dword v{{.*}} offset:36 ; 4-byte Folded Reload
				; FLATSCR-DAG: scratch_load_dword v{{.*}} offset:40 ; 4-byte Folded Reload
				; FLATSCR-DAG: scratch_load_dword v{{.*}} offset:44 ; 4-byte Folded Reload

	define void @spill_v4i32() {			define void @spill_v4i32() {
	entry:			entry:
	%alloca = alloca <4 x i32>, i32 2, align 4, addrspace(5)			%alloca = alloca <4 x i32>, i32 2, align 4, addrspace(5)

	%aptr = getelementptr <4 x i32>, <4 x i32> addrspace(5)* %alloca, i32 1			%aptr = getelementptr <4 x i32>, <4 x i32> addrspace(5)* %alloca, i32 1
	%a = load volatile <4 x i32>, <4 x i32> addrspace(5)* %aptr			%a = load volatile <4 x i32>, <4 x i32> addrspace(5)* %aptr

	; Force %a to spill.			; Force %a to spill.
	call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}" ()			call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}" ()

	%outptr = getelementptr <4 x i32>, <4 x i32> addrspace(5)* %alloca, i32 1			%outptr = getelementptr <4 x i32>, <4 x i32> addrspace(5)* %alloca, i32 1
	store volatile <4 x i32> %a, <4 x i32> addrspace(5)* %outptr			store volatile <4 x i32> %a, <4 x i32> addrspace(5)* %outptr

	ret void			ret void
	}			}

	; CHECK-LABEL: spill_v4f32:			; GCN-LABEL: spill_v4f32:
	; CHECK-DAG: buffer_store_dword v{{.*}} offset:32 ; 4-byte Folded Spill			; MUBUF-DAG: buffer_store_dword v{{.*}} offset:32 ; 4-byte Folded Spill
	; CHECK-DAG: buffer_store_dword v{{.*}} offset:36 ; 4-byte Folded Spill			; MUBUF-DAG: buffer_store_dword v{{.*}} offset:36 ; 4-byte Folded Spill
	; CHECK-DAG: buffer_store_dword v{{.*}} offset:40 ; 4-byte Folded Spill			; MUBUF-DAG: buffer_store_dword v{{.*}} offset:40 ; 4-byte Folded Spill
	; CHECK-DAG: buffer_store_dword v{{.*}} offset:44 ; 4-byte Folded Spill			; MUBUF-DAG: buffer_store_dword v{{.*}} offset:44 ; 4-byte Folded Spill
	; CHECK: ;;#ASMSTART			; FLATSCR-DAG: scratch_store_dword off, v{{.*}} offset:32 ; 4-byte Folded Spill
	; CHECK-NEXT: ;;#ASMEND			; FLATSCR-DAG: scratch_store_dword off, v{{.*}} offset:36 ; 4-byte Folded Spill
	; CHECK-DAG: buffer_load_dword v{{.*}} offset:32 ; 4-byte Folded Reload			; FLATSCR-DAG: scratch_store_dword off, v{{.*}} offset:40 ; 4-byte Folded Spill
	; CHECK-DAG: buffer_load_dword v{{.*}} offset:36 ; 4-byte Folded Reload			; FLATSCR-DAG: scratch_store_dword off, v{{.*}} offset:44 ; 4-byte Folded Spill
	; CHECK-DAG: buffer_load_dword v{{.*}} offset:40 ; 4-byte Folded Reload			; GCN: ;;#ASMSTART
	; CHECK-DAG: buffer_load_dword v{{.*}} offset:44 ; 4-byte Folded Reload			; GCN-NEXT: ;;#ASMEND
				; MUBUF-DAG: buffer_load_dword v{{.*}} offset:32 ; 4-byte Folded Reload
				; MUBUF-DAG: buffer_load_dword v{{.*}} offset:36 ; 4-byte Folded Reload
				; MUBUF-DAG: buffer_load_dword v{{.*}} offset:40 ; 4-byte Folded Reload
				; MUBUF-DAG: buffer_load_dword v{{.*}} offset:44 ; 4-byte Folded Reload
				; FLATSCR-DAG: scratch_load_dword v{{.*}} offset:32 ; 4-byte Folded Reload
				; FLATSCR-DAG: scratch_load_dword v{{.*}} offset:36 ; 4-byte Folded Reload
				; FLATSCR-DAG: scratch_load_dword v{{.*}} offset:40 ; 4-byte Folded Reload
				; FLATSCR-DAG: scratch_load_dword v{{.*}} offset:44 ; 4-byte Folded Reload

	define void @spill_v4f32() {			define void @spill_v4f32() {
	entry:			entry:
	%alloca = alloca <4 x i32>, i32 2, align 4, addrspace(5)			%alloca = alloca <4 x i32>, i32 2, align 4, addrspace(5)

	%aptr = getelementptr <4 x i32>, <4 x i32> addrspace(5)* %alloca, i32 1			%aptr = getelementptr <4 x i32>, <4 x i32> addrspace(5)* %alloca, i32 1
	%a = load volatile <4 x i32>, <4 x i32> addrspace(5)* %aptr			%a = load volatile <4 x i32>, <4 x i32> addrspace(5)* %aptr

	; Force %a to spill.			; Force %a to spill.
	call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}" ()			call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}" ()

	%outptr = getelementptr <4 x i32>, <4 x i32> addrspace(5)* %alloca, i32 1			%outptr = getelementptr <4 x i32>, <4 x i32> addrspace(5)* %alloca, i32 1
	store volatile <4 x i32> %a, <4 x i32> addrspace(5)* %outptr			store volatile <4 x i32> %a, <4 x i32> addrspace(5)* %outptr

	ret void			ret void
	}			}

	; CHECK-LABEL: spill_v5i32:			; GCN-LABEL: spill_v5i32:
	; CHECK-DAG: buffer_store_dword v{{.*}} offset:64 ; 4-byte Folded Spill			; MUBUF-DAG: buffer_store_dword v{{.*}} offset:64 ; 4-byte Folded Spill
	; CHECK-DAG: buffer_store_dword v{{.*}} offset:68 ; 4-byte Folded Spill			; MUBUF-DAG: buffer_store_dword v{{.*}} offset:68 ; 4-byte Folded Spill
	; CHECK-DAG: buffer_store_dword v{{.*}} offset:72 ; 4-byte Folded Spill			; MUBUF-DAG: buffer_store_dword v{{.*}} offset:72 ; 4-byte Folded Spill
	; CHECK-DAG: buffer_store_dword v{{.*}} offset:76 ; 4-byte Folded Spill			; MUBUF-DAG: buffer_store_dword v{{.*}} offset:76 ; 4-byte Folded Spill
	; CHECK: ;;#ASMSTART			; FLATSCR-DAG: scratch_store_dword off, v{{.*}} offset:64 ; 4-byte Folded Spill
	; CHECK-NEXT: ;;#ASMEND			; FLATSCR-DAG: scratch_store_dword off, v{{.*}} offset:68 ; 4-byte Folded Spill
	; CHECK-DAG: buffer_load_dword v{{.*}} offset:64 ; 4-byte Folded Reload			; FLATSCR-DAG: scratch_store_dword off, v{{.*}} offset:72 ; 4-byte Folded Spill
	; CHECK-DAG: buffer_load_dword v{{.*}} offset:68 ; 4-byte Folded Reload			; FLATSCR-DAG: scratch_store_dword off, v{{.*}} offset:76 ; 4-byte Folded Spill
	; CHECK-DAG: buffer_load_dword v{{.*}} offset:72 ; 4-byte Folded Reload			; GCN: ;;#ASMSTART
	; CHECK-DAG: buffer_load_dword v{{.*}} offset:76 ; 4-byte Folded Reload			; GCN-NEXT: ;;#ASMEND
				; MUBUF-DAG: buffer_load_dword v{{.*}} offset:64 ; 4-byte Folded Reload
				; MUBUF-DAG: buffer_load_dword v{{.*}} offset:68 ; 4-byte Folded Reload
				; MUBUF-DAG: buffer_load_dword v{{.*}} offset:72 ; 4-byte Folded Reload
				; MUBUF-DAG: buffer_load_dword v{{.*}} offset:76 ; 4-byte Folded Reload
				; FLATSCR-DAG: scratch_load_dword v{{.*}} offset:64 ; 4-byte Folded Reload
				; FLATSCR-DAG: scratch_load_dword v{{.*}} offset:68 ; 4-byte Folded Reload
				; FLATSCR-DAG: scratch_load_dword v{{.*}} offset:72 ; 4-byte Folded Reload
				; FLATSCR-DAG: scratch_load_dword v{{.*}} offset:76 ; 4-byte Folded Reload
	define void @spill_v5i32() {			define void @spill_v5i32() {
	entry:			entry:
	%alloca = alloca <5 x i32>, i32 2, align 4, addrspace(5)			%alloca = alloca <5 x i32>, i32 2, align 4, addrspace(5)

	%aptr = getelementptr <5 x i32>, <5 x i32> addrspace(5)* %alloca, i32 1			%aptr = getelementptr <5 x i32>, <5 x i32> addrspace(5)* %alloca, i32 1
	%a = load volatile <5 x i32>, <5 x i32> addrspace(5)* %aptr			%a = load volatile <5 x i32>, <5 x i32> addrspace(5)* %aptr

	; Force %a to spill.			; Force %a to spill.
	call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}" ()			call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}" ()

	%outptr = getelementptr <5 x i32>, <5 x i32> addrspace(5)* %alloca, i32 1			%outptr = getelementptr <5 x i32>, <5 x i32> addrspace(5)* %alloca, i32 1
	store volatile <5 x i32> %a, <5 x i32> addrspace(5)* %outptr			store volatile <5 x i32> %a, <5 x i32> addrspace(5)* %outptr

	ret void			ret void
	}			}

	; CHECK-LABEL: spill_v5f32:			; GCN-LABEL: spill_v5f32:
	; CHECK-DAG: buffer_store_dword v{{.*}} offset:64 ; 4-byte Folded Spill			; MUBUF-DAG: buffer_store_dword v{{.*}} offset:64 ; 4-byte Folded Spill
	; CHECK-DAG: buffer_store_dword v{{.*}} offset:68 ; 4-byte Folded Spill			; MUBUF-DAG: buffer_store_dword v{{.*}} offset:68 ; 4-byte Folded Spill
	; CHECK-DAG: buffer_store_dword v{{.*}} offset:72 ; 4-byte Folded Spill			; MUBUF-DAG: buffer_store_dword v{{.*}} offset:72 ; 4-byte Folded Spill
	; CHECK-DAG: buffer_store_dword v{{.*}} offset:76 ; 4-byte Folded Spill			; MUBUF-DAG: buffer_store_dword v{{.*}} offset:76 ; 4-byte Folded Spill
	; CHECK: ;;#ASMSTART			; FLATSCR-DAG: scratch_store_dword off, v{{.*}} offset:64 ; 4-byte Folded Spill
	; CHECK-NEXT: ;;#ASMEND			; FLATSCR-DAG: scratch_store_dword off, v{{.*}} offset:68 ; 4-byte Folded Spill
	; CHECK-DAG: buffer_load_dword v{{.*}} offset:64 ; 4-byte Folded Reload			; FLATSCR-DAG: scratch_store_dword off, v{{.*}} offset:72 ; 4-byte Folded Spill
	; CHECK-DAG: buffer_load_dword v{{.*}} offset:68 ; 4-byte Folded Reload			; FLATSCR-DAG: scratch_store_dword off, v{{.*}} offset:76 ; 4-byte Folded Spill
	; CHECK-DAG: buffer_load_dword v{{.*}} offset:72 ; 4-byte Folded Reload			; GCN: ;;#ASMSTART
	; CHECK-DAG: buffer_load_dword v{{.*}} offset:76 ; 4-byte Folded Reload			; GCN-NEXT: ;;#ASMEND
				; MUBUF-DAG: buffer_load_dword v{{.*}} offset:64 ; 4-byte Folded Reload
				; MUBUF-DAG: buffer_load_dword v{{.*}} offset:68 ; 4-byte Folded Reload
				; MUBUF-DAG: buffer_load_dword v{{.*}} offset:72 ; 4-byte Folded Reload
				; MUBUF-DAG: buffer_load_dword v{{.*}} offset:76 ; 4-byte Folded Reload
				; FLATSCR-DAG: scratch_load_dword v{{.*}} offset:64 ; 4-byte Folded Reload
				; FLATSCR-DAG: scratch_load_dword v{{.*}} offset:68 ; 4-byte Folded Reload
				; FLATSCR-DAG: scratch_load_dword v{{.*}} offset:72 ; 4-byte Folded Reload
				; FLATSCR-DAG: scratch_load_dword v{{.*}} offset:76 ; 4-byte Folded Reload
	define void @spill_v5f32() {			define void @spill_v5f32() {
	entry:			entry:
	%alloca = alloca <5 x i32>, i32 2, align 4, addrspace(5)			%alloca = alloca <5 x i32>, i32 2, align 4, addrspace(5)

	%aptr = getelementptr <5 x i32>, <5 x i32> addrspace(5)* %alloca, i32 1			%aptr = getelementptr <5 x i32>, <5 x i32> addrspace(5)* %alloca, i32 1
	%a = load volatile <5 x i32>, <5 x i32> addrspace(5)* %aptr			%a = load volatile <5 x i32>, <5 x i32> addrspace(5)* %aptr

	; Force %a to spill.			; Force %a to spill.
	call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}" ()			call void asm sideeffect "", "~{v0},~{v1},~{v2},~{v3},~{v4},~{v5},~{v6},~{v7}" ()

	%outptr = getelementptr <5 x i32>, <5 x i32> addrspace(5)* %alloca, i32 1			%outptr = getelementptr <5 x i32>, <5 x i32> addrspace(5)* %alloca, i32 1
	store volatile <5 x i32> %a, <5 x i32> addrspace(5)* %outptr			store volatile <5 x i32> %a, <5 x i32> addrspace(5)* %outptr

	ret void			ret void
	}			}

llvm/test/CodeGen/AMDGPU/non-entry-alloca.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,DEFAULTSIZE %s		; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,DEFAULTSIZE,MUBUF %s
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs -amdgpu-assume-dynamic-stack-object-size=1024 < %s | FileCheck -check-prefixes=GCN,ASSUME1024 %s		; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs -amdgpu-assume-dynamic-stack-object-size=1024 < %s | FileCheck -check-prefixes=GCN,ASSUME1024,MUBUF %s
		; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs -amdgpu-enable-flat-scratch < %s | FileCheck -check-prefixes=GCN,DEFAULTSIZE,FLATSCR %s
		; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs -amdgpu-enable-flat-scratch -amdgpu-assume-dynamic-stack-object-size=1024 < %s | FileCheck -check-prefixes=GCN,ASSUME1024,FLATSCR %s

; FIXME: Generated test checks do not check metadata at the end of the		; FIXME: Generated test checks do not check metadata at the end of the
; function, so this also includes manually added checks.		; function, so this also includes manually added checks.

; Test that we can select a statically sized alloca outside of the		; Test that we can select a statically sized alloca outside of the
; entry block.		; entry block.

; FIXME: FunctionLoweringInfo unhelpfully doesn't preserve an		; FIXME: FunctionLoweringInfo unhelpfully doesn't preserve an
; alignment less than the stack alignment.		; alignment less than the stack alignment.
define amdgpu_kernel void @kernel_non_entry_block_static_alloca_uniformly_reached_align4(i32 addrspace(1)* %out, i32 %arg.cond0, i32 %arg.cond1, i32 %in) {		define amdgpu_kernel void @kernel_non_entry_block_static_alloca_uniformly_reached_align4(i32 addrspace(1)* %out, i32 %arg.cond0, i32 %arg.cond1, i32 %in) {
; GCN-LABEL: kernel_non_entry_block_static_alloca_uniformly_reached_align4:		; MUBUF-LABEL: kernel_non_entry_block_static_alloca_uniformly_reached_align4:
; GCN: ; %bb.0: ; %entry		; MUBUF: ; %bb.0: ; %entry
; GCN-NEXT: s_add_u32 flat_scratch_lo, s6, s9		; MUBUF-NEXT: s_add_u32 flat_scratch_lo, s6, s9
; GCN-NEXT: s_addc_u32 flat_scratch_hi, s7, 0		; MUBUF-NEXT: s_addc_u32 flat_scratch_hi, s7, 0
; GCN-NEXT: s_add_u32 s0, s0, s9		; MUBUF-NEXT: s_add_u32 s0, s0, s9
; GCN-NEXT: s_load_dwordx4 s[8:11], s[4:5], 0x8		; MUBUF-NEXT: s_load_dwordx4 s[8:11], s[4:5], 0x8
; GCN-NEXT: s_addc_u32 s1, s1, 0		; MUBUF-NEXT: s_addc_u32 s1, s1, 0
; GCN-NEXT: s_movk_i32 s32, 0x400		; MUBUF-NEXT: s_movk_i32 s32, 0x400
; GCN-NEXT: s_mov_b32 s33, 0		; MUBUF-NEXT: s_mov_b32 s33, 0
; GCN-NEXT: s_waitcnt lgkmcnt(0)		; MUBUF-NEXT: s_waitcnt lgkmcnt(0)
; GCN-NEXT: s_cmp_lg_u32 s8, 0		; MUBUF-NEXT: s_cmp_lg_u32 s8, 0
; GCN-NEXT: s_cbranch_scc1 BB0_3		; MUBUF-NEXT: s_cbranch_scc1 BB0_3
; GCN-NEXT: ; %bb.1: ; %bb.0		; MUBUF-NEXT: ; %bb.1: ; %bb.0
; GCN-NEXT: s_cmp_lg_u32 s9, 0		; MUBUF-NEXT: s_cmp_lg_u32 s9, 0
; GCN-NEXT: s_cbranch_scc1 BB0_3		; MUBUF-NEXT: s_cbranch_scc1 BB0_3
; GCN-NEXT: ; %bb.2: ; %bb.1		; MUBUF-NEXT: ; %bb.2: ; %bb.1
; GCN-NEXT: s_add_i32 s6, s32, 0x1000		; MUBUF-NEXT: s_add_i32 s6, s32, 0x1000
; GCN-NEXT: v_mov_b32_e32 v1, 0		; MUBUF-NEXT: v_mov_b32_e32 v1, 0
; GCN-NEXT: v_mov_b32_e32 v2, s6		; MUBUF-NEXT: v_mov_b32_e32 v2, s6
; GCN-NEXT: s_lshl_b32 s7, s10, 2		; MUBUF-NEXT: s_lshl_b32 s7, s10, 2
; GCN-NEXT: s_mov_b32 s32, s6		; MUBUF-NEXT: s_mov_b32 s32, s6
; GCN-NEXT: buffer_store_dword v1, v2, s[0:3], 0 offen		; MUBUF-NEXT: buffer_store_dword v1, v2, s[0:3], 0 offen
; GCN-NEXT: v_mov_b32_e32 v1, 1		; MUBUF-NEXT: v_mov_b32_e32 v1, 1
; GCN-NEXT: s_add_i32 s6, s6, s7		; MUBUF-NEXT: s_add_i32 s6, s6, s7
; GCN-NEXT: buffer_store_dword v1, v2, s[0:3], 0 offen offset:4		; MUBUF-NEXT: buffer_store_dword v1, v2, s[0:3], 0 offen offset:4
; GCN-NEXT: v_mov_b32_e32 v1, s6		; MUBUF-NEXT: v_mov_b32_e32 v1, s6
; GCN-NEXT: buffer_load_dword v1, v1, s[0:3], 0 offen		; MUBUF-NEXT: buffer_load_dword v1, v1, s[0:3], 0 offen
; GCN-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0		; MUBUF-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
; GCN-NEXT: s_waitcnt vmcnt(0)		; MUBUF-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: v_add_u32_e32 v2, v1, v0		; MUBUF-NEXT: v_add_u32_e32 v2, v1, v0
; GCN-NEXT: s_waitcnt lgkmcnt(0)		; MUBUF-NEXT: s_waitcnt lgkmcnt(0)
; GCN-NEXT: v_mov_b32_e32 v0, s4		; MUBUF-NEXT: v_mov_b32_e32 v0, s4
; GCN-NEXT: v_mov_b32_e32 v1, s5		; MUBUF-NEXT: v_mov_b32_e32 v1, s5
; GCN-NEXT: global_store_dword v[0:1], v2, off		; MUBUF-NEXT: global_store_dword v[0:1], v2, off
; GCN-NEXT: BB0_3: ; %bb.2		; MUBUF-NEXT: BB0_3: ; %bb.2
; GCN-NEXT: v_mov_b32_e32 v0, 0		; MUBUF-NEXT: v_mov_b32_e32 v0, 0
; GCN-NEXT: global_store_dword v[0:1], v0, off		; MUBUF-NEXT: global_store_dword v[0:1], v0, off
; GCN-NEXT: s_endpgm		; MUBUF-NEXT: s_endpgm
		;
		; FLATSCR-LABEL: kernel_non_entry_block_static_alloca_uniformly_reached_align4:
		; FLATSCR: ; %bb.0: ; %entry
		; FLATSCR-NEXT: s_add_u32 flat_scratch_lo, s6, s9
		; FLATSCR-NEXT: s_load_dwordx4 s[8:11], s[4:5], 0x8
		; FLATSCR-NEXT: s_addc_u32 flat_scratch_hi, s7, 0
		; FLATSCR-NEXT: s_mov_b32 s32, 16
		; FLATSCR-NEXT: s_mov_b32 s33, 0
		; FLATSCR-NEXT: s_waitcnt lgkmcnt(0)
		; FLATSCR-NEXT: s_cmp_lg_u32 s8, 0
		; FLATSCR-NEXT: s_cbranch_scc1 BB0_3
		; FLATSCR-NEXT: ; %bb.1: ; %bb.0
		; FLATSCR-NEXT: s_cmp_lg_u32 s9, 0
		; FLATSCR-NEXT: s_cbranch_scc1 BB0_3
		; FLATSCR-NEXT: ; %bb.2: ; %bb.1
		; FLATSCR-NEXT: s_mov_b32 s6, s32
		; FLATSCR-NEXT: s_movk_i32 s7, 0x1000
		; FLATSCR-NEXT: s_add_i32 s8, s6, s7
		; FLATSCR-NEXT: s_add_u32 s6, s6, s7
		; FLATSCR-NEXT: v_mov_b32_e32 v1, 0
		; FLATSCR-NEXT: scratch_store_dword off, v1, s6
		; FLATSCR-NEXT: v_mov_b32_e32 v1, 1
		; FLATSCR-NEXT: s_lshl_b32 s6, s10, 2
		; FLATSCR-NEXT: s_mov_b32 s32, s8
		; FLATSCR-NEXT: scratch_store_dword off, v1, s8 offset:4
		; FLATSCR-NEXT: s_add_i32 s8, s8, s6
		; FLATSCR-NEXT: scratch_load_dword v1, off, s8
		; FLATSCR-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
		; FLATSCR-NEXT: s_waitcnt vmcnt(0)
		; FLATSCR-NEXT: v_add_u32_e32 v2, v1, v0
		; FLATSCR-NEXT: s_waitcnt lgkmcnt(0)
		; FLATSCR-NEXT: v_mov_b32_e32 v0, s4
		; FLATSCR-NEXT: v_mov_b32_e32 v1, s5
		; FLATSCR-NEXT: global_store_dword v[0:1], v2, off
		; FLATSCR-NEXT: BB0_3: ; %bb.2
		; FLATSCR-NEXT: v_mov_b32_e32 v0, 0
		; FLATSCR-NEXT: global_store_dword v[0:1], v0, off
		; FLATSCR-NEXT: s_endpgm

entry:		entry:
%cond0 = icmp eq i32 %arg.cond0, 0		%cond0 = icmp eq i32 %arg.cond0, 0
br i1 %cond0, label %bb.0, label %bb.2		br i1 %cond0, label %bb.0, label %bb.2

bb.0:		bb.0:
%alloca = alloca [16 x i32], align 4, addrspace(5)		%alloca = alloca [16 x i32], align 4, addrspace(5)
%gep0 = getelementptr [16 x i32], [16 x i32] addrspace(5)* %alloca, i32 0, i32 0		%gep0 = getelementptr [16 x i32], [16 x i32] addrspace(5)* %alloca, i32 0, i32 0
[18 unchanged lines not shown]
}		}
; DEFAULTSIZE: .amdhsa_private_segment_fixed_size 4112		; DEFAULTSIZE: .amdhsa_private_segment_fixed_size 4112
; DEFAULTSIZE: ; ScratchSize: 4112		; DEFAULTSIZE: ; ScratchSize: 4112

; ASSUME1024: .amdhsa_private_segment_fixed_size 1040		; ASSUME1024: .amdhsa_private_segment_fixed_size 1040
; ASSUME1024: ; ScratchSize: 1040		; ASSUME1024: ; ScratchSize: 1040

define amdgpu_kernel void @kernel_non_entry_block_static_alloca_uniformly_reached_align64(i32 addrspace(1)* %out, i32 %arg.cond, i32 %in) {		define amdgpu_kernel void @kernel_non_entry_block_static_alloca_uniformly_reached_align64(i32 addrspace(1)* %out, i32 %arg.cond, i32 %in) {
; GCN-LABEL: kernel_non_entry_block_static_alloca_uniformly_reached_align64:		; MUBUF-LABEL: kernel_non_entry_block_static_alloca_uniformly_reached_align64:
; GCN: ; %bb.0: ; %entry		; MUBUF: ; %bb.0: ; %entry
; GCN-NEXT: s_add_u32 flat_scratch_lo, s6, s9		; MUBUF-NEXT: s_add_u32 flat_scratch_lo, s6, s9
; GCN-NEXT: s_addc_u32 flat_scratch_hi, s7, 0		; MUBUF-NEXT: s_addc_u32 flat_scratch_hi, s7, 0
; GCN-NEXT: s_load_dwordx2 s[6:7], s[4:5], 0x8		; MUBUF-NEXT: s_load_dwordx2 s[6:7], s[4:5], 0x8
; GCN-NEXT: s_add_u32 s0, s0, s9		; MUBUF-NEXT: s_add_u32 s0, s0, s9
; GCN-NEXT: s_addc_u32 s1, s1, 0		; MUBUF-NEXT: s_addc_u32 s1, s1, 0
; GCN-NEXT: s_movk_i32 s32, 0x1000		; MUBUF-NEXT: s_movk_i32 s32, 0x1000
; GCN-NEXT: s_mov_b32 s33, 0		; MUBUF-NEXT: s_mov_b32 s33, 0
; GCN-NEXT: s_waitcnt lgkmcnt(0)		; MUBUF-NEXT: s_waitcnt lgkmcnt(0)
; GCN-NEXT: s_cmp_lg_u32 s6, 0		; MUBUF-NEXT: s_cmp_lg_u32 s6, 0
; GCN-NEXT: s_cbranch_scc1 BB1_2		; MUBUF-NEXT: s_cbranch_scc1 BB1_2
; GCN-NEXT: ; %bb.1: ; %bb.0		; MUBUF-NEXT: ; %bb.1: ; %bb.0
; GCN-NEXT: s_add_i32 s6, s32, 0x1000		; MUBUF-NEXT: s_add_i32 s6, s32, 0x1000
; GCN-NEXT: s_and_b32 s6, s6, 0xfffff000		; MUBUF-NEXT: s_and_b32 s6, s6, 0xfffff000
; GCN-NEXT: v_mov_b32_e32 v1, 0		; MUBUF-NEXT: v_mov_b32_e32 v1, 0
; GCN-NEXT: v_mov_b32_e32 v2, s6		; MUBUF-NEXT: v_mov_b32_e32 v2, s6
; GCN-NEXT: s_lshl_b32 s7, s7, 2		; MUBUF-NEXT: s_lshl_b32 s7, s7, 2
; GCN-NEXT: s_mov_b32 s32, s6		; MUBUF-NEXT: s_mov_b32 s32, s6
; GCN-NEXT: buffer_store_dword v1, v2, s[0:3], 0 offen		; MUBUF-NEXT: buffer_store_dword v1, v2, s[0:3], 0 offen
; GCN-NEXT: v_mov_b32_e32 v1, 1		; MUBUF-NEXT: v_mov_b32_e32 v1, 1
; GCN-NEXT: s_add_i32 s6, s6, s7		; MUBUF-NEXT: s_add_i32 s6, s6, s7
; GCN-NEXT: buffer_store_dword v1, v2, s[0:3], 0 offen offset:4		; MUBUF-NEXT: buffer_store_dword v1, v2, s[0:3], 0 offen offset:4
; GCN-NEXT: v_mov_b32_e32 v1, s6		; MUBUF-NEXT: v_mov_b32_e32 v1, s6
; GCN-NEXT: buffer_load_dword v1, v1, s[0:3], 0 offen		; MUBUF-NEXT: buffer_load_dword v1, v1, s[0:3], 0 offen
; GCN-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0		; MUBUF-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
; GCN-NEXT: s_waitcnt vmcnt(0)		; MUBUF-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: v_add_u32_e32 v2, v1, v0		; MUBUF-NEXT: v_add_u32_e32 v2, v1, v0
; GCN-NEXT: s_waitcnt lgkmcnt(0)		; MUBUF-NEXT: s_waitcnt lgkmcnt(0)
; GCN-NEXT: v_mov_b32_e32 v0, s4		; MUBUF-NEXT: v_mov_b32_e32 v0, s4
; GCN-NEXT: v_mov_b32_e32 v1, s5		; MUBUF-NEXT: v_mov_b32_e32 v1, s5
; GCN-NEXT: global_store_dword v[0:1], v2, off		; MUBUF-NEXT: global_store_dword v[0:1], v2, off
; GCN-NEXT: BB1_2: ; %bb.1		; MUBUF-NEXT: BB1_2: ; %bb.1
; GCN-NEXT: v_mov_b32_e32 v0, 0		; MUBUF-NEXT: v_mov_b32_e32 v0, 0
; GCN-NEXT: global_store_dword v[0:1], v0, off		; MUBUF-NEXT: global_store_dword v[0:1], v0, off
; GCN-NEXT: s_endpgm		; MUBUF-NEXT: s_endpgm
		;
		; FLATSCR-LABEL: kernel_non_entry_block_static_alloca_uniformly_reached_align64:
		; FLATSCR: ; %bb.0: ; %entry
		; FLATSCR-NEXT: s_add_u32 flat_scratch_lo, s6, s9
		; FLATSCR-NEXT: s_addc_u32 flat_scratch_hi, s7, 0
		; FLATSCR-NEXT: s_load_dwordx2 s[6:7], s[4:5], 0x8
		; FLATSCR-NEXT: s_mov_b32 s32, 64
		; FLATSCR-NEXT: s_mov_b32 s33, 0
		; FLATSCR-NEXT: s_waitcnt lgkmcnt(0)
		; FLATSCR-NEXT: s_cmp_lg_u32 s6, 0
		; FLATSCR-NEXT: s_cbranch_scc1 BB1_2
		; FLATSCR-NEXT: ; %bb.1: ; %bb.0
		; FLATSCR-NEXT: s_add_i32 s6, s32, 0x1000
		; FLATSCR-NEXT: s_and_b32 s6, s6, 0xfffff000
		; FLATSCR-NEXT: v_mov_b32_e32 v1, 0
		; FLATSCR-NEXT: scratch_store_dword off, v1, s6
		; FLATSCR-NEXT: v_mov_b32_e32 v1, 1
		; FLATSCR-NEXT: s_lshl_b32 s7, s7, 2
		; FLATSCR-NEXT: s_mov_b32 s32, s6
		; FLATSCR-NEXT: scratch_store_dword off, v1, s6 offset:4
		; FLATSCR-NEXT: s_add_i32 s6, s6, s7
		; FLATSCR-NEXT: scratch_load_dword v1, off, s6
		; FLATSCR-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
		; FLATSCR-NEXT: s_waitcnt vmcnt(0)
		; FLATSCR-NEXT: v_add_u32_e32 v2, v1, v0
		; FLATSCR-NEXT: s_waitcnt lgkmcnt(0)
		; FLATSCR-NEXT: v_mov_b32_e32 v0, s4
		; FLATSCR-NEXT: v_mov_b32_e32 v1, s5
		; FLATSCR-NEXT: global_store_dword v[0:1], v2, off
		; FLATSCR-NEXT: BB1_2: ; %bb.1
		; FLATSCR-NEXT: v_mov_b32_e32 v0, 0
		; FLATSCR-NEXT: global_store_dword v[0:1], v0, off
		; FLATSCR-NEXT: s_endpgm
entry:		entry:
%cond = icmp eq i32 %arg.cond, 0		%cond = icmp eq i32 %arg.cond, 0
br i1 %cond, label %bb.0, label %bb.1		br i1 %cond, label %bb.0, label %bb.1

bb.0:		bb.0:
%alloca = alloca [16 x i32], align 64, addrspace(5)		%alloca = alloca [16 x i32], align 64, addrspace(5)
%gep0 = getelementptr [16 x i32], [16 x i32] addrspace(5)* %alloca, i32 0, i32 0		%gep0 = getelementptr [16 x i32], [16 x i32] addrspace(5)* %alloca, i32 0, i32 0
%gep1 = getelementptr [16 x i32], [16 x i32] addrspace(5)* %alloca, i32 0, i32 1		%gep1 = getelementptr [16 x i32], [16 x i32] addrspace(5)* %alloca, i32 0, i32 1
[14 unchanged lines not shown]
; DEFAULTSIZE: .amdhsa_private_segment_fixed_size 4160		; DEFAULTSIZE: .amdhsa_private_segment_fixed_size 4160
; DEFAULTSIZE: ; ScratchSize: 4160		; DEFAULTSIZE: ; ScratchSize: 4160

; ASSUME1024: .amdhsa_private_segment_fixed_size 1088		; ASSUME1024: .amdhsa_private_segment_fixed_size 1088
; ASSUME1024: ; ScratchSize: 1088		; ASSUME1024: ; ScratchSize: 1088


define void @func_non_entry_block_static_alloca_align4(i32 addrspace(1)* %out, i32 %arg.cond0, i32 %arg.cond1, i32 %in) {		define void @func_non_entry_block_static_alloca_align4(i32 addrspace(1)* %out, i32 %arg.cond0, i32 %arg.cond1, i32 %in) {
; GCN-LABEL: func_non_entry_block_static_alloca_align4:		; MUBUF-LABEL: func_non_entry_block_static_alloca_align4:
; GCN: ; %bb.0: ; %entry		; MUBUF: ; %bb.0: ; %entry
; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; MUBUF-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN-NEXT: s_mov_b32 s7, s33		; MUBUF-NEXT: s_mov_b32 s7, s33
; GCN-NEXT: v_cmp_eq_u32_e32 vcc, 0, v2		; MUBUF-NEXT: v_cmp_eq_u32_e32 vcc, 0, v2
; GCN-NEXT: s_mov_b32 s33, s32		; MUBUF-NEXT: s_mov_b32 s33, s32
; GCN-NEXT: s_add_u32 s32, s32, 0x400		; MUBUF-NEXT: s_add_u32 s32, s32, 0x400
; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc		; MUBUF-NEXT: s_and_saveexec_b64 s[4:5], vcc
; GCN-NEXT: s_cbranch_execz BB2_3		; MUBUF-NEXT: s_cbranch_execz BB2_3
; GCN-NEXT: ; %bb.1: ; %bb.0		; MUBUF-NEXT: ; %bb.1: ; %bb.0
; GCN-NEXT: v_cmp_eq_u32_e32 vcc, 0, v3		; MUBUF-NEXT: v_cmp_eq_u32_e32 vcc, 0, v3
; GCN-NEXT: s_and_b64 exec, exec, vcc		; MUBUF-NEXT: s_and_b64 exec, exec, vcc
; GCN-NEXT: s_cbranch_execz BB2_3		; MUBUF-NEXT: s_cbranch_execz BB2_3
; GCN-NEXT: ; %bb.2: ; %bb.1		; MUBUF-NEXT: ; %bb.2: ; %bb.1
; GCN-NEXT: s_add_i32 s6, s32, 0x1000		; MUBUF-NEXT: s_add_i32 s6, s32, 0x1000
; GCN-NEXT: v_mov_b32_e32 v2, 0		; MUBUF-NEXT: v_mov_b32_e32 v2, 0
; GCN-NEXT: v_mov_b32_e32 v3, s6		; MUBUF-NEXT: v_mov_b32_e32 v3, s6
; GCN-NEXT: buffer_store_dword v2, v3, s[0:3], 0 offen		; MUBUF-NEXT: buffer_store_dword v2, v3, s[0:3], 0 offen
; GCN-NEXT: v_mov_b32_e32 v2, 1		; MUBUF-NEXT: v_mov_b32_e32 v2, 1
; GCN-NEXT: buffer_store_dword v2, v3, s[0:3], 0 offen offset:4		; MUBUF-NEXT: buffer_store_dword v2, v3, s[0:3], 0 offen offset:4
; GCN-NEXT: v_lshl_add_u32 v2, v4, 2, s6		; MUBUF-NEXT: v_lshl_add_u32 v2, v4, 2, s6
; GCN-NEXT: buffer_load_dword v2, v2, s[0:3], 0 offen		; MUBUF-NEXT: buffer_load_dword v2, v2, s[0:3], 0 offen
; GCN-NEXT: v_and_b32_e32 v3, 0x3ff, v5		; MUBUF-NEXT: v_and_b32_e32 v3, 0x3ff, v5
; GCN-NEXT: s_mov_b32 s32, s6		; MUBUF-NEXT: s_mov_b32 s32, s6
; GCN-NEXT: s_waitcnt vmcnt(0)		; MUBUF-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: v_add_u32_e32 v2, v2, v3		; MUBUF-NEXT: v_add_u32_e32 v2, v2, v3
; GCN-NEXT: global_store_dword v[0:1], v2, off		; MUBUF-NEXT: global_store_dword v[0:1], v2, off
; GCN-NEXT: BB2_3: ; %bb.2		; MUBUF-NEXT: BB2_3: ; %bb.2
; GCN-NEXT: s_or_b64 exec, exec, s[4:5]		; MUBUF-NEXT: s_or_b64 exec, exec, s[4:5]
; GCN-NEXT: v_mov_b32_e32 v0, 0		; MUBUF-NEXT: v_mov_b32_e32 v0, 0
; GCN-NEXT: global_store_dword v[0:1], v0, off		; MUBUF-NEXT: global_store_dword v[0:1], v0, off
; GCN-NEXT: s_sub_u32 s32, s32, 0x400		; MUBUF-NEXT: s_sub_u32 s32, s32, 0x400
; GCN-NEXT: s_mov_b32 s33, s7		; MUBUF-NEXT: s_mov_b32 s33, s7
; GCN-NEXT: s_waitcnt vmcnt(0)		; MUBUF-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_setpc_b64 s[30:31]		; MUBUF-NEXT: s_setpc_b64 s[30:31]
		;
		; FLATSCR-LABEL: func_non_entry_block_static_alloca_align4:
		; FLATSCR: ; %bb.0: ; %entry
		; FLATSCR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
		; FLATSCR-NEXT: s_mov_b32 s9, s33
		; FLATSCR-NEXT: v_cmp_eq_u32_e32 vcc, 0, v2
		; FLATSCR-NEXT: s_mov_b32 s33, s32
		; FLATSCR-NEXT: s_add_u32 s32, s32, 16
		; FLATSCR-NEXT: s_and_saveexec_b64 s[4:5], vcc
		; FLATSCR-NEXT: s_cbranch_execz BB2_3
		; FLATSCR-NEXT: ; %bb.1: ; %bb.0
		; FLATSCR-NEXT: v_cmp_eq_u32_e32 vcc, 0, v3
		; FLATSCR-NEXT: s_and_b64 exec, exec, vcc
		; FLATSCR-NEXT: s_cbranch_execz BB2_3
		; FLATSCR-NEXT: ; %bb.2: ; %bb.1
		; FLATSCR-NEXT: s_mov_b32 s6, s32
		; FLATSCR-NEXT: s_movk_i32 s7, 0x1000
		; FLATSCR-NEXT: s_add_i32 s8, s6, s7
		; FLATSCR-NEXT: s_add_u32 s6, s6, s7
		; FLATSCR-NEXT: v_mov_b32_e32 v2, 0
		; FLATSCR-NEXT: scratch_store_dword off, v2, s6
		; FLATSCR-NEXT: v_mov_b32_e32 v2, 1
		; FLATSCR-NEXT: scratch_store_dword off, v2, s8 offset:4
		; FLATSCR-NEXT: v_lshl_add_u32 v2, v4, 2, s8
		; FLATSCR-NEXT: scratch_load_dword v2, v2, off
		; FLATSCR-NEXT: v_and_b32_e32 v3, 0x3ff, v5
		; FLATSCR-NEXT: s_mov_b32 s32, s8
		; FLATSCR-NEXT: s_waitcnt vmcnt(0)
		; FLATSCR-NEXT: v_add_u32_e32 v2, v2, v3
		; FLATSCR-NEXT: global_store_dword v[0:1], v2, off
		; FLATSCR-NEXT: BB2_3: ; %bb.2
		; FLATSCR-NEXT: s_or_b64 exec, exec, s[4:5]
		; FLATSCR-NEXT: v_mov_b32_e32 v0, 0
		; FLATSCR-NEXT: global_store_dword v[0:1], v0, off
		; FLATSCR-NEXT: s_sub_u32 s32, s32, 16
		; FLATSCR-NEXT: s_mov_b32 s33, s9
		; FLATSCR-NEXT: s_waitcnt vmcnt(0)
		; FLATSCR-NEXT: s_setpc_b64 s[30:31]

entry:		entry:
%cond0 = icmp eq i32 %arg.cond0, 0		%cond0 = icmp eq i32 %arg.cond0, 0
br i1 %cond0, label %bb.0, label %bb.2		br i1 %cond0, label %bb.0, label %bb.2

bb.0:		bb.0:
%alloca = alloca [16 x i32], align 4, addrspace(5)		%alloca = alloca [16 x i32], align 4, addrspace(5)
%gep0 = getelementptr [16 x i32], [16 x i32] addrspace(5)* %alloca, i32 0, i32 0		%gep0 = getelementptr [16 x i32], [16 x i32] addrspace(5)* %alloca, i32 0, i32 0
[13 unchanged lines not shown]	bb.1:
br label %bb.2		br label %bb.2

bb.2:		bb.2:
store volatile i32 0, i32 addrspace(1)* undef		store volatile i32 0, i32 addrspace(1)* undef
ret void		ret void
}		}

define void @func_non_entry_block_static_alloca_align64(i32 addrspace(1)* %out, i32 %arg.cond, i32 %in) {		define void @func_non_entry_block_static_alloca_align64(i32 addrspace(1)* %out, i32 %arg.cond, i32 %in) {
; GCN-LABEL: func_non_entry_block_static_alloca_align64:		; MUBUF-LABEL: func_non_entry_block_static_alloca_align64:
; GCN: ; %bb.0: ; %entry		; MUBUF: ; %bb.0: ; %entry
; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; MUBUF-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GCN-NEXT: s_add_u32 s4, s32, 0xfc0		; MUBUF-NEXT: s_add_u32 s4, s32, 0xfc0
; GCN-NEXT: s_mov_b32 s7, s33		; MUBUF-NEXT: s_mov_b32 s7, s33
; GCN-NEXT: s_and_b32 s33, s4, 0xfffff000		; MUBUF-NEXT: s_and_b32 s33, s4, 0xfffff000
; GCN-NEXT: v_cmp_eq_u32_e32 vcc, 0, v2		; MUBUF-NEXT: v_cmp_eq_u32_e32 vcc, 0, v2
; GCN-NEXT: s_add_u32 s32, s32, 0x2000		; MUBUF-NEXT: s_add_u32 s32, s32, 0x2000
; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc		; MUBUF-NEXT: s_and_saveexec_b64 s[4:5], vcc
; GCN-NEXT: s_cbranch_execz BB3_2		; MUBUF-NEXT: s_cbranch_execz BB3_2
; GCN-NEXT: ; %bb.1: ; %bb.0		; MUBUF-NEXT: ; %bb.1: ; %bb.0
; GCN-NEXT: s_add_i32 s6, s32, 0x1000		; MUBUF-NEXT: s_add_i32 s6, s32, 0x1000
; GCN-NEXT: s_and_b32 s6, s6, 0xfffff000		; MUBUF-NEXT: s_and_b32 s6, s6, 0xfffff000
; GCN-NEXT: v_mov_b32_e32 v2, 0		; MUBUF-NEXT: v_mov_b32_e32 v2, 0
; GCN-NEXT: v_mov_b32_e32 v5, s6		; MUBUF-NEXT: v_mov_b32_e32 v5, s6
; GCN-NEXT: buffer_store_dword v2, v5, s[0:3], 0 offen		; MUBUF-NEXT: buffer_store_dword v2, v5, s[0:3], 0 offen
; GCN-NEXT: v_mov_b32_e32 v2, 1		; MUBUF-NEXT: v_mov_b32_e32 v2, 1
; GCN-NEXT: buffer_store_dword v2, v5, s[0:3], 0 offen offset:4		; MUBUF-NEXT: buffer_store_dword v2, v5, s[0:3], 0 offen offset:4
; GCN-NEXT: v_lshl_add_u32 v2, v3, 2, s6		; MUBUF-NEXT: v_lshl_add_u32 v2, v3, 2, s6
; GCN-NEXT: buffer_load_dword v2, v2, s[0:3], 0 offen		; MUBUF-NEXT: buffer_load_dword v2, v2, s[0:3], 0 offen
; GCN-NEXT: v_and_b32_e32 v3, 0x3ff, v4		; MUBUF-NEXT: v_and_b32_e32 v3, 0x3ff, v4
; GCN-NEXT: s_mov_b32 s32, s6		; MUBUF-NEXT: s_mov_b32 s32, s6
; GCN-NEXT: s_waitcnt vmcnt(0)		; MUBUF-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: v_add_u32_e32 v2, v2, v3		; MUBUF-NEXT: v_add_u32_e32 v2, v2, v3
; GCN-NEXT: global_store_dword v[0:1], v2, off		; MUBUF-NEXT: global_store_dword v[0:1], v2, off
; GCN-NEXT: BB3_2: ; %bb.1		; MUBUF-NEXT: BB3_2: ; %bb.1
; GCN-NEXT: s_or_b64 exec, exec, s[4:5]		; MUBUF-NEXT: s_or_b64 exec, exec, s[4:5]
; GCN-NEXT: v_mov_b32_e32 v0, 0		; MUBUF-NEXT: v_mov_b32_e32 v0, 0
; GCN-NEXT: global_store_dword v[0:1], v0, off		; MUBUF-NEXT: global_store_dword v[0:1], v0, off
; GCN-NEXT: s_sub_u32 s32, s32, 0x2000		; MUBUF-NEXT: s_sub_u32 s32, s32, 0x2000
; GCN-NEXT: s_mov_b32 s33, s7		; MUBUF-NEXT: s_mov_b32 s33, s7
; GCN-NEXT: s_waitcnt vmcnt(0)		; MUBUF-NEXT: s_waitcnt vmcnt(0)
; GCN-NEXT: s_setpc_b64 s[30:31]		; MUBUF-NEXT: s_setpc_b64 s[30:31]
		;
		; FLATSCR-LABEL: func_non_entry_block_static_alloca_align64:
		; FLATSCR: ; %bb.0: ; %entry
		; FLATSCR-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
		; FLATSCR-NEXT: s_add_u32 s4, s32, 63
		; FLATSCR-NEXT: s_mov_b32 s7, s33
		; FLATSCR-NEXT: s_and_b32 s33, s4, 0xffffffc0
		; FLATSCR-NEXT: v_cmp_eq_u32_e32 vcc, 0, v2
		; FLATSCR-NEXT: s_add_u32 s32, s32, 0x80
		; FLATSCR-NEXT: s_and_saveexec_b64 s[4:5], vcc
		; FLATSCR-NEXT: s_cbranch_execz BB3_2
		; FLATSCR-NEXT: ; %bb.1: ; %bb.0
		; FLATSCR-NEXT: s_add_i32 s6, s32, 0x1000
		; FLATSCR-NEXT: s_and_b32 s6, s6, 0xfffff000
		; FLATSCR-NEXT: v_mov_b32_e32 v2, 0
		; FLATSCR-NEXT: scratch_store_dword off, v2, s6
		; FLATSCR-NEXT: v_mov_b32_e32 v2, 1
		; FLATSCR-NEXT: scratch_store_dword off, v2, s6 offset:4
		; FLATSCR-NEXT: v_lshl_add_u32 v2, v3, 2, s6
		; FLATSCR-NEXT: scratch_load_dword v2, v2, off
		; FLATSCR-NEXT: v_and_b32_e32 v3, 0x3ff, v4
		; FLATSCR-NEXT: s_mov_b32 s32, s6
		; FLATSCR-NEXT: s_waitcnt vmcnt(0)
		; FLATSCR-NEXT: v_add_u32_e32 v2, v2, v3
		; FLATSCR-NEXT: global_store_dword v[0:1], v2, off
		; FLATSCR-NEXT: BB3_2: ; %bb.1
		; FLATSCR-NEXT: s_or_b64 exec, exec, s[4:5]
		; FLATSCR-NEXT: v_mov_b32_e32 v0, 0
		; FLATSCR-NEXT: global_store_dword v[0:1], v0, off
		; FLATSCR-NEXT: s_sub_u32 s32, s32, 0x80
		; FLATSCR-NEXT: s_mov_b32 s33, s7
		; FLATSCR-NEXT: s_waitcnt vmcnt(0)
		; FLATSCR-NEXT: s_setpc_b64 s[30:31]
entry:		entry:
%cond = icmp eq i32 %arg.cond, 0		%cond = icmp eq i32 %arg.cond, 0
br i1 %cond, label %bb.0, label %bb.1		br i1 %cond, label %bb.0, label %bb.1

bb.0:		bb.0:
%alloca = alloca [16 x i32], align 64, addrspace(5)		%alloca = alloca [16 x i32], align 64, addrspace(5)
%gep0 = getelementptr [16 x i32], [16 x i32] addrspace(5)* %alloca, i32 0, i32 0		%gep0 = getelementptr [16 x i32], [16 x i32] addrspace(5)* %alloca, i32 0, i32 0
%gep1 = getelementptr [16 x i32], [16 x i32] addrspace(5)* %alloca, i32 0, i32 1		%gep1 = getelementptr [16 x i32], [16 x i32] addrspace(5)* %alloca, i32 0, i32 1
[17 unchanged lines not shown]

llvm/test/CodeGen/AMDGPU/pei-scavenge-sgpr-gfx9.mir

	# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py			# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
	# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs -run-pass=prologepilog %s -o - | FileCheck %s			# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs -run-pass=prologepilog %s -o - | FileCheck --check-prefix=MUBUF %s
				# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs -run-pass=prologepilog -amdgpu-enable-flat-scratch %s -o - | FileCheck --check-prefix=FLATSCR %s

	# Test what happens when an SGPR is unavailable for the unused add. The non-inline constant needs to be folded into the add instruction and not materialized in a register.			# Test what happens when an SGPR is unavailable for the unused add. The non-inline constant needs to be folded into the add instruction and not materialized in a register.

	---			---
	name: scavenge_sgpr_pei_no_sgprs			name: scavenge_sgpr_pei_no_sgprs
	tracksRegLiveness: true			tracksRegLiveness: true

	stack:			stack:
	- { id: 0, type: default, offset: 0, size: 4, alignment: 8192 }			- { id: 0, type: default, offset: 0, size: 4, alignment: 8192 }
	- { id: 1, type: default, offset: 0, size: 4, alignment: 8192 }			- { id: 1, type: default, offset: 0, size: 4, alignment: 8192 }

	machineFunctionInfo:			machineFunctionInfo:
	isEntryFunction: false			isEntryFunction: false
	scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3			scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
	frameOffsetReg: $sgpr33			frameOffsetReg: $sgpr33
	stackPtrOffsetReg: $sgpr32			stackPtrOffsetReg: $sgpr32

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr1			liveins: $vgpr1

	; CHECK-LABEL: name: scavenge_sgpr_pei_no_sgprs			; MUBUF-LABEL: name: scavenge_sgpr_pei_no_sgprs
	; CHECK: liveins: $sgpr27, $vgpr1			; MUBUF: liveins: $sgpr27, $vgpr1
	; CHECK: $sgpr27 = frame-setup COPY $sgpr33			; MUBUF: $sgpr27 = frame-setup COPY $sgpr33
	; CHECK: $sgpr4 = frame-setup S_ADD_U32 $sgpr32, 524224, implicit-def $scc			; MUBUF: $sgpr4 = frame-setup S_ADD_U32 $sgpr32, 524224, implicit-def $scc
	; CHECK: $sgpr33 = frame-setup S_AND_B32 killed $sgpr4, 4294443008, implicit-def $scc			; MUBUF: $sgpr33 = frame-setup S_AND_B32 killed $sgpr4, 4294443008, implicit-def $scc
	; CHECK: $sgpr32 = frame-setup S_ADD_U32 $sgpr32, 1572864, implicit-def $scc			; MUBUF: $sgpr32 = frame-setup S_ADD_U32 $sgpr32, 1572864, implicit-def $scc
	; CHECK: S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc			; MUBUF: S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc
	; CHECK: $vgpr2 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec			; MUBUF: $vgpr2 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec
	; CHECK: $vgpr2 = V_ADD_U32_e32 8192, killed $vgpr2, implicit $exec			; MUBUF: $vgpr2 = V_ADD_U32_e32 8192, killed $vgpr2, implicit $exec
	; CHECK: $vgpr0 = V_OR_B32_e32 killed $vgpr2, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31			; MUBUF: $vgpr0 = V_OR_B32_e32 killed $vgpr2, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31
	; CHECK: $sgpr32 = frame-destroy S_SUB_U32 $sgpr32, 1572864, implicit-def $scc			; MUBUF: $sgpr32 = frame-destroy S_SUB_U32 $sgpr32, 1572864, implicit-def $scc
	; CHECK: $sgpr33 = frame-setup COPY $sgpr27			; MUBUF: $sgpr33 = frame-setup COPY $sgpr27
	; CHECK: S_ENDPGM 0, implicit $vcc			; MUBUF: S_ENDPGM 0, implicit $vcc
				; FLATSCR-LABEL: name: scavenge_sgpr_pei_no_sgprs
				; FLATSCR: liveins: $sgpr27, $vgpr1
				; FLATSCR: $sgpr27 = frame-setup COPY $sgpr33
				; FLATSCR: $sgpr4 = frame-setup S_ADD_U32 $sgpr32, 8191, implicit-def $scc
				; FLATSCR: $sgpr33 = frame-setup S_AND_B32 killed $sgpr4, 4294959104, implicit-def $scc
				; FLATSCR: $sgpr32 = frame-setup S_ADD_U32 $sgpr32, 24576, implicit-def $scc
				; FLATSCR: S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc
				; FLATSCR: $sgpr33 = S_ADD_U32 $sgpr33, 8192, implicit-def $scc
				; FLATSCR: $vgpr0 = V_OR_B32_e32 $sgpr33, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31
				; FLATSCR: $sgpr33 = S_SUB_U32 $sgpr33, 8192, implicit-def $scc
				; FLATSCR: $sgpr32 = frame-destroy S_SUB_U32 $sgpr32, 24576, implicit-def $scc
				; FLATSCR: $sgpr33 = frame-setup COPY $sgpr27
				; FLATSCR: S_ENDPGM 0, implicit $vcc
	S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc			S_NOP 0, implicit-def $sgpr4, implicit-def $sgpr5, implicit-def $sgpr6, implicit-def $sgpr7, implicit-def $sgpr8, implicit-def $sgpr9, implicit-def $sgpr10, implicit-def $sgpr11, implicit-def $sgpr12, implicit-def $sgpr13, implicit-def $sgpr14, implicit-def $sgpr15, implicit-def $sgpr16, implicit-def $sgpr17, implicit-def $sgpr18, implicit-def $sgpr19, implicit-def $sgpr20, implicit-def $sgpr21, implicit-def $sgpr22, implicit-def $sgpr23, implicit-def $sgpr24, implicit-def $sgpr25, implicit-def $sgpr26, implicit-def $sgpr17, implicit-def $sgpr28, implicit-def $sgpr29, implicit-def $sgpr30, implicit-def $sgpr31, implicit-def $vcc
	$vgpr0 = V_OR_B32_e32 %stack.1, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31			$vgpr0 = V_OR_B32_e32 %stack.1, $vgpr1, implicit $exec, implicit $sgpr4, implicit $sgpr5, implicit $sgpr6, implicit $sgpr7, implicit $sgpr8, implicit $sgpr9, implicit $sgpr10, implicit $sgpr11, implicit $sgpr12, implicit $sgpr13, implicit $sgpr14, implicit $sgpr15, implicit $sgpr16, implicit $sgpr17, implicit $sgpr18, implicit $sgpr19, implicit $sgpr20, implicit $sgpr21, implicit $sgpr22, implicit $sgpr23, implicit $sgpr24, implicit $sgpr25, implicit $sgpr26, implicit $sgpr17, implicit $sgpr28, implicit $sgpr29, implicit $sgpr30, implicit $sgpr31
	S_ENDPGM 0, implicit $vcc			S_ENDPGM 0, implicit $vcc
	...			...

llvm/test/CodeGen/AMDGPU/pei-scavenge-vgpr-spill.mir

# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py		# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=fiji -verify-machineinstrs -run-pass=prologepilog %s -o - | FileCheck -check-prefix=GFX8 %s		# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=fiji -verify-machineinstrs -run-pass=prologepilog %s -o - | FileCheck -check-prefix=GFX8 %s
# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs -run-pass=prologepilog %s -o - | FileCheck -check-prefix=GFX9 %s		# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs -run-pass=prologepilog %s -o - | FileCheck -check-prefix=GFX9 %s
		# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs -run-pass=prologepilog -amdgpu-enable-flat-scratch %s -o - | FileCheck -check-prefix=GFX9-FLATSCR %s

# Test case where spilling a VGPR to an emergency slot is needed during frame index elimination.		# Test case where spilling a VGPR to an emergency slot is needed during frame index elimination.

---		---
name: pei_scavenge_vgpr_spill		name: pei_scavenge_vgpr_spill
tracksRegLiveness: true		tracksRegLiveness: true

stack:		stack:
Show All 38 Lines	bb.0:
; GFX9: $vgpr3 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec		; GFX9: $vgpr3 = V_LSHRREV_B32_e64 6, $sgpr33, implicit $exec
; GFX9: $vgpr3 = V_ADD_U32_e32 8192, killed $vgpr3, implicit $exec		; GFX9: $vgpr3 = V_ADD_U32_e32 8192, killed $vgpr3, implicit $exec
; GFX9: $vgpr0 = V_OR_B32_e32 killed $vgpr3, $vgpr1, implicit $exec		; GFX9: $vgpr0 = V_OR_B32_e32 killed $vgpr3, $vgpr1, implicit $exec
; GFX9: $sgpr32 = frame-destroy S_SUB_U32 $sgpr32, 1572864, implicit-def $scc		; GFX9: $sgpr32 = frame-destroy S_SUB_U32 $sgpr32, 1572864, implicit-def $scc
; GFX9: $sgpr33 = V_READLANE_B32_vi $vgpr2, 0		; GFX9: $sgpr33 = V_READLANE_B32_vi $vgpr2, 0
; GFX9: $sgpr4 = S_ADD_U32 $sgpr33, 524544, implicit-def $scc		; GFX9: $sgpr4 = S_ADD_U32 $sgpr33, 524544, implicit-def $scc
; GFX9: $vgpr3 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 4 from %stack.3, addrspace 5)		; GFX9: $vgpr3 = BUFFER_LOAD_DWORD_OFFSET $sgpr0_sgpr1_sgpr2_sgpr3, killed $sgpr4, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 4 from %stack.3, addrspace 5)
; GFX9: S_ENDPGM 0, csr_amdgpu_allvgprs		; GFX9: S_ENDPGM 0, csr_amdgpu_allvgprs
		; GFX9-FLATSCR-LABEL: name: pei_scavenge_vgpr_spill
		; GFX9-FLATSCR: liveins: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15, $vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31, $vgpr32_vgpr33_vgpr34_vgpr35_vgpr36_vgpr37_vgpr38_vgpr39_vgpr40_vgpr41_vgpr42_vgpr43_vgpr44_vgpr45_vgpr46_vgpr47, $vgpr48_vgpr49_vgpr50_vgpr51_vgpr52_vgpr53_vgpr54_vgpr55_vgpr56_vgpr57_vgpr58_vgpr59_vgpr60_vgpr61_vgpr62_vgpr63, $vgpr64_vgpr65_vgpr66_vgpr67_vgpr68_vgpr69_vgpr70_vgpr71_vgpr72_vgpr73_vgpr74_vgpr75_vgpr76_vgpr77_vgpr78_vgpr79, $vgpr80_vgpr81_vgpr82_vgpr83_vgpr84_vgpr85_vgpr86_vgpr87_vgpr88_vgpr89_vgpr90_vgpr91_vgpr92_vgpr93_vgpr94_vgpr95, $vgpr96_vgpr97_vgpr98_vgpr99_vgpr100_vgpr101_vgpr102_vgpr103_vgpr104_vgpr105_vgpr106_vgpr107_vgpr108_vgpr109_vgpr110_vgpr111, $vgpr112_vgpr113_vgpr114_vgpr115_vgpr116_vgpr117_vgpr118_vgpr119_vgpr120_vgpr121_vgpr122_vgpr123_vgpr124_vgpr125_vgpr126_vgpr127, $vgpr128_vgpr129_vgpr130_vgpr131_vgpr132_vgpr133_vgpr134_vgpr135_vgpr136_vgpr137_vgpr138_vgpr139_vgpr140_vgpr141_vgpr142_vgpr143, $vgpr144_vgpr145_vgpr146_vgpr147_vgpr148_vgpr149_vgpr150_vgpr151_vgpr152_vgpr153_vgpr154_vgpr155_vgpr156_vgpr157_vgpr158_vgpr159, $vgpr160_vgpr161_vgpr162_vgpr163_vgpr164_vgpr165_vgpr166_vgpr167_vgpr168_vgpr169_vgpr170_vgpr171_vgpr172_vgpr173_vgpr174_vgpr175, $vgpr176_vgpr177_vgpr178_vgpr179_vgpr180_vgpr181_vgpr182_vgpr183_vgpr184_vgpr185_vgpr186_vgpr187_vgpr188_vgpr189_vgpr190_vgpr191, $vgpr192_vgpr193_vgpr194_vgpr195_vgpr196_vgpr197_vgpr198_vgpr199_vgpr200_vgpr201_vgpr202_vgpr203_vgpr204_vgpr205_vgpr206_vgpr207, $vgpr208_vgpr209_vgpr210_vgpr211_vgpr212_vgpr213_vgpr214_vgpr215_vgpr216_vgpr217_vgpr218_vgpr219_vgpr220_vgpr221_vgpr222_vgpr223, $vgpr224_vgpr225_vgpr226_vgpr227_vgpr228_vgpr229_vgpr230_vgpr231_vgpr232_vgpr233_vgpr234_vgpr235_vgpr236_vgpr237_vgpr238_vgpr239, $vgpr240_vgpr241_vgpr242_vgpr243_vgpr244_vgpr245_vgpr246_vgpr247, $vgpr248_vgpr249_vgpr250_vgpr251, $vgpr252_vgpr253_vgpr254_vgpr255, $vgpr2
		; GFX9-FLATSCR: $vgpr2 = V_WRITELANE_B32_vi $sgpr33, 0, undef $vgpr2
		; GFX9-FLATSCR: $sgpr4 = frame-setup S_ADD_U32 $sgpr32, 8191, implicit-def $scc
		; GFX9-FLATSCR: $sgpr33 = frame-setup S_AND_B32 killed $sgpr4, 4294959104, implicit-def $scc
		; GFX9-FLATSCR: $sgpr32 = frame-setup S_ADD_U32 $sgpr32, 24576, implicit-def $scc
		; GFX9-FLATSCR: $vcc_hi = S_ADD_U32 $sgpr33, 8192, implicit-def $scc
		; GFX9-FLATSCR: $vgpr0 = V_OR_B32_e32 killed $vcc_hi, $vgpr1, implicit $exec
		; GFX9-FLATSCR: $sgpr32 = frame-destroy S_SUB_U32 $sgpr32, 24576, implicit-def $scc
		; GFX9-FLATSCR: $sgpr33 = V_READLANE_B32_vi $vgpr2, 0
		; GFX9-FLATSCR: S_ENDPGM 0, csr_amdgpu_allvgprs
$vgpr0 = V_OR_B32_e32 %stack.1, $vgpr1, implicit $exec		$vgpr0 = V_OR_B32_e32 %stack.1, $vgpr1, implicit $exec
S_ENDPGM 0, csr_amdgpu_allvgprs		S_ENDPGM 0, csr_amdgpu_allvgprs
...		...
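
A note on the constants in the checks above. The GFX9 (MUBUF) run releases the frame with `S_SUB_U32 $sgpr32, 1572864`, while the GFX9-FLATSCR run uses `24576`, and the MUBUF path shifts `$sgpr33` right by 6 before using it as a per-lane address. This is consistent with the summary's statement that SP changes from unswizzled to swizzled: the MUBUF values appear to be the per-lane values scaled by a wave size of 64. The sketch below only checks that arithmetic; the helper names and the wave64 assumption are illustrative and are not taken from the patch.

```python
WAVE_SIZE = 64  # assumption: wave64, matching the shift by 6 in the GFX9 checks
MASK32 = 0xFFFFFFFF


def align_up(value: int, alignment: int) -> int:
    """Round value up to a power-of-two alignment in 32-bit arithmetic.

    Mirrors the S_ADD_U32 (alignment - 1) / S_AND_B32 ~(alignment - 1) pair
    in the GFX9-FLATSCR prologue checks.
    """
    mask = ~(alignment - 1) & MASK32
    return ((value + alignment - 1) & MASK32) & mask


def unswizzled(per_lane_bytes: int, wave_size: int = WAVE_SIZE) -> int:
    """Scale a per-lane (swizzled) byte count to a whole-wave (unswizzled) one."""
    return per_lane_bytes * wave_size


if __name__ == "__main__":
    print(unswizzled(24576))   # frame size: FLATSCR 24576 vs MUBUF 1572864
    print(unswizzled(8196))    # emergency slot offset used by the MUBUF reload
    print(~(8192 - 1) & MASK32)  # the S_AND_B32 mask for 8192-byte alignment
```

Under these assumptions the three printed values match the immediates `1572864`, `524544`, and `4294959104` that appear in the checks.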

llvm/test/CodeGen/AMDGPU/scratch-simple.ll

	; RUN: llc -march=amdgcn -mtriple=amdgcn-- -mcpu=verde -amdgpu-use-divergent-register-indexing -verify-machineinstrs < %s | FileCheck --check-prefixes=GCN,SI,SIVI %s			; RUN: llc -march=amdgcn -mtriple=amdgcn-- -mcpu=verde -amdgpu-use-divergent-register-indexing -verify-machineinstrs < %s | FileCheck --check-prefixes=GCN,SI,SIVI,MUBUF %s
	; RUN: llc -march=amdgcn -mtriple=amdgcn-- -mcpu=gfx803 -mattr=-flat-for-global -amdgpu-use-divergent-register-indexing -verify-machineinstrs < %s | FileCheck --check-prefixes=GCN,VI,SIVI %s			; RUN: llc -march=amdgcn -mtriple=amdgcn-- -mcpu=gfx803 -mattr=-flat-for-global -amdgpu-use-divergent-register-indexing -verify-machineinstrs < %s | FileCheck --check-prefixes=GCN,VI,SIVI,MUBUF %s
	; RUN: llc -march=amdgcn -mtriple=amdgcn-- -mcpu=gfx900 -mattr=-flat-for-global -amdgpu-use-divergent-register-indexing -verify-machineinstrs < %s | FileCheck --check-prefixes=GCN,GFX9,GFX9_10 %s			; RUN: llc -march=amdgcn -mtriple=amdgcn-- -mcpu=gfx900 -mattr=-flat-for-global -amdgpu-use-divergent-register-indexing -verify-machineinstrs < %s | FileCheck --check-prefixes=GCN,GFX9,GFX9_10,MUBUF,GFX9-MUBUF,GFX9_10-MUBUF %s
	; RUN: llc -march=amdgcn -mtriple=amdgcn-- -mcpu=gfx900 -filetype=obj -amdgpu-use-divergent-register-indexing < %s | llvm-readobj -r - | FileCheck --check-prefix=RELS %s			; RUN: llc -march=amdgcn -mtriple=amdgcn-- -mcpu=gfx900 -filetype=obj -amdgpu-use-divergent-register-indexing < %s | llvm-readobj -r - | FileCheck --check-prefix=RELS %s
	; RUN: llc -march=amdgcn -mtriple=amdgcn-- -mcpu=gfx1010 -mattr=-flat-for-global -amdgpu-use-divergent-register-indexing -verify-machineinstrs < %s | FileCheck --check-prefixes=GCN,GFX10_W32,GFX9_10 %s			; RUN: llc -march=amdgcn -mtriple=amdgcn-- -mcpu=gfx1010 -mattr=-flat-for-global -amdgpu-use-divergent-register-indexing -verify-machineinstrs < %s | FileCheck --check-prefixes=GCN,GFX10_W32,GFX9_10,MUBUF,GFX10_W32-MUBUF,GFX9_10-MUBUF %s
	; RUN: llc -march=amdgcn -mtriple=amdgcn-- -mcpu=gfx1010 -mattr=-flat-for-global,+wavefrontsize64 -amdgpu-use-divergent-register-indexing -verify-machineinstrs < %s | FileCheck --check-prefixes=GCN,GFX10_W64,GFX9_10 %s			; RUN: llc -march=amdgcn -mtriple=amdgcn-- -mcpu=gfx1010 -mattr=-flat-for-global,+wavefrontsize64 -amdgpu-use-divergent-register-indexing -verify-machineinstrs < %s | FileCheck --check-prefixes=GCN,GFX10_W64,GFX9_10,MUBUF,GFX10_W64-MUBUF,GFX9_10-MUBUF %s
				; RUN: llc -march=amdgcn -mtriple=amdgcn-- -mcpu=gfx900 -mattr=-flat-for-global -amdgpu-use-divergent-register-indexing -amdgpu-enable-flat-scratch -verify-machineinstrs < %s | FileCheck --check-prefixes=GCN,GFX9,GFX9_10,FLATSCR,GFX9-FLATSCR %s
				; RUN: llc -march=amdgcn -mtriple=amdgcn-- -mcpu=gfx1030 -mattr=-flat-for-global -amdgpu-use-divergent-register-indexing -amdgpu-enable-flat-scratch -verify-machineinstrs < %s | FileCheck --check-prefixes=GCN,GFX10_W32,GFX9_10,FLATSCR,GFX10-FLATSCR,GFX9_10-FLATSCR %s

	; RELS: R_AMDGPU_ABS32_LO SCRATCH_RSRC_DWORD0 0x0			; RELS: R_AMDGPU_ABS32_LO SCRATCH_RSRC_DWORD0 0x0
	; RELS: R_AMDGPU_ABS32_LO SCRATCH_RSRC_DWORD1 0x0			; RELS: R_AMDGPU_ABS32_LO SCRATCH_RSRC_DWORD1 0x0

	; This used to fail due to a v_add_i32 instruction with an illegal immediate			; This used to fail due to a v_add_i32 instruction with an illegal immediate
	; operand that was created during Local Stack Slot Allocation. Test case derived			; operand that was created during Local Stack Slot Allocation. Test case derived
	; from https://bugs.freedesktop.org/show_bug.cgi?id=96602			; from https://bugs.freedesktop.org/show_bug.cgi?id=96602
	;			;
	; GCN-LABEL: {{^}}ps_main:			; GCN-LABEL: {{^}}ps_main:

	; GCN-DAG: s_mov_b32 s0, SCRATCH_RSRC_DWORD0			; GFX9-FLATSCR: s_add_u32 flat_scratch_lo, s0, s2
	; GCN-DAG: s_mov_b32 s1, SCRATCH_RSRC_DWORD1			; GFX9-FLATSCR: s_addc_u32 flat_scratch_hi, s1, 0
	; GCN-DAG: s_mov_b32 s2, -1
				; GFX10-FLATSCR: s_add_u32 s0, s0, s2
				; GFX10-FLATSCR: s_addc_u32 s1, s1, 0
				; GFX10-FLATSCR: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s0
				; GFX10-FLATSCR: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s1

				; MUBUF-DAG: s_mov_b32 s0, SCRATCH_RSRC_DWORD0
				; MUBUF-DAG: s_mov_b32 s1, SCRATCH_RSRC_DWORD1
				; MUBUF-DAG: s_mov_b32 s2, -1
	; SI-DAG: s_mov_b32 s3, 0xe8f000			; SI-DAG: s_mov_b32 s3, 0xe8f000
	; VI-DAG: s_mov_b32 s3, 0xe80000			; VI-DAG: s_mov_b32 s3, 0xe80000
	; GFX9-DAG: s_mov_b32 s3, 0xe00000			; GFX9-MUBUF-DAG: s_mov_b32 s3, 0xe00000
	; GFX10_W32-DAG: s_mov_b32 s3, 0x31c16000			; GFX10_W32-MUBUF-DAG: s_mov_b32 s3, 0x31c16000
	; GFX10_W64-DAG: s_mov_b32 s3, 0x31e16000			; GFX10_W64-MUBUF-DAG: s_mov_b32 s3, 0x31e16000

				; FLATSCR-NOT: SCRATCH_RSRC_DWORD

				; GFX9-FLATSCR: s_mov_b32 [[SP:[^,]+]], 0
				; GFX9-FLATSCR: scratch_store_dword off, v2, [[SP]] offset:
				; GFX9-FLATSCR: s_mov_b32 [[SP:[^,]+]], 0
				; GFX9-FLATSCR: scratch_store_dword off, v2, [[SP]] offset:

				; GFX10-FLATSCR: scratch_store_dword off, v2, off offset:
				; GFX10-FLATSCR: scratch_store_dword off, v2, off offset:

	; GCN-DAG: v_lshlrev_b32_e32 [[BYTES:v[0-9]+]], 2, v0			; GCN-DAG: v_lshlrev_b32_e32 [[BYTES:v[0-9]+]], 2, v0
	; GCN-DAG: v_and_b32_e32 [[CLAMP_IDX:v[0-9]+]], 0x1fc, [[BYTES]]			; GCN-DAG: v_and_b32_e32 [[CLAMP_IDX:v[0-9]+]], 0x1fc, [[BYTES]]
	; GCN-NOT: s_mov_b32 s0			; GCN-NOT: s_mov_b32 s0

	; GCN-DAG: v_add{{_|_nc_}}{{i|u}}32_e32 [[HI_OFF:v[0-9]+]],{{.*}} 0x280, [[CLAMP_IDX]]			; GCN-DAG: v_add{{_|_nc_}}{{i|u}}32_e32 [[HI_OFF:v[0-9]+]],{{.*}} 0x280, [[CLAMP_IDX]]
	; GCN-DAG: v_add{{_|_nc_}}{{i|u}}32_e32 [[LO_OFF:v[0-9]+]],{{.*}} {{v2|0x80}}, [[CLAMP_IDX]]			; GCN-DAG: v_add{{_|_nc_}}{{i|u}}32_e32 [[LO_OFF:v[0-9]+]],{{.*}} {{v2|0x80}}, [[CLAMP_IDX]]

	; GCN: buffer_load_dword {{v[0-9]+}}, [[LO_OFF]], {{s\[[0-9]+:[0-9]+\]}}, 0 offen			; MUBUF: buffer_load_dword {{v[0-9]+}}, [[LO_OFF]], {{s\[[0-9]+:[0-9]+\]}}, 0 offen
	; GCN: buffer_load_dword {{v[0-9]+}}, [[HI_OFF]], {{s\[[0-9]+:[0-9]+\]}}, 0 offen			; MUBUF: buffer_load_dword {{v[0-9]+}}, [[HI_OFF]], {{s\[[0-9]+:[0-9]+\]}}, 0 offen
				; FLATSCR: scratch_load_dword {{v[0-9]+}}, [[LO_OFF]], off
				; FLATSCR: scratch_load_dword {{v[0-9]+}}, [[HI_OFF]], off
	define amdgpu_ps float @ps_main(i32 %idx) {			define amdgpu_ps float @ps_main(i32 %idx) {
	%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx			%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx
	%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx			%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx
	%r = fadd float %v1, %v2			%r = fadd float %v1, %v2
	ret float %r			ret float %r
	}			}
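
A note on the new FLATSCR prologue checks for `ps_main`: the `s_add_u32` / `s_addc_u32` pair is a 64-bit addition carried out in two 32-bit halves, with the carry passed through SCC. Reading `s[0:1]` as a 64-bit scratch base and `s2` as a per-wave byte offset is an assumption made here for illustration; the checks themselves only show the instruction pair. A minimal sketch of the add-with-carry semantics, with hypothetical helper names:

```python
MASK32 = 0xFFFFFFFF


def s_add_u32(a: int, b: int) -> tuple[int, int]:
    """32-bit unsigned add; returns (result, carry_out)."""
    total = (a & MASK32) + (b & MASK32)
    return total & MASK32, total >> 32


def s_addc_u32(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """32-bit unsigned add with carry-in; returns (result, carry_out)."""
    total = (a & MASK32) + (b & MASK32) + carry_in
    return total & MASK32, total >> 32


def flat_scratch_base(base_lo: int, base_hi: int, wave_offset: int) -> tuple[int, int]:
    """Model of: s_add_u32 lo, base_lo, wave_offset; s_addc_u32 hi, base_hi, 0."""
    lo, carry = s_add_u32(base_lo, wave_offset)
    hi, _ = s_addc_u32(base_hi, 0, carry)
    return lo, hi


if __name__ == "__main__":
    # A low half close to overflow shows the carry moving into the high half.
    lo, hi = flat_scratch_base(0xFFFFFFF0, 0x1, 0x20)
    print(hex(lo), hex(hi))
```

On GFX9 the result is written directly to `flat_scratch_lo` / `flat_scratch_hi`; on GFX10 the checks show the same pair followed by `s_setreg_b32` writes to `HW_REG_FLAT_SCR_LO` and `HW_REG_FLAT_SCR_HI`.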

	; GCN-LABEL: {{^}}vs_main:			; GCN-LABEL: {{^}}vs_main:
	; GCN-DAG: s_mov_b32 s0, SCRATCH_RSRC_DWORD0			; GFX9-FLATSCR: s_add_u32 flat_scratch_lo, s0, s2
				; GFX9-FLATSCR: s_addc_u32 flat_scratch_hi, s1, 0

				; GFX10-FLATSCR: s_add_u32 s0, s0, s2
				; GFX10-FLATSCR: s_addc_u32 s1, s1, 0
				; GFX10-FLATSCR: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s0
				; GFX10-FLATSCR: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s1

				; MUBUF-DAG: s_mov_b32 s0, SCRATCH_RSRC_DWORD0
	; GCN-NOT: s_mov_b32 s0			; GCN-NOT: s_mov_b32 s0
	; GCN: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen
	; GCN: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen			; FLATSCR-NOT: SCRATCH_RSRC_DWORD

				; MUBUF: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen
				; MUBUF: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen

				; GFX9-FLATSCR: s_mov_b32 [[SP:[^,]+]], 0
				; GFX9-FLATSCR: scratch_store_dword off, v2, [[SP]] offset:
				; GFX9-FLATSCR: s_mov_b32 [[SP:[^,]+]], 0
				; GFX9-FLATSCR: scratch_store_dword off, v2, [[SP]] offset:

				; FLATSCR: scratch_load_dword {{v[0-9]+}}, {{v[0-9]+}}, off
				; FLATSCR: scratch_load_dword {{v[0-9]+}}, {{v[0-9]+}}, off

	define amdgpu_vs float @vs_main(i32 %idx) {			define amdgpu_vs float @vs_main(i32 %idx) {
	%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx			%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx
	%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx			%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx
	%r = fadd float %v1, %v2			%r = fadd float %v1, %v2
	ret float %r			ret float %r
	}			}

	; GCN-LABEL: {{^}}cs_main:			; GCN-LABEL: {{^}}cs_main:
	; GCN-DAG: s_mov_b32 s0, SCRATCH_RSRC_DWORD0			; GFX9-FLATSCR: s_add_u32 flat_scratch_lo, s0, s2
	; GCN: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen			; GFX9-FLATSCR: s_addc_u32 flat_scratch_hi, s1, 0
	; GCN: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen
				; GFX10-FLATSCR: s_add_u32 s0, s0, s2
				; GFX10-FLATSCR: s_addc_u32 s1, s1, 0
				; GFX10-FLATSCR: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s0
				; GFX10-FLATSCR: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s1

				; MUBUF-DAG: s_mov_b32 s0, SCRATCH_RSRC_DWORD0

				; FLATSCR-NOT: SCRATCH_RSRC_DWORD

				; MUBUF: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen
				; MUBUF: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen

				; FLATSCR: scratch_load_dword {{v[0-9]+}}, {{v[0-9]+}}, off
				; FLATSCR: scratch_load_dword {{v[0-9]+}}, {{v[0-9]+}}, off
	define amdgpu_cs float @cs_main(i32 %idx) {			define amdgpu_cs float @cs_main(i32 %idx) {
	%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx			%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx
	%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx			%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx
	%r = fadd float %v1, %v2			%r = fadd float %v1, %v2
	ret float %r			ret float %r
	}			}

	; GCN-LABEL: {{^}}hs_main:			; GCN-LABEL: {{^}}hs_main:
				; GFX9-FLATSCR: s_add_u32 flat_scratch_lo, s0, s5
				; GFX9-FLATSCR: s_addc_u32 flat_scratch_hi, s1, 0

				; GFX10-FLATSCR: s_add_u32 s0, s0, s5
				; GFX10-FLATSCR: s_addc_u32 s1, s1, 0
				; GFX10-FLATSCR: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s0
				; GFX10-FLATSCR: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s1

	; SIVI: s_mov_b32 s0, SCRATCH_RSRC_DWORD0			; SIVI: s_mov_b32 s0, SCRATCH_RSRC_DWORD0
	; SIVI-NOT: s_mov_b32 s0			; SIVI-NOT: s_mov_b32 s0
	; SIVI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen			; SIVI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen
	; SIVI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen			; SIVI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen

	; GFX9_10: s_mov_b32 s0, SCRATCH_RSRC_DWORD0			; GFX9_10-MUBUF: s_mov_b32 s0, SCRATCH_RSRC_DWORD0
	; GFX9_10-NOT: s_mov_b32 s5			; GFX9_10-NOT: s_mov_b32 s5
	; GFX9_10: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen			; GFX9_10-MUBUF: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen
	; GFX9_10: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen			; GFX9_10-MUBUF: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen

				; FLATSCR-NOT: SCRATCH_RSRC_DWORD
				; FLATSCR: scratch_load_dword {{v[0-9]+}}, {{v[0-9]+}}, off
				; FLATSCR: scratch_load_dword {{v[0-9]+}}, {{v[0-9]+}}, off
	define amdgpu_hs float @hs_main(i32 %idx) {			define amdgpu_hs float @hs_main(i32 %idx) {
	%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx			%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx
	%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx			%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx
	%r = fadd float %v1, %v2			%r = fadd float %v1, %v2
	ret float %r			ret float %r
	}			}

	; GCN-LABEL: {{^}}gs_main:			; GCN-LABEL: {{^}}gs_main:
				; GFX9-FLATSCR: s_add_u32 flat_scratch_lo, s0, s5
				; GFX9-FLATSCR: s_addc_u32 flat_scratch_hi, s1, 0

				; GFX10-FLATSCR: s_add_u32 s0, s0, s5
				; GFX10-FLATSCR: s_addc_u32 s1, s1, 0
				; GFX10-FLATSCR: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s0
				; GFX10-FLATSCR: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s1

	; SIVI: s_mov_b32 s0, SCRATCH_RSRC_DWORD0			; SIVI: s_mov_b32 s0, SCRATCH_RSRC_DWORD0
	; SIVI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen			; SIVI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen
	; SIVI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen			; SIVI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen

	; GFX9_10: s_mov_b32 s0, SCRATCH_RSRC_DWORD0			; GFX9_10-MUBUF: s_mov_b32 s0, SCRATCH_RSRC_DWORD0
	; GFX9_10: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen			; GFX9_10-MUBUF: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen
	; GFX9_10: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen			; GFX9_10-MUBUF: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen

				; FLATSCR-NOT: SCRATCH_RSRC_DWORD
				; FLATSCR: scratch_load_dword {{v[0-9]+}}, {{v[0-9]+}}, off
				; FLATSCR: scratch_load_dword {{v[0-9]+}}, {{v[0-9]+}}, off
	define amdgpu_gs float @gs_main(i32 %idx) {			define amdgpu_gs float @gs_main(i32 %idx) {
	%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx			%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx
	%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx			%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx
	%r = fadd float %v1, %v2			%r = fadd float %v1, %v2
	ret float %r			ret float %r
	}			}

	; Mesa GS and HS shaders have the preloaded scratch wave offset SGPR fixed at			; Mesa GS and HS shaders have the preloaded scratch wave offset SGPR fixed at
	; SGPR5, and the inreg implementation is used to reference it in the IR. The			; SGPR5, and the inreg implementation is used to reference it in the IR. The
	; following tests confirm the shader and anything inserted after the return			; following tests confirm the shader and anything inserted after the return
	; (i.e. SI_RETURN_TO_EPILOG) can access the scratch wave offset.			; (i.e. SI_RETURN_TO_EPILOG) can access the scratch wave offset.

	; GCN-LABEL: {{^}}hs_ir_uses_scratch_offset:			; GCN-LABEL: {{^}}hs_ir_uses_scratch_offset:
	; GCN: s_mov_b32 s8, SCRATCH_RSRC_DWORD0			; GFX9-FLATSCR: s_add_u32 flat_scratch_lo, s0, s5
				; GFX9-FLATSCR: s_addc_u32 flat_scratch_hi, s1, 0

				; GFX10-FLATSCR: s_add_u32 s0, s0, s5
				; GFX10-FLATSCR: s_addc_u32 s1, s1, 0
				; GFX10-FLATSCR: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s0
				; GFX10-FLATSCR: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s1

				; MUBUF: s_mov_b32 s8, SCRATCH_RSRC_DWORD0
				; FLATSCR-NOT: SCRATCH_RSRC_DWORD

	; SIVI-NOT: s_mov_b32 s6			; SIVI-NOT: s_mov_b32 s6
	; SIVI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen			; SIVI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen
	; SIVI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen			; SIVI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen

	; GFX9_10-NOT: s_mov_b32 s5			; GFX9_10-NOT: s_mov_b32 s5
	; GFX9_10-DAG: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen			; GFX9_10-MUBUF-DAG: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen
	; GFX9_10-DAG: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen			; GFX9_10-MUBUF-DAG: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen

				; MUBUF-DAG: s_mov_b32 s2, s5

	; GCN-DAG: s_mov_b32 s2, s5			; FLATSCR-DAG: scratch_load_dword {{v[0-9]+}}, {{v[0-9]+}}, off
				; FLATSCR-DAG: scratch_load_dword {{v[0-9]+}}, {{v[0-9]+}}, off
	define amdgpu_hs <{i32, i32, i32, float}> @hs_ir_uses_scratch_offset(i32 inreg, i32 inreg, i32 inreg, i32 inreg, i32 inreg, i32 inreg %swo, i32 %idx) {			define amdgpu_hs <{i32, i32, i32, float}> @hs_ir_uses_scratch_offset(i32 inreg, i32 inreg, i32 inreg, i32 inreg, i32 inreg, i32 inreg %swo, i32 %idx) {
	%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx			%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx
	%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx			%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx
	%f = fadd float %v1, %v2			%f = fadd float %v1, %v2
	%r1 = insertvalue <{i32, i32, i32, float}> undef, i32 %swo, 2			%r1 = insertvalue <{i32, i32, i32, float}> undef, i32 %swo, 2
	%r2 = insertvalue <{i32, i32, i32, float}> %r1, float %f, 3			%r2 = insertvalue <{i32, i32, i32, float}> %r1, float %f, 3
	ret <{i32, i32, i32, float}> %r2			ret <{i32, i32, i32, float}> %r2
	}			}

	; GCN-LABEL: {{^}}gs_ir_uses_scratch_offset:			; GCN-LABEL: {{^}}gs_ir_uses_scratch_offset:
	; GCN: s_mov_b32 s8, SCRATCH_RSRC_DWORD0			; GFX9-FLATSCR: s_add_u32 flat_scratch_lo, s0, s5
				; GFX9-FLATSCR: s_addc_u32 flat_scratch_hi, s1, 0

				; GFX10-FLATSCR: s_add_u32 s0, s0, s5
				; GFX10-FLATSCR: s_addc_u32 s1, s1, 0
				; GFX10-FLATSCR: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s0
				; GFX10-FLATSCR: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s1

				; MUBUF: s_mov_b32 s8, SCRATCH_RSRC_DWORD0
				; FLATSCR-NOT: SCRATCH_RSRC_DWORD

	; SIVI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen			; SIVI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen
	; SIVI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen			; SIVI: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen

	; GFX9_10-DAG: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen			; GFX9_10-MUBUF-DAG: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen
	; GFX9_10-DAG: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen			; GFX9_10-MUBUF-DAG: buffer_load_dword {{v[0-9]+}}, {{v[0-9]+}}, {{s\[[0-9]+:[0-9]+\]}}, 0 offen

				; MUBUF-DAG: s_mov_b32 s2, s5

	; GCN-DAG: s_mov_b32 s2, s5			; FLATSCR-DAG: scratch_load_dword {{v[0-9]+}}, {{v[0-9]+}}, off
				; FLATSCR-DAG: scratch_load_dword {{v[0-9]+}}, {{v[0-9]+}}, off
	define amdgpu_gs <{i32, i32, i32, float}> @gs_ir_uses_scratch_offset(i32 inreg, i32 inreg, i32 inreg, i32 inreg, i32 inreg, i32 inreg %swo, i32 %idx) {			define amdgpu_gs <{i32, i32, i32, float}> @gs_ir_uses_scratch_offset(i32 inreg, i32 inreg, i32 inreg, i32 inreg, i32 inreg, i32 inreg %swo, i32 %idx) {
	%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx			%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx
	%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx			%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 
undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx
	%f = fadd float %v1, %v2			%f = fadd float %v1, %v2
	%r1 = insertvalue <{i32, i32, i32, float}> undef, i32 %swo, 2			%r1 = insertvalue <{i32, i32, i32, float}> undef, i32 %swo, 2
	%r2 = insertvalue <{i32, i32, i32, float}> %r1, float %f, 3			%r2 = insertvalue <{i32, i32, i32, float}> %r1, float %f, 3
	ret <{i32, i32, i32, float}> %r2			ret <{i32, i32, i32, float}> %r2
	}			}

llvm/test/CodeGen/AMDGPU/sgpr-spill.mir

# RUN: llc -mtriple=amdgcn -mcpu=gfx900 -verify-machineinstrs -run-pass=prologepilog %s -o - | FileCheck -check-prefix=CHECK -check-prefix=GCN64 %s		# RUN: llc -mtriple=amdgcn -mcpu=gfx900 -verify-machineinstrs -run-pass=prologepilog %s -o - | FileCheck -check-prefixes=CHECK,GCN64,MUBUF %s
# RUN: llc -mtriple=amdgcn -mcpu=gfx1010 -mattr=+wavefrontsize32,-wavefrontsize64 -verify-machineinstrs -run-pass=prologepilog %s -o - | FileCheck -check-prefix=CHECK -check-prefix=GCN32 %s		# RUN: llc -mtriple=amdgcn -mcpu=gfx1010 -mattr=+wavefrontsize32,-wavefrontsize64 -verify-machineinstrs -run-pass=prologepilog %s -o - | FileCheck -check-prefixes=CHECK,GCN32,MUBUF %s
		# RUN: llc -mtriple=amdgcn -mcpu=gfx900 -verify-machineinstrs -amdgpu-enable-flat-scratch -run-pass=prologepilog %s -o - | FileCheck -check-prefixes=CHECK,GCN64,FLATSCR %s


# CHECK-LABEL: name: check_spill		# CHECK-LABEL: name: check_spill

		# FLATSCR: $sgpr33 = S_MOV_B32 0
		# FLATSCR: $flat_scr_lo = S_ADD_U32 $sgpr0, $sgpr11, implicit-def $scc
		# FLATSCR: $flat_scr_hi = S_ADDC_U32 $sgpr1, 0, implicit-def $scc, implicit $scc

# S32 with kill		# S32 with kill
# CHECK: V_WRITELANE		# CHECK: V_WRITELANE
# CHECK: $sgpr12 = S_MOV_B32 $exec_lo		# CHECK: $sgpr12 = S_MOV_B32 $exec_lo
# CHECK: $exec_lo = S_MOV_B32 1		# CHECK: $exec_lo = S_MOV_B32 1
# CHECK: BUFFER_STORE_DWORD_OFFSET killed $vgpr{{[0-9]+}}, ${{(sgpr[0-9_]+)*}}, $sgpr33, 4		# MUBUF: BUFFER_STORE_DWORD_OFFSET killed $vgpr{{[0-9]+}}, ${{(sgpr[0-9_]+)*}}, $sgpr33, 4
		# FLATSCR: SCRATCH_STORE_DWORD_SADDR killed $vgpr{{[0-9]+}}, $sgpr33, 4
# CHECK: $exec_lo = S_MOV_B32 killed $sgpr12		# CHECK: $exec_lo = S_MOV_B32 killed $sgpr12

# S32 without kill		# S32 without kill
# CHECK: V_WRITELANE		# CHECK: V_WRITELANE
# CHECK: $sgpr12 = S_MOV_B32 $exec_lo		# CHECK: $sgpr12 = S_MOV_B32 $exec_lo
# CHECK: $exec_lo = S_MOV_B32 1		# CHECK: $exec_lo = S_MOV_B32 1
# CHECK: BUFFER_STORE_DWORD_OFFSET $vgpr{{[0-9]+}}, ${{(sgpr[0-9_]+)*}}, $sgpr33, 4		# MUBUF: BUFFER_STORE_DWORD_OFFSET $vgpr{{[0-9]+}}, ${{(sgpr[0-9_]+)*}}, $sgpr33, 4
		# FLATSCR: SCRATCH_STORE_DWORD_SADDR $vgpr{{[0-9]+}}, $sgpr33, 4
# CHECK: $sgpr12 = V_READLANE		# CHECK: $sgpr12 = V_READLANE

# S64 with kill		# S64 with kill
# CHECK: V_WRITELANE		# CHECK: V_WRITELANE
# CHECK: V_WRITELANE		# CHECK: V_WRITELANE
# GCN32: $sgpr12 = S_MOV_B32 $exec_lo		# GCN32: $sgpr12 = S_MOV_B32 $exec_lo
# GCN64: $sgpr12_sgpr13 = S_MOV_B64 $exec		# GCN64: $sgpr12_sgpr13 = S_MOV_B64 $exec
# GCN32: $exec_lo = S_MOV_B32 3		# GCN32: $exec_lo = S_MOV_B32 3
# GCN64: $exec = S_MOV_B64 3		# GCN64: $exec = S_MOV_B64 3
# CHECK: BUFFER_STORE_DWORD_OFFSET killed $vgpr{{[0-9]+}}, ${{(sgpr[0-9_]+)*}}, $sgpr33, 8		# MUBUF: BUFFER_STORE_DWORD_OFFSET killed $vgpr{{[0-9]+}}, ${{(sgpr[0-9_]+)*}}, $sgpr33, 8
		# FLATSCR: SCRATCH_STORE_DWORD_SADDR killed $vgpr{{[0-9]+}}, $sgpr33, 8
# GCN32: $exec_lo = S_MOV_B32 killed $sgpr12		# GCN32: $exec_lo = S_MOV_B32 killed $sgpr12
# GCN64: $exec = S_MOV_B64 killed $sgpr12_sgpr13		# GCN64: $exec = S_MOV_B64 killed $sgpr12_sgpr13

# S64 without kill		# S64 without kill
# CHECK: V_WRITELANE		# CHECK: V_WRITELANE
# CHECK: V_WRITELANE		# CHECK: V_WRITELANE
# GCN32: $sgpr12 = S_MOV_B32 $exec_lo		# GCN32: $sgpr12 = S_MOV_B32 $exec_lo
# GCN64: $sgpr12_sgpr13 = S_MOV_B64 $exec		# GCN64: $sgpr12_sgpr13 = S_MOV_B64 $exec
# GCN32: $exec_lo = S_MOV_B32 3		# GCN32: $exec_lo = S_MOV_B32 3
# GCN64: $exec = S_MOV_B64 3		# GCN64: $exec = S_MOV_B64 3
# CHECK: BUFFER_STORE_DWORD_OFFSET $vgpr{{[0-9]+}}, ${{(sgpr[0-9_]+)*}}, $sgpr33, 8		# MUBUF: BUFFER_STORE_DWORD_OFFSET $vgpr{{[0-9]+}}, ${{(sgpr[0-9_]+)*}}, $sgpr33, 8
		# FLATSCR: SCRATCH_STORE_DWORD_SADDR $vgpr{{[0-9]+}}, $sgpr33, 8
# GCN32: $exec_lo = S_MOV_B32 $sgpr12		# GCN32: $exec_lo = S_MOV_B32 $sgpr12
# GCN64: $exec = S_MOV_B64 $sgpr12_sgpr13		# GCN64: $exec = S_MOV_B64 $sgpr12_sgpr13
# GCN64: $sgpr13 = V_READLANE		# GCN64: $sgpr13 = V_READLANE
# CHECK: $sgpr12 = V_READLANE		# CHECK: $sgpr12 = V_READLANE

# S96		# S96
# CHECK: V_WRITELANE		# CHECK: V_WRITELANE
# CHECK: V_WRITELANE		# CHECK: V_WRITELANE
# CHECK: V_WRITELANE		# CHECK: V_WRITELANE
# GCN32: $sgpr12 = S_MOV_B32 $exec_lo		# GCN32: $sgpr12 = S_MOV_B32 $exec_lo
# GCN64: $sgpr12_sgpr13 = S_MOV_B64 $exec		# GCN64: $sgpr12_sgpr13 = S_MOV_B64 $exec
# GCN32: $exec_lo = S_MOV_B32 7		# GCN32: $exec_lo = S_MOV_B32 7
# GCN64: $exec = S_MOV_B64 7		# GCN64: $exec = S_MOV_B64 7
# CHECK: BUFFER_STORE_DWORD_OFFSET killed $vgpr{{[0-9]+}}, ${{(sgpr[0-9_]+)*}}, $sgpr33, 16		# MUBUF: BUFFER_STORE_DWORD_OFFSET killed $vgpr{{[0-9]+}}, ${{(sgpr[0-9_]+)*}}, $sgpr33, 16
		# FLATSCR: SCRATCH_STORE_DWORD_SADDR killed $vgpr{{[0-9]+}}, $sgpr33, 16
# GCN32: $exec_lo = S_MOV_B32 killed $sgpr12		# GCN32: $exec_lo = S_MOV_B32 killed $sgpr12
# GCN64: $exec = S_MOV_B64 killed $sgpr12_sgpr13		# GCN64: $exec = S_MOV_B64 killed $sgpr12_sgpr13

# S128		# S128
# CHECK: V_WRITELANE		# CHECK: V_WRITELANE
# CHECK: V_WRITELANE		# CHECK: V_WRITELANE
# CHECK: V_WRITELANE		# CHECK: V_WRITELANE
# CHECK: V_WRITELANE		# CHECK: V_WRITELANE
# GCN32: $sgpr12 = S_MOV_B32 $exec_lo		# GCN32: $sgpr12 = S_MOV_B32 $exec_lo
# GCN64: $sgpr12_sgpr13 = S_MOV_B64 $exec		# GCN64: $sgpr12_sgpr13 = S_MOV_B64 $exec
# GCN32: $exec_lo = S_MOV_B32 15		# GCN32: $exec_lo = S_MOV_B32 15
# GCN64: $exec = S_MOV_B64 15		# GCN64: $exec = S_MOV_B64 15
# CHECK: BUFFER_STORE_DWORD_OFFSET killed $vgpr{{[0-9]+}}, ${{(sgpr[0-9_]+)*}}, $sgpr33, 28		# MUBUF: BUFFER_STORE_DWORD_OFFSET killed $vgpr{{[0-9]+}}, ${{(sgpr[0-9_]+)*}}, $sgpr33, 28
		# FLATSCR: SCRATCH_STORE_DWORD_SADDR killed $vgpr{{[0-9]+}}, $sgpr33, 28
# GCN32: $exec_lo = S_MOV_B32 killed $sgpr12		# GCN32: $exec_lo = S_MOV_B32 killed $sgpr12
# GCN64: $exec = S_MOV_B64 killed $sgpr12_sgpr13		# GCN64: $exec = S_MOV_B64 killed $sgpr12_sgpr13

# S160		# S160
# CHECK: V_WRITELANE		# CHECK: V_WRITELANE
# CHECK: V_WRITELANE		# CHECK: V_WRITELANE
# CHECK: V_WRITELANE		# CHECK: V_WRITELANE
# CHECK: V_WRITELANE		# CHECK: V_WRITELANE
# CHECK: V_WRITELANE		# CHECK: V_WRITELANE
# GCN32: $sgpr12 = S_MOV_B32 $exec_lo		# GCN32: $sgpr12 = S_MOV_B32 $exec_lo
# GCN64: $sgpr12_sgpr13 = S_MOV_B64 $exec		# GCN64: $sgpr12_sgpr13 = S_MOV_B64 $exec
# GCN32: $exec_lo = S_MOV_B32 31		# GCN32: $exec_lo = S_MOV_B32 31
# GCN64: $exec = S_MOV_B64 31		# GCN64: $exec = S_MOV_B64 31
# CHECK: BUFFER_STORE_DWORD_OFFSET {{(killed )?}}$vgpr{{[0-9]+}}, ${{(sgpr[0-9_]+)*}}, $sgpr33, 44		# MUBUF: BUFFER_STORE_DWORD_OFFSET {{(killed )?}}$vgpr{{[0-9]+}}, ${{(sgpr[0-9_]+)*}}, $sgpr33, 44
		# FLATSCR: SCRATCH_STORE_DWORD_SADDR {{(killed )?}}$vgpr{{[0-9]+}}, $sgpr33, 44
# GCN32: $exec_lo = S_MOV_B32 killed $sgpr12		# GCN32: $exec_lo = S_MOV_B32 killed $sgpr12
# GCN64: $exec = S_MOV_B64 killed $sgpr12_sgpr13		# GCN64: $exec = S_MOV_B64 killed $sgpr12_sgpr13

# S256		# S256
# CHECK: V_WRITELANE		# CHECK: V_WRITELANE
# CHECK: V_WRITELANE		# CHECK: V_WRITELANE
# CHECK: V_WRITELANE		# CHECK: V_WRITELANE
# CHECK: V_WRITELANE		# CHECK: V_WRITELANE
# CHECK: V_WRITELANE		# CHECK: V_WRITELANE
# CHECK: V_WRITELANE		# CHECK: V_WRITELANE
# CHECK: V_WRITELANE		# CHECK: V_WRITELANE
# CHECK: V_WRITELANE		# CHECK: V_WRITELANE
# GCN32: $sgpr12 = S_MOV_B32 $exec_lo		# GCN32: $sgpr12 = S_MOV_B32 $exec_lo
# GCN64: $sgpr12_sgpr13 = S_MOV_B64 $exec		# GCN64: $sgpr12_sgpr13 = S_MOV_B64 $exec
# GCN32: $exec_lo = S_MOV_B32 255		# GCN32: $exec_lo = S_MOV_B32 255
# GCN64: $exec = S_MOV_B64 255		# GCN64: $exec = S_MOV_B64 255
# CHECK: BUFFER_STORE_DWORD_OFFSET {{(killed )?}}$vgpr{{[0-9]+}}, ${{(sgpr[0-9_]+)*}}, $sgpr33, 64		# MUBUF: BUFFER_STORE_DWORD_OFFSET {{(killed )?}}$vgpr{{[0-9]+}}, ${{(sgpr[0-9_]+)*}}, $sgpr33, 64
		# FLATSCR: SCRATCH_STORE_DWORD_SADDR {{(killed )?}}$vgpr{{[0-9]+}}, $sgpr33, 64
# GCN32: $exec_lo = S_MOV_B32 killed $sgpr12		# GCN32: $exec_lo = S_MOV_B32 killed $sgpr12
# GCN64: $exec = S_MOV_B64 killed $sgpr12_sgpr13		# GCN64: $exec = S_MOV_B64 killed $sgpr12_sgpr13

# S512		# S512
# CHECK: V_WRITELANE		# CHECK: V_WRITELANE
# CHECK: V_WRITELANE		# CHECK: V_WRITELANE
# CHECK: V_WRITELANE		# CHECK: V_WRITELANE
# CHECK: V_WRITELANE		# CHECK: V_WRITELANE
# CHECK: V_WRITELANE		# CHECK: V_WRITELANE
# CHECK: V_WRITELANE		# CHECK: V_WRITELANE
# CHECK: V_WRITELANE		# CHECK: V_WRITELANE
# CHECK: V_WRITELANE		# CHECK: V_WRITELANE
# CHECK: V_WRITELANE		# CHECK: V_WRITELANE
# CHECK: V_WRITELANE		# CHECK: V_WRITELANE
# CHECK: V_WRITELANE		# CHECK: V_WRITELANE
# CHECK: V_WRITELANE		# CHECK: V_WRITELANE
# CHECK: V_WRITELANE		# CHECK: V_WRITELANE
# CHECK: V_WRITELANE		# CHECK: V_WRITELANE
# CHECK: V_WRITELANE		# CHECK: V_WRITELANE
# CHECK: V_WRITELANE		# CHECK: V_WRITELANE
# GCN32: $sgpr12 = S_MOV_B32 $exec_lo		# GCN32: $sgpr12 = S_MOV_B32 $exec_lo
# GCN64: $sgpr12_sgpr13 = S_MOV_B64 $exec		# GCN64: $sgpr12_sgpr13 = S_MOV_B64 $exec
# GCN32: $exec_lo = S_MOV_B32 65535		# GCN32: $exec_lo = S_MOV_B32 65535
# GCN64: $exec = S_MOV_B64 65535		# GCN64: $exec = S_MOV_B64 65535
# CHECK: BUFFER_STORE_DWORD_OFFSET {{(killed )?}}$vgpr{{[0-9]+}}, ${{(sgpr[0-9_]+)*}}, $sgpr33, 96		# MUBUF: BUFFER_STORE_DWORD_OFFSET {{(killed )?}}$vgpr{{[0-9]+}}, ${{(sgpr[0-9_]+)*}}, $sgpr33, 96
		# FLATSCR: SCRATCH_STORE_DWORD_SADDR {{(killed )?}}$vgpr{{[0-9]+}}, $sgpr33, 96
# GCN32: $exec_lo = S_MOV_B32 killed $sgpr12		# GCN32: $exec_lo = S_MOV_B32 killed $sgpr12
# GCN64: $exec = S_MOV_B64 killed $sgpr12_sgpr13		# GCN64: $exec = S_MOV_B64 killed $sgpr12_sgpr13

# S1024		# S1024
# CHECK: V_WRITELANE		# CHECK: V_WRITELANE
# CHECK: V_WRITELANE		# CHECK: V_WRITELANE
# CHECK: V_WRITELANE		# CHECK: V_WRITELANE
# CHECK: V_WRITELANE		# CHECK: V_WRITELANE
# CHECK: V_WRITELANE		# CHECK: V_WRITELANE
# CHECK: V_WRITELANE		# CHECK: V_WRITELANE
# CHECK: V_WRITELANE		# CHECK: V_WRITELANE
# CHECK: V_WRITELANE		# CHECK: V_WRITELANE
# GCN32: $sgpr64 = S_MOV_B32 $exec_lo		# GCN32: $sgpr64 = S_MOV_B32 $exec_lo
# GCN64: $sgpr64_sgpr65 = S_MOV_B64 $exec		# GCN64: $sgpr64_sgpr65 = S_MOV_B64 $exec
# GCN32: $exec_lo = S_MOV_B32 4294967295		# GCN32: $exec_lo = S_MOV_B32 4294967295
# GCN64: $exec = S_MOV_B64 4294967295		# GCN64: $exec = S_MOV_B64 4294967295
# CHECK: BUFFER_STORE_DWORD_OFFSET {{(killed )?}}$vgpr{{[0-9]+}}, ${{(sgpr[0-9_]+)*}}, $sgpr33, 160		# MUBUF: BUFFER_STORE_DWORD_OFFSET {{(killed )?}}$vgpr{{[0-9]+}}, ${{(sgpr[0-9_]+)*}}, $sgpr33, 160
		# FLATSCR: SCRATCH_STORE_DWORD_SADDR {{(killed )?}}$vgpr{{[0-9]+}}, $sgpr33, 160
# GCN32: $exec_lo = S_MOV_B32 killed $sgpr64		# GCN32: $exec_lo = S_MOV_B32 killed $sgpr64
# GCN64: $exec = S_MOV_B64 killed $sgpr64_sgpr65		# GCN64: $exec = S_MOV_B64 killed $sgpr64_sgpr65

--- |		--- |

define amdgpu_kernel void @check_spill() #0 {		define amdgpu_kernel void @check_spill() #0 {
ret void		ret void
}		}
machineFunctionInfo:
explicitKernArgSize: 660		explicitKernArgSize: 660
maxKernArgAlign: 4		maxKernArgAlign: 4
isEntryFunction: true		isEntryFunction: true
waveLimiter: true		waveLimiter: true
scratchRSrcReg: '$sgpr96_sgpr97_sgpr98_sgpr99'		scratchRSrcReg: '$sgpr96_sgpr97_sgpr98_sgpr99'
stackPtrOffsetReg: '$sgpr32'		stackPtrOffsetReg: '$sgpr32'
frameOffsetReg: '$sgpr33'		frameOffsetReg: '$sgpr33'
argumentInfo:		argumentInfo:
privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }		flatScratchInit: { reg: '$sgpr0_sgpr1' }
dispatchPtr: { reg: '$sgpr4_sgpr5' }		dispatchPtr: { reg: '$sgpr2_sgpr3' }
kernargSegmentPtr: { reg: '$sgpr6_sgpr7' }		privateSegmentBuffer: { reg: '$sgpr4_sgpr5_sgpr6_sgpr7' }
workGroupIDX: { reg: '$sgpr8' }		kernargSegmentPtr: { reg: '$sgpr8_sgpr9' }
privateSegmentWaveByteOffset: { reg: '$sgpr9' }		workGroupIDX: { reg: '$sgpr10' }
		privateSegmentWaveByteOffset: { reg: '$sgpr11' }
body: |		body: |
bb.0:		bb.0:
liveins: $sgpr8, $sgpr4_sgpr5, $sgpr6_sgpr7		liveins: $sgpr8, $sgpr4_sgpr5, $sgpr6_sgpr7

renamable $sgpr12 = IMPLICIT_DEF		renamable $sgpr12 = IMPLICIT_DEF
SI_SPILL_S32_SAVE killed $sgpr12, %stack.0, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32		SI_SPILL_S32_SAVE killed $sgpr12, %stack.0, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32

renamable $sgpr12 = IMPLICIT_DEF		renamable $sgpr12 = IMPLICIT_DEF
SI_SPILL_S512_SAVE killed $sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19_sgpr20_sgpr21_sgpr22_sgpr23_sgpr24_sgpr25_sgpr26_sgpr27, %stack.6, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32		SI_SPILL_S512_SAVE killed $sgpr12_sgpr13_sgpr14_sgpr15_sgpr16_sgpr17_sgpr18_sgpr19_sgpr20_sgpr21_sgpr22_sgpr23_sgpr24_sgpr25_sgpr26_sgpr27, %stack.6, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32

renamable $sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75_sgpr76_sgpr77_sgpr78_sgpr79_sgpr80_sgpr81_sgpr82_sgpr83_sgpr84_sgpr85_sgpr86_sgpr87_sgpr88_sgpr89_sgpr90_sgpr91_sgpr92_sgpr93_sgpr94_sgpr95 = IMPLICIT_DEF		renamable $sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75_sgpr76_sgpr77_sgpr78_sgpr79_sgpr80_sgpr81_sgpr82_sgpr83_sgpr84_sgpr85_sgpr86_sgpr87_sgpr88_sgpr89_sgpr90_sgpr91_sgpr92_sgpr93_sgpr94_sgpr95 = IMPLICIT_DEF
SI_SPILL_S1024_SAVE killed $sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75_sgpr76_sgpr77_sgpr78_sgpr79_sgpr80_sgpr81_sgpr82_sgpr83_sgpr84_sgpr85_sgpr86_sgpr87_sgpr88_sgpr89_sgpr90_sgpr91_sgpr92_sgpr93_sgpr94_sgpr95, %stack.7, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32		SI_SPILL_S1024_SAVE killed $sgpr64_sgpr65_sgpr66_sgpr67_sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75_sgpr76_sgpr77_sgpr78_sgpr79_sgpr80_sgpr81_sgpr82_sgpr83_sgpr84_sgpr85_sgpr86_sgpr87_sgpr88_sgpr89_sgpr90_sgpr91_sgpr92_sgpr93_sgpr94_sgpr95, %stack.7, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32


# CHECK-LABEL: name: check_reload		# CHECK-LABEL: name: check_reload

		# FLATSCR: $sgpr33 = S_MOV_B32 0
		# FLATSCR: $flat_scr_lo = S_ADD_U32 $sgpr0, $sgpr11, implicit-def $scc
		# FLATSCR: $flat_scr_hi = S_ADDC_U32 $sgpr1, 0, implicit-def $scc, implicit $scc

# S32		# S32
# CHECK: $sgpr12 = S_MOV_B32 $exec_lo		# CHECK: $sgpr12 = S_MOV_B32 $exec_lo
# CHECK: $exec_lo = S_MOV_B32 1		# CHECK: $exec_lo = S_MOV_B32 1
# CHECK: BUFFER_LOAD_DWORD_OFFSET ${{(sgpr[0-9_]+)*}}, $sgpr33, 4		# MUBUF: BUFFER_LOAD_DWORD_OFFSET ${{(sgpr[0-9_]+)*}}, $sgpr33, 4
		# FLATSCR: SCRATCH_LOAD_DWORD_SADDR $sgpr33, 4
# CHECK: $exec_lo = S_MOV_B32 killed $sgpr12		# CHECK: $exec_lo = S_MOV_B32 killed $sgpr12
# CHECK: $sgpr12 = V_READLANE		# CHECK: $sgpr12 = V_READLANE

# S64		# S64
# GCN32: $sgpr12 = S_MOV_B32 $exec_lo		# GCN32: $sgpr12 = S_MOV_B32 $exec_lo
# GCN64: $sgpr12_sgpr13 = S_MOV_B64 $exec		# GCN64: $sgpr12_sgpr13 = S_MOV_B64 $exec
# GCN32: $exec_lo = S_MOV_B32 3		# GCN32: $exec_lo = S_MOV_B32 3
# GCN64: $exec = S_MOV_B64 3		# GCN64: $exec = S_MOV_B64 3
# CHECK: BUFFER_LOAD_DWORD_OFFSET ${{(sgpr[0-9_]+)*}}, $sgpr33, 8		# MUBUF: BUFFER_LOAD_DWORD_OFFSET ${{(sgpr[0-9_]+)*}}, $sgpr33, 8
		# FLATSCR: SCRATCH_LOAD_DWORD_SADDR $sgpr33, 8
# GCN32: $exec_lo = S_MOV_B32 killed $sgpr12		# GCN32: $exec_lo = S_MOV_B32 killed $sgpr12
# GCN64: $exec = S_MOV_B64 killed $sgpr12_sgpr13		# GCN64: $exec = S_MOV_B64 killed $sgpr12_sgpr13
# CHECK: $sgpr12 = V_READLANE		# CHECK: $sgpr12 = V_READLANE
# CHECK: $sgpr13 = V_READLANE		# CHECK: $sgpr13 = V_READLANE

# S96		# S96
# GCN32: $sgpr12 = S_MOV_B32 $exec_lo		# GCN32: $sgpr12 = S_MOV_B32 $exec_lo
# GCN64: $sgpr12_sgpr13 = S_MOV_B64 $exec		# GCN64: $sgpr12_sgpr13 = S_MOV_B64 $exec
# GCN32: $exec_lo = S_MOV_B32 7		# GCN32: $exec_lo = S_MOV_B32 7
# GCN64: $exec = S_MOV_B64 7		# GCN64: $exec = S_MOV_B64 7
# CHECK: BUFFER_LOAD_DWORD_OFFSET ${{(sgpr[0-9_]+)*}}, $sgpr33, 16		# MUBUF: BUFFER_LOAD_DWORD_OFFSET ${{(sgpr[0-9_]+)*}}, $sgpr33, 16
		# FLATSCR: SCRATCH_LOAD_DWORD_SADDR $sgpr33, 16
# GCN32: $exec_lo = S_MOV_B32 killed $sgpr12		# GCN32: $exec_lo = S_MOV_B32 killed $sgpr12
# GCN64: $exec = S_MOV_B64 killed $sgpr12_sgpr13		# GCN64: $exec = S_MOV_B64 killed $sgpr12_sgpr13
# CHECK: $sgpr12 = V_READLANE		# CHECK: $sgpr12 = V_READLANE
# CHECK: $sgpr13 = V_READLANE		# CHECK: $sgpr13 = V_READLANE
# CHECK: $sgpr14 = V_READLANE		# CHECK: $sgpr14 = V_READLANE

# S128		# S128
# GCN32: $sgpr12 = S_MOV_B32 $exec_lo		# GCN32: $sgpr12 = S_MOV_B32 $exec_lo
# GCN64: $sgpr12_sgpr13 = S_MOV_B64 $exec		# GCN64: $sgpr12_sgpr13 = S_MOV_B64 $exec
# GCN32: $exec_lo = S_MOV_B32 15		# GCN32: $exec_lo = S_MOV_B32 15
# GCN64: $exec = S_MOV_B64 15		# GCN64: $exec = S_MOV_B64 15
# CHECK: BUFFER_LOAD_DWORD_OFFSET ${{(sgpr[0-9_]+)*}}, $sgpr33, 28		# MUBUF: BUFFER_LOAD_DWORD_OFFSET ${{(sgpr[0-9_]+)*}}, $sgpr33, 28
		# FLATSCR: SCRATCH_LOAD_DWORD_SADDR $sgpr33, 28
# GCN32: $exec_lo = S_MOV_B32 killed $sgpr12		# GCN32: $exec_lo = S_MOV_B32 killed $sgpr12
# GCN64: $exec = S_MOV_B64 killed $sgpr12_sgpr13		# GCN64: $exec = S_MOV_B64 killed $sgpr12_sgpr13
# CHECK: $sgpr12 = V_READLANE		# CHECK: $sgpr12 = V_READLANE
# CHECK: $sgpr13 = V_READLANE		# CHECK: $sgpr13 = V_READLANE
# CHECK: $sgpr14 = V_READLANE		# CHECK: $sgpr14 = V_READLANE
# CHECK: $sgpr15 = V_READLANE		# CHECK: $sgpr15 = V_READLANE

# S160		# S160
# GCN32: $sgpr12 = S_MOV_B32 $exec_lo		# GCN32: $sgpr12 = S_MOV_B32 $exec_lo
# GCN64: $sgpr12_sgpr13 = S_MOV_B64 $exec		# GCN64: $sgpr12_sgpr13 = S_MOV_B64 $exec
# GCN32: $exec_lo = S_MOV_B32 31		# GCN32: $exec_lo = S_MOV_B32 31
# GCN64: $exec = S_MOV_B64 31		# GCN64: $exec = S_MOV_B64 31
# CHECK: BUFFER_LOAD_DWORD_OFFSET ${{(sgpr[0-9_]+)*}}, $sgpr33, 44		# MUBUF: BUFFER_LOAD_DWORD_OFFSET ${{(sgpr[0-9_]+)*}}, $sgpr33, 44
		# FLATSCR: SCRATCH_LOAD_DWORD_SADDR $sgpr33, 44
# GCN32: $exec_lo = S_MOV_B32 killed $sgpr12		# GCN32: $exec_lo = S_MOV_B32 killed $sgpr12
# GCN64: $exec = S_MOV_B64 killed $sgpr12_sgpr13		# GCN64: $exec = S_MOV_B64 killed $sgpr12_sgpr13
# CHECK: $sgpr12 = V_READLANE		# CHECK: $sgpr12 = V_READLANE
# CHECK: $sgpr13 = V_READLANE		# CHECK: $sgpr13 = V_READLANE
# CHECK: $sgpr14 = V_READLANE		# CHECK: $sgpr14 = V_READLANE
# CHECK: $sgpr15 = V_READLANE		# CHECK: $sgpr15 = V_READLANE
# CHECK: $sgpr16 = V_READLANE		# CHECK: $sgpr16 = V_READLANE

# S256		# S256
# GCN32: $sgpr12 = S_MOV_B32 $exec_lo		# GCN32: $sgpr12 = S_MOV_B32 $exec_lo
# GCN64: $sgpr12_sgpr13 = S_MOV_B64 $exec		# GCN64: $sgpr12_sgpr13 = S_MOV_B64 $exec
# GCN32: $exec_lo = S_MOV_B32 255		# GCN32: $exec_lo = S_MOV_B32 255
# GCN64: $exec = S_MOV_B64 255		# GCN64: $exec = S_MOV_B64 255
# CHECK: BUFFER_LOAD_DWORD_OFFSET ${{(sgpr[0-9_]+)*}}, $sgpr33, 64		# MUBUF: BUFFER_LOAD_DWORD_OFFSET ${{(sgpr[0-9_]+)*}}, $sgpr33, 64
		# FLATSCR: SCRATCH_LOAD_DWORD_SADDR $sgpr33, 64
# GCN32: $exec_lo = S_MOV_B32 killed $sgpr12		# GCN32: $exec_lo = S_MOV_B32 killed $sgpr12
# GCN64: $exec = S_MOV_B64 killed $sgpr12_sgpr13		# GCN64: $exec = S_MOV_B64 killed $sgpr12_sgpr13
# CHECK: $sgpr12 = V_READLANE		# CHECK: $sgpr12 = V_READLANE
# CHECK: $sgpr13 = V_READLANE		# CHECK: $sgpr13 = V_READLANE
# CHECK: $sgpr14 = V_READLANE		# CHECK: $sgpr14 = V_READLANE
# CHECK: $sgpr15 = V_READLANE		# CHECK: $sgpr15 = V_READLANE
# CHECK: $sgpr16 = V_READLANE		# CHECK: $sgpr16 = V_READLANE
# CHECK: $sgpr17 = V_READLANE		# CHECK: $sgpr17 = V_READLANE
# CHECK: $sgpr18 = V_READLANE		# CHECK: $sgpr18 = V_READLANE
# CHECK: $sgpr19 = V_READLANE		# CHECK: $sgpr19 = V_READLANE

# S512		# S512
# GCN32: $sgpr12 = S_MOV_B32 $exec_lo		# GCN32: $sgpr12 = S_MOV_B32 $exec_lo
# GCN64: $sgpr12_sgpr13 = S_MOV_B64 $exec		# GCN64: $sgpr12_sgpr13 = S_MOV_B64 $exec
# GCN32: $exec_lo = S_MOV_B32 65535		# GCN32: $exec_lo = S_MOV_B32 65535
# GCN64: $exec = S_MOV_B64 65535		# GCN64: $exec = S_MOV_B64 65535
# CHECK: BUFFER_LOAD_DWORD_OFFSET ${{(sgpr[0-9_]+)*}}, $sgpr33, 96		# MUBUF: BUFFER_LOAD_DWORD_OFFSET ${{(sgpr[0-9_]+)*}}, $sgpr33, 96
		# FLATSCR: SCRATCH_LOAD_DWORD_SADDR $sgpr33, 96
# GCN32: $exec_lo = S_MOV_B32 killed $sgpr12		# GCN32: $exec_lo = S_MOV_B32 killed $sgpr12
# GCN64: $exec = S_MOV_B64 killed $sgpr12_sgpr13		# GCN64: $exec = S_MOV_B64 killed $sgpr12_sgpr13
# CHECK: $sgpr12 = V_READLANE		# CHECK: $sgpr12 = V_READLANE
# CHECK: $sgpr13 = V_READLANE		# CHECK: $sgpr13 = V_READLANE
# CHECK: $sgpr14 = V_READLANE		# CHECK: $sgpr14 = V_READLANE
# CHECK: $sgpr15 = V_READLANE		# CHECK: $sgpr15 = V_READLANE
# CHECK: $sgpr16 = V_READLANE		# CHECK: $sgpr16 = V_READLANE
# CHECK: $sgpr17 = V_READLANE		# CHECK: $sgpr17 = V_READLANE
# CHECK: $sgpr18 = V_READLANE		# CHECK: $sgpr18 = V_READLANE
# CHECK: $sgpr19 = V_READLANE		# CHECK: $sgpr19 = V_READLANE
# CHECK: $sgpr20 = V_READLANE		# CHECK: $sgpr20 = V_READLANE
# CHECK: $sgpr21 = V_READLANE		# CHECK: $sgpr21 = V_READLANE
# CHECK: $sgpr22 = V_READLANE		# CHECK: $sgpr22 = V_READLANE
# CHECK: $sgpr23 = V_READLANE		# CHECK: $sgpr23 = V_READLANE
# CHECK: $sgpr24 = V_READLANE		# CHECK: $sgpr24 = V_READLANE
# CHECK: $sgpr25 = V_READLANE		# CHECK: $sgpr25 = V_READLANE
# CHECK: $sgpr26 = V_READLANE		# CHECK: $sgpr26 = V_READLANE
# CHECK: $sgpr27 = V_READLANE		# CHECK: $sgpr27 = V_READLANE

# S1024		# S1024
# GCN32: $sgpr64 = S_MOV_B32 $exec_lo		# GCN32: $sgpr64 = S_MOV_B32 $exec_lo
# GCN64: $sgpr64_sgpr65 = S_MOV_B64 $exec		# GCN64: $sgpr64_sgpr65 = S_MOV_B64 $exec
# GCN32: $exec_lo = S_MOV_B32 4294967295		# GCN32: $exec_lo = S_MOV_B32 4294967295
# GCN64: $exec = S_MOV_B64 4294967295		# GCN64: $exec = S_MOV_B64 4294967295
# CHECK: BUFFER_LOAD_DWORD_OFFSET ${{(sgpr[0-9_]+)*}}, $sgpr33, 160		# MUBUF: BUFFER_LOAD_DWORD_OFFSET ${{(sgpr[0-9_]+)*}}, $sgpr33, 160
		# FLATSCR: SCRATCH_LOAD_DWORD_SADDR $sgpr33, 160
# GCN32: $exec_lo = S_MOV_B32 killed $sgpr64		# GCN32: $exec_lo = S_MOV_B32 killed $sgpr64
# GCN64: $exec = S_MOV_B64 killed $sgpr64_sgpr65		# GCN64: $exec = S_MOV_B64 killed $sgpr64_sgpr65
# CHECK: $sgpr64 = V_READLANE		# CHECK: $sgpr64 = V_READLANE
# CHECK: $sgpr65 = V_READLANE		# CHECK: $sgpr65 = V_READLANE
# CHECK: $sgpr66 = V_READLANE		# CHECK: $sgpr66 = V_READLANE
# CHECK: $sgpr67 = V_READLANE		# CHECK: $sgpr67 = V_READLANE
# CHECK: $sgpr68 = V_READLANE		# CHECK: $sgpr68 = V_READLANE
# CHECK: $sgpr69 = V_READLANE		# CHECK: $sgpr69 = V_READLANE
machineFunctionInfo:
explicitKernArgSize: 660		explicitKernArgSize: 660
maxKernArgAlign: 4		maxKernArgAlign: 4
isEntryFunction: true		isEntryFunction: true
waveLimiter: true		waveLimiter: true
scratchRSrcReg: '$sgpr96_sgpr97_sgpr98_sgpr99'		scratchRSrcReg: '$sgpr96_sgpr97_sgpr98_sgpr99'
stackPtrOffsetReg: '$sgpr32'		stackPtrOffsetReg: '$sgpr32'
frameOffsetReg: '$sgpr33'		frameOffsetReg: '$sgpr33'
argumentInfo:		argumentInfo:
privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }		flatScratchInit: { reg: '$sgpr0_sgpr1' }
dispatchPtr: { reg: '$sgpr4_sgpr5' }		dispatchPtr: { reg: '$sgpr2_sgpr3' }
kernargSegmentPtr: { reg: '$sgpr6_sgpr7' }		privateSegmentBuffer: { reg: '$sgpr4_sgpr5_sgpr6_sgpr7' }
workGroupIDX: { reg: '$sgpr8' }		kernargSegmentPtr: { reg: '$sgpr8_sgpr9' }
privateSegmentWaveByteOffset: { reg: '$sgpr9' }		workGroupIDX: { reg: '$sgpr10' }
		privateSegmentWaveByteOffset: { reg: '$sgpr11' }
body: |		body: |
bb.0:		bb.0:
liveins: $sgpr8, $sgpr4_sgpr5, $sgpr6_sgpr7		liveins: $sgpr8, $sgpr4_sgpr5, $sgpr6_sgpr7

renamable $sgpr12 = SI_SPILL_S32_RESTORE %stack.0, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32		renamable $sgpr12 = SI_SPILL_S32_RESTORE %stack.0, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32

renamable $sgpr12_sgpr13 = SI_SPILL_S64_RESTORE %stack.1, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32		renamable $sgpr12_sgpr13 = SI_SPILL_S64_RESTORE %stack.1, implicit $exec, implicit $sgpr96_sgpr97_sgpr98_sgpr99, implicit $sgpr32

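The check lines in sgpr-spill.mir follow two patterns that can be modelled with a little arithmetic. What follows is only a sketch under stated assumptions, not the backend's implementation: `exec_mask` and `flat_scratch_init` are hypothetical helper names. The first reproduces the `exec` values the checks expect (one bit per 32-bit SGPR written with V_WRITELANE); the second models the FLATSCR prologue, where S_ADD_U32/S_ADDC_U32 add the wave byte offset in $sgpr11 to the 64-bit value in $sgpr0_sgpr1.

```python
def exec_mask(num_sgprs: int) -> int:
    """Mask with one low bit set per 32-bit SGPR being spilled (hypothetical helper)."""
    return (1 << num_sgprs) - 1


def flat_scratch_init(init_lo: int, init_hi: int, wave_offset: int):
    """Model S_ADD_U32 followed by S_ADDC_U32 on 32-bit halves (hypothetical helper)."""
    total = init_lo + wave_offset
    lo = total & 0xFFFFFFFF               # S_ADD_U32 result, written to flat_scr_lo
    carry = total >> 32                   # carry-out kept in SCC
    hi = (init_hi + carry) & 0xFFFFFFFF   # S_ADDC_U32 adds 0 plus the carry
    return lo, hi


# Spill width in bits mapped to the exec value in the corresponding check line.
expected = {32: 1, 64: 3, 96: 7, 128: 15, 160: 31,
            256: 255, 512: 65535, 1024: 4294967295}
for bits, mask in expected.items():
    assert exec_mask(bits // 32) == mask
```

The table is taken directly from the S_MOV_B32/S_MOV_B64 check lines for S32 through S1024; the carry handling is the usual reading of an add/add-with-carry pair and is not something the test itself verifies.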

llvm/test/CodeGen/AMDGPU/spill-scavenge-offset.ll

	; RUN: llc -march=amdgcn -mcpu=verde -enable-misched=0 -post-RA-scheduler=0 -amdgpu-spill-sgpr-to-vgpr=0 < %s | FileCheck -check-prefixes=CHECK,GFX6 %s			; RUN: llc -march=amdgcn -mcpu=verde -enable-misched=0 -post-RA-scheduler=0 -amdgpu-spill-sgpr-to-vgpr=0 < %s | FileCheck -check-prefixes=CHECK,GFX6 %s
	; RUN: llc -regalloc=basic -march=amdgcn -mcpu=tonga -enable-misched=0 -post-RA-scheduler=0 -amdgpu-spill-sgpr-to-vgpr=0 < %s | FileCheck -check-prefixes=CHECK,GFX7 %s			; RUN: llc -regalloc=basic -march=amdgcn -mcpu=tonga -enable-misched=0 -post-RA-scheduler=0 -amdgpu-spill-sgpr-to-vgpr=0 < %s | FileCheck -check-prefixes=CHECK,GFX7 %s
				; RUN: llc -march=amdgcn -mcpu=gfx900 -enable-misched=0 -post-RA-scheduler=0 -amdgpu-spill-sgpr-to-vgpr=0 -amdgpu-enable-flat-scratch < %s | FileCheck -check-prefixes=CHECK,GFX9-FLATSCR,FLATSCR %s
				; RUN: llc -march=amdgcn -mcpu=gfx1030 -enable-misched=0 -post-RA-scheduler=0 -amdgpu-spill-sgpr-to-vgpr=0 -amdgpu-enable-flat-scratch < %s | FileCheck -check-prefixes=CHECK,GFX10-FLATSCR,FLATSCR %s
	;			;
	; There is something about Tonga that causes this test to spend a lot of time			; There is something about Tonga that causes this test to spend a lot of time
	; in the default register allocator.			; in the default register allocator.


	; When the offset of VGPR spills into scratch space gets too large, an additional SGPR			; When the offset of VGPR spills into scratch space gets too large, an additional SGPR
	; is used to calculate the scratch load/store address. Make sure that this			; is used to calculate the scratch load/store address. Make sure that this
	; mechanism works even when many spills happen.			; mechanism works even when many spills happen.

	; Just test that it compiles successfully.			; Just test that it compiles successfully.
	; CHECK-LABEL: test			; CHECK-LABEL: test

				; GFX9-FLATSCR: s_mov_b32 [[SOFF1:s[0-9]+]], 4{{$}}
				; GFX9-FLATSCR-DAG: scratch_store_dword off, v{{[0-9]+}}, [[SOFF1]] ; 4-byte Folded Spill
				; GFX9-FLATSCR-DAG: scratch_store_dword off, v{{[0-9]+}}, [[SOFF1]] offset:{{[0-9]+}} ; 4-byte Folded Spill
				; GFX9-FLATSCR: s_movk_i32 [[SOFF2:s[0-9]+]], 0x{{[0-9a-f]+}}{{$}}
				; GFX9-FLATSCR-DAG: scratch_load_dword v{{[0-9]+}}, off, [[SOFF2]] ; 4-byte Folded Reload
				; GFX9-FLATSCR-DAG: scratch_load_dword v{{[0-9]+}}, off, [[SOFF2]] offset:{{[0-9]+}} ; 4-byte Folded Reload

				; GFX10-FLATSCR: scratch_store_dword off, v{{[0-9]+}}, off offset:{{[0-9]+}} ; 4-byte Folded Spill
				; GFX10-FLATSCR: scratch_load_dword v{{[0-9]+}}, off, off offset:{{[0-9]+}} ; 4-byte Folded Reload
	define amdgpu_kernel void @test(<1280 x i32> addrspace(1)* %out, <1280 x i32> addrspace(1)* %in) {			define amdgpu_kernel void @test(<1280 x i32> addrspace(1)* %out, <1280 x i32> addrspace(1)* %in) {
	entry:			entry:
	%lo = call i32 @llvm.amdgcn.mbcnt.lo(i32 -1, i32 0)			%lo = call i32 @llvm.amdgcn.mbcnt.lo(i32 -1, i32 0)
	%tid = call i32 @llvm.amdgcn.mbcnt.hi(i32 -1, i32 %lo)			%tid = call i32 @llvm.amdgcn.mbcnt.hi(i32 -1, i32 %lo)

	%aptr = getelementptr <1280 x i32>, <1280 x i32> addrspace(1)* %in, i32 %tid			%aptr = getelementptr <1280 x i32>, <1280 x i32> addrspace(1)* %in, i32 %tid
	%a = load <1280 x i32>, <1280 x i32> addrspace(1)* %aptr			%a = load <1280 x i32>, <1280 x i32> addrspace(1)* %aptr

	; mark most VGPR registers as used to increase register pressure			; mark most VGPR registers as used to increase register pressure
	call void asm sideeffect "", "~{v4},~{v8},~{v12},~{v16},~{v20},~{v24},~{v28},~{v32}" ()			call void asm sideeffect "", "~{v4},~{v8},~{v12},~{v16},~{v20},~{v24},~{v28},~{v32}" ()
	call void asm sideeffect "", "~{v36},~{v40},~{v44},~{v48},~{v52},~{v56},~{v60},~{v64}" ()			call void asm sideeffect "", "~{v36},~{v40},~{v44},~{v48},~{v52},~{v56},~{v60},~{v64}" ()
	call void asm sideeffect "", "~{v68},~{v72},~{v76},~{v80},~{v84},~{v88},~{v92},~{v96}" ()			call void asm sideeffect "", "~{v68},~{v72},~{v76},~{v80},~{v84},~{v88},~{v92},~{v96}" ()
	call void asm sideeffect "", "~{v100},~{v104},~{v108},~{v112},~{v116},~{v120},~{v124},~{v128}" ()			call void asm sideeffect "", "~{v100},~{v104},~{v108},~{v112},~{v116},~{v120},~{v124},~{v128}" ()
	call void asm sideeffect "", "~{v132},~{v136},~{v140},~{v144},~{v148},~{v152},~{v156},~{v160}" ()			call void asm sideeffect "", "~{v132},~{v136},~{v140},~{v144},~{v148},~{v152},~{v156},~{v160}" ()
	call void asm sideeffect "", "~{v164},~{v168},~{v172},~{v176},~{v180},~{v184},~{v188},~{v192}" ()			call void asm sideeffect "", "~{v164},~{v168},~{v172},~{v176},~{v180},~{v184},~{v188},~{v192}" ()
	call void asm sideeffect "", "~{v196},~{v200},~{v204},~{v208},~{v212},~{v216},~{v220},~{v224}" ()			call void asm sideeffect "", "~{v196},~{v200},~{v204},~{v208},~{v212},~{v216},~{v220},~{v224}" ()

	%outptr = getelementptr <1280 x i32>, <1280 x i32> addrspace(1)* %out, i32 %tid			%outptr = getelementptr <1280 x i32>, <1280 x i32> addrspace(1)* %out, i32 %tid
	store <1280 x i32> %a, <1280 x i32> addrspace(1)* %outptr			store <1280 x i32> %a, <1280 x i32> addrspace(1)* %outptr

	ret void			ret void
	}			}

	; CHECK-LABEL: test_limited_sgpr			; CHECK-LABEL: test_limited_sgpr
	; GFX6: s_add_u32 s32, s32, 0x[[OFFSET:[0-9]+]]			; GFX6: s_add_u32 s32, s32, 0x[[OFFSET:[0-9a-f]+]]
	; GFX6-NEXT: buffer_load_dword v{{[0-9]+}}, off, s[{{[0-9:]+}}], s32			; GFX6-NEXT: buffer_load_dword v{{[0-9]+}}, off, s[{{[0-9:]+}}], s32
	; GFX6-NEXT: s_sub_u32 s32, s32, 0x[[OFFSET:[0-9]+]]			; GFX6-NEXT: s_sub_u32 s32, s32, 0x[[OFFSET:[0-9a-f]+]]
	; GFX6: NumSgprs: 48			; GFX6: NumSgprs: 48
	; GFX6: ScratchSize: 8608			; GFX6: ScratchSize: 8608

				; FLATSCR: s_movk_i32 [[SOFF1:s[0-9]+]], 0x
				; GFX9-FLATSCR-NEXT: s_waitcnt vmcnt(0)
				; FLATSCR-NEXT: scratch_store_dword off, v{{[0-9]+}}, [[SOFF1]] ; 4-byte Folded Spill
				; FLATSCR: s_movk_i32 [[SOFF2:s[0-9]+]], 0x
				; FLATSCR: scratch_load_dword v{{[0-9]+}}, off, [[SOFF2]] ; 4-byte Folded Reload
	define amdgpu_kernel void @test_limited_sgpr(<64 x i32> addrspace(1)* %out, <64 x i32> addrspace(1)* %in) #0 {			define amdgpu_kernel void @test_limited_sgpr(<64 x i32> addrspace(1)* %out, <64 x i32> addrspace(1)* %in) #0 {
	entry:			entry:
	%lo = call i32 @llvm.amdgcn.mbcnt.lo(i32 -1, i32 0)			%lo = call i32 @llvm.amdgcn.mbcnt.lo(i32 -1, i32 0)
	%tid = call i32 @llvm.amdgcn.mbcnt.hi(i32 -1, i32 %lo)			%tid = call i32 @llvm.amdgcn.mbcnt.hi(i32 -1, i32 %lo)

	; allocate enough scratch to go beyond 2^12 addressing			; allocate enough scratch to go beyond 2^12 addressing
	%scratch = alloca <1280 x i32>, align 8, addrspace(5)			%scratch = alloca <1280 x i32>, align 8, addrspace(5)

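The comments in spill-scavenge-offset.ll say that once a VGPR spill offset gets too large, an extra SGPR is used to form the scratch address, and that the second test allocates enough scratch to go beyond 2^12 addressing. A minimal sketch of that decision follows. It is illustrative only: `MAX_IMM_OFFSET` and `address_spill` are hypothetical names, and the 12-bit limit is an assumption lifted from the "2^12" comment; the real encodable range depends on the subtarget and instruction.

```python
# Assumption: a 12-bit unsigned immediate offset, per the "2^12" comment in the test.
MAX_IMM_OFFSET = (1 << 12) - 1


def address_spill(offset: int):
    """Return (sgpr_value, imm_offset) for a spill slot at byte `offset`.

    sgpr_value is None when the immediate field alone can encode the offset;
    otherwise the whole offset is materialized in a scavenged SGPR.
    """
    if 0 <= offset <= MAX_IMM_OFFSET:
        return None, offset
    return offset, 0


print(address_spill(64))    # small slot: immediate only
print(address_spill(8192))  # large slot: needs an SGPR
```

The second case corresponds to the shape of the FLATSCR checks in `test_limited_sgpr`, where an `s_movk_i32` into an SGPR precedes a `scratch_store_dword` that uses that SGPR with no immediate offset.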

llvm/test/CodeGen/AMDGPU/stack-pointer-offset-relative-frameindex.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs | FileCheck -check-prefix=GCN %s			; RUN: llc < %s -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs | FileCheck -check-prefix=MUBUF %s
				; RUN: llc < %s -march=amdgcn -mcpu=gfx1010 -amdgpu-enable-flat-scratch -verify-machineinstrs | FileCheck -check-prefix=FLATSCR %s

	; FIXME: The MUBUF loads in this test output are incorrect, their SOffset			; FIXME: The MUBUF loads in this test output are incorrect, their SOffset
	; should use the frame offset register, not the ABI stack pointer register. We			; should use the frame offset register, not the ABI stack pointer register. We
	; rely on the frame index argument of MUBUF stack accesses to survive until PEI			; rely on the frame index argument of MUBUF stack accesses to survive until PEI
	; so we can fix up the SOffset to use the correct frame register in			; so we can fix up the SOffset to use the correct frame register in
	; eliminateFrameIndex. Some things like LocalStackSlotAllocation can lift the			; eliminateFrameIndex. Some things like LocalStackSlotAllocation can lift the
	; frame index up into something (e.g. `v_add_nc_u32`) that we cannot fold back			; frame index up into something (e.g. `v_add_nc_u32`) that we cannot fold back
	; into the MUBUF instruction, and so we end up emitting an incorrect offset.			; into the MUBUF instruction, and so we end up emitting an incorrect offset.
	; Fixing this may involve adding stack access pseudos so that we don't have to			; Fixing this may involve adding stack access pseudos so that we don't have to
	; speculatively refer to the ABI stack pointer register at all.			; speculatively refer to the ABI stack pointer register at all.

	; An assert was hit when frame offset register was used to address FrameIndex.			; An assert was hit when frame offset register was used to address FrameIndex.
	define amdgpu_kernel void @kernel_background_evaluate(float addrspace(5)* %kg, <4 x i32> addrspace(1)* %input, <4 x float> addrspace(1)* %output, i32 %i) {			define amdgpu_kernel void @kernel_background_evaluate(float addrspace(5)* %kg, <4 x i32> addrspace(1)* %input, <4 x float> addrspace(1)* %output, i32 %i) {
	; GCN-LABEL: kernel_background_evaluate:			; MUBUF-LABEL: kernel_background_evaluate:
	; GCN: ; %bb.0: ; %entry			; MUBUF: ; %bb.0: ; %entry
	; GCN-NEXT: s_load_dword s0, s[0:1], 0x24			; MUBUF-NEXT: s_load_dword s0, s[0:1], 0x24
	; GCN-NEXT: s_mov_b32 s36, SCRATCH_RSRC_DWORD0			; MUBUF-NEXT: s_mov_b32 s36, SCRATCH_RSRC_DWORD0
	; GCN-NEXT: s_mov_b32 s37, SCRATCH_RSRC_DWORD1			; MUBUF-NEXT: s_mov_b32 s37, SCRATCH_RSRC_DWORD1
	; GCN-NEXT: s_mov_b32 s38, -1			; MUBUF-NEXT: s_mov_b32 s38, -1
	; GCN-NEXT: s_mov_b32 s39, 0x31c16000			; MUBUF-NEXT: s_mov_b32 s39, 0x31c16000
	; GCN-NEXT: s_add_u32 s36, s36, s3			; MUBUF-NEXT: s_add_u32 s36, s36, s3
	; GCN-NEXT: s_addc_u32 s37, s37, 0			; MUBUF-NEXT: s_addc_u32 s37, s37, 0
	; GCN-NEXT: v_mov_b32_e32 v1, 0x2000			; MUBUF-NEXT: v_mov_b32_e32 v1, 0x2000
	; GCN-NEXT: v_mov_b32_e32 v2, 0x4000			; MUBUF-NEXT: v_mov_b32_e32 v2, 0x4000
	; GCN-NEXT: v_mov_b32_e32 v3, 0			; MUBUF-NEXT: v_mov_b32_e32 v3, 0
	; GCN-NEXT: v_mov_b32_e32 v4, 0x400000			; MUBUF-NEXT: v_mov_b32_e32 v4, 0x400000
	; GCN-NEXT: s_mov_b32 s32, 0xc0000			; MUBUF-NEXT: s_mov_b32 s32, 0xc0000
	; GCN-NEXT: v_add_nc_u32_e64 v40, 4, 0x4000			; MUBUF-NEXT: v_add_nc_u32_e64 v40, 4, 0x4000
	; GCN-NEXT: ; implicit-def: $vcc_hi			; MUBUF-NEXT: ; implicit-def: $vcc_hi
	; GCN-NEXT: s_getpc_b64 s[4:5]			; MUBUF-NEXT: s_getpc_b64 s[4:5]
	; GCN-NEXT: s_add_u32 s4, s4, svm_eval_nodes@rel32@lo+4			; MUBUF-NEXT: s_add_u32 s4, s4, svm_eval_nodes@rel32@lo+4
	; GCN-NEXT: s_addc_u32 s5, s5, svm_eval_nodes@rel32@hi+12			; MUBUF-NEXT: s_addc_u32 s5, s5, svm_eval_nodes@rel32@hi+12
	; GCN-NEXT: s_waitcnt lgkmcnt(0)			; MUBUF-NEXT: s_waitcnt lgkmcnt(0)
	; GCN-NEXT: v_mov_b32_e32 v0, s0			; MUBUF-NEXT: v_mov_b32_e32 v0, s0
	; GCN-NEXT: s_mov_b64 s[0:1], s[36:37]			; MUBUF-NEXT: s_mov_b64 s[0:1], s[36:37]
	; GCN-NEXT: s_mov_b64 s[2:3], s[38:39]			; MUBUF-NEXT: s_mov_b64 s[2:3], s[38:39]
	; GCN-NEXT: s_swappc_b64 s[30:31], s[4:5]			; MUBUF-NEXT: s_swappc_b64 s[30:31], s[4:5]
	; GCN-NEXT: v_cmp_ne_u32_e32 vcc_lo, 0, v0			; MUBUF-NEXT: v_cmp_ne_u32_e32 vcc_lo, 0, v0
	; GCN-NEXT: s_and_saveexec_b32 s0, vcc_lo			; MUBUF-NEXT: s_and_saveexec_b32 s0, vcc_lo
	; GCN-NEXT: s_cbranch_execz BB0_2			; MUBUF-NEXT: s_cbranch_execz BB0_2
	; GCN-NEXT: ; %bb.1: ; %if.then4.i			; MUBUF-NEXT: ; %bb.1: ; %if.then4.i
	; GCN-NEXT: s_clause 0x1			; MUBUF-NEXT: s_clause 0x1
	; GCN-NEXT: buffer_load_dword v0, v40, s[36:39], 0 offen			; MUBUF-NEXT: buffer_load_dword v0, v40, s[36:39], 0 offen
	; GCN-NEXT: buffer_load_dword v1, v40, s[36:39], 0 offen offset:4			; MUBUF-NEXT: buffer_load_dword v1, v40, s[36:39], 0 offen offset:4
	; GCN-NEXT: s_waitcnt vmcnt(0)			; MUBUF-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: v_add_nc_u32_e32 v0, v1, v0			; MUBUF-NEXT: v_add_nc_u32_e32 v0, v1, v0
	; GCN-NEXT: v_mul_lo_u32 v0, 0x41c64e6d, v0			; MUBUF-NEXT: v_mul_lo_u32 v0, 0x41c64e6d, v0
	; GCN-NEXT: v_add_nc_u32_e32 v0, 0x3039, v0			; MUBUF-NEXT: v_add_nc_u32_e32 v0, 0x3039, v0
	; GCN-NEXT: buffer_store_dword v0, v0, s[36:39], 0 offen			; MUBUF-NEXT: buffer_store_dword v0, v0, s[36:39], 0 offen
	; GCN-NEXT: BB0_2: ; %shader_eval_surface.exit			; MUBUF-NEXT: BB0_2: ; %shader_eval_surface.exit
	; GCN-NEXT: s_endpgm			; MUBUF-NEXT: s_endpgm
				;
				; FLATSCR-LABEL: kernel_background_evaluate:
				; FLATSCR: ; %bb.0: ; %entry
				; FLATSCR-NEXT: s_add_u32 s2, s2, s5
				; FLATSCR-NEXT: s_movk_i32 s32, 0x6000
				; FLATSCR-NEXT: s_addc_u32 s3, s3, 0
				; FLATSCR-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s2
				; FLATSCR-NEXT: s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s3
				; FLATSCR-NEXT: s_load_dword s0, s[0:1], 0x24
				; FLATSCR-NEXT: s_mov_b32 s36, SCRATCH_RSRC_DWORD0
				; FLATSCR-NEXT: s_mov_b32 s37, SCRATCH_RSRC_DWORD1
				; FLATSCR-NEXT: s_mov_b32 s38, -1
				; FLATSCR-NEXT: s_mov_b32 s39, 0x31c16000
				; FLATSCR-NEXT: s_add_u32 s36, s36, s5
				; FLATSCR-NEXT: s_addc_u32 s37, s37, 0
				; FLATSCR-NEXT: v_mov_b32_e32 v1, 0x2000
				; FLATSCR-NEXT: v_mov_b32_e32 v2, 0x4000
				; FLATSCR-NEXT: v_mov_b32_e32 v3, 0
				; FLATSCR-NEXT: v_mov_b32_e32 v4, 0x400000
				; FLATSCR-NEXT: ; implicit-def: $vcc_hi
				; FLATSCR-NEXT: s_getpc_b64 s[4:5]
				; FLATSCR-NEXT: s_add_u32 s4, s4, svm_eval_nodes@rel32@lo+4
				; FLATSCR-NEXT: s_addc_u32 s5, s5, svm_eval_nodes@rel32@hi+12
				; FLATSCR-NEXT: s_waitcnt lgkmcnt(0)
				; FLATSCR-NEXT: v_mov_b32_e32 v0, s0
				; FLATSCR-NEXT: s_mov_b64 s[0:1], s[36:37]
				; FLATSCR-NEXT: s_mov_b64 s[2:3], s[38:39]
				; FLATSCR-NEXT: s_swappc_b64 s[30:31], s[4:5]
				; FLATSCR-NEXT: v_cmp_ne_u32_e32 vcc_lo, 0, v0
				; FLATSCR-NEXT: s_and_saveexec_b32 s0, vcc_lo
				; FLATSCR-NEXT: s_cbranch_execz BB0_2
				; FLATSCR-NEXT: ; %bb.1: ; %if.then4.i
				; FLATSCR-NEXT: s_movk_i32 vcc_lo, 0x4000
				; FLATSCR-NEXT: s_nop 1
				; FLATSCR-NEXT: scratch_load_dword v0, off, vcc_lo offset:4
				; FLATSCR-NEXT: s_waitcnt_depctr 0xffe3
				; FLATSCR-NEXT: s_movk_i32 vcc_lo, 0x4000
				; FLATSCR-NEXT: scratch_load_dword v1, off, vcc_lo offset:8
				; FLATSCR-NEXT: s_waitcnt vmcnt(0)
				; FLATSCR-NEXT: v_add_nc_u32_e32 v0, v1, v0
				; FLATSCR-NEXT: v_mul_lo_u32 v0, 0x41c64e6d, v0
				; FLATSCR-NEXT: v_add_nc_u32_e32 v0, 0x3039, v0
				; FLATSCR-NEXT: scratch_store_dword off, v0, s0
				; FLATSCR-NEXT: BB0_2: ; %shader_eval_surface.exit
				; FLATSCR-NEXT: s_endpgm
	entry:			entry:
	%sd = alloca < 1339 x i32>, align 8192, addrspace(5)			%sd = alloca < 1339 x i32>, align 8192, addrspace(5)
	%state = alloca <4 x i32>, align 16, addrspace(5)			%state = alloca <4 x i32>, align 16, addrspace(5)
	%rslt = call i32 @svm_eval_nodes(float addrspace(5)* %kg, <1339 x i32> addrspace(5)* %sd, <4 x i32> addrspace(5)* %state, i32 0, i32 4194304)			%rslt = call i32 @svm_eval_nodes(float addrspace(5)* %kg, <1339 x i32> addrspace(5)* %sd, <4 x i32> addrspace(5)* %state, i32 0, i32 4194304)
	%cmp = icmp eq i32 %rslt, 0			%cmp = icmp eq i32 %rslt, 0
	br i1 %cmp, label %shader_eval_surface.exit, label %if.then4.i			br i1 %cmp, label %shader_eval_surface.exit, label %if.then4.i

	if.then4.i: ; preds = %entry			if.then4.i: ; preds = %entry
	[... 16 lines not shown ...]
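
Note on the two stack pointer initializers above: the MUBUF output sets `s32` to `0xc0000`, while the flat scratch output sets it to `0x6000`. This matches the summary's statement that SP changes from unswizzled to swizzled. The sketch below checks that the two values differ by exactly the wave size. It assumes gfx1010 compiles this kernel in wave32 mode, which the hunk does not state, and the helper name is hypothetical.

```python
# Stack pointer immediates copied from the MUBUF and FLATSCR check lines.
MUBUF_SP = 0xC0000    # s_mov_b32 s32, 0xc0000  (unswizzled)
FLATSCR_SP = 0x6000   # s_movk_i32 s32, 0x6000  (swizzled)

# Assumption: 32 lanes per wave (wave32) for this gfx1010 run.
WAVE_SIZE = 32


def swizzled_to_unswizzled(swizzled_offset, wave_size=WAVE_SIZE):
    """Scale a swizzled scratch offset up to the unswizzled offset."""
    return swizzled_offset * wave_size


print(hex(swizzled_to_unswizzled(FLATSCR_SP)))  # prints 0xc0000
```

The same scaling is what the summary says the DWARF expression for SP/FP will eventually need in order to recover the unswizzled scratch address.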

llvm/test/CodeGen/AMDGPU/store-hi16.ll

; RUN: llc -march=amdgcn -mcpu=gfx900 -amdgpu-sroa=0 -mattr=-promote-alloca -verify-machineinstrs < %s | FileCheck -allow-deprecated-dag-overlap -check-prefixes=GCN,GFX900,GFX9 %s		; RUN: llc -march=amdgcn -mcpu=gfx900 -amdgpu-sroa=0 -mattr=-promote-alloca -verify-machineinstrs < %s | FileCheck -allow-deprecated-dag-overlap -check-prefixes=GCN,GFX900,GFX9,GFX900-MUBUF %s
; RUN: llc -march=amdgcn -mcpu=gfx906 -amdgpu-sroa=0 -mattr=-promote-alloca,+sram-ecc -verify-machineinstrs < %s | FileCheck -allow-deprecated-dag-overlap -check-prefixes=GCN,GFX906,GFX9,NO-D16-HI %s		; RUN: llc -march=amdgcn -mcpu=gfx906 -amdgpu-sroa=0 -mattr=-promote-alloca,+sram-ecc -verify-machineinstrs < %s | FileCheck -allow-deprecated-dag-overlap -check-prefixes=GCN,GFX906,GFX9,NO-D16-HI %s
; RUN: llc -march=amdgcn -mcpu=fiji -amdgpu-sroa=0 -mattr=-promote-alloca -verify-machineinstrs < %s | FileCheck -allow-deprecated-dag-overlap -check-prefixes=GCN,GFX803,NO-D16-HI %s		; RUN: llc -march=amdgcn -mcpu=fiji -amdgpu-sroa=0 -mattr=-promote-alloca -verify-machineinstrs < %s | FileCheck -allow-deprecated-dag-overlap -check-prefixes=GCN,GFX803,NO-D16-HI %s
		; RUN: llc -march=amdgcn -mcpu=gfx900 -amdgpu-sroa=0 -mattr=-promote-alloca -amdgpu-enable-flat-scratch -verify-machineinstrs < %s | FileCheck -allow-deprecated-dag-overlap -check-prefixes=GCN,GFX900,GFX9,GFX900-FLATSCR %s

; GCN-LABEL: {{^}}store_global_hi_v2i16:		; GCN-LABEL: {{^}}store_global_hi_v2i16:
; GCN: s_waitcnt		; GCN: s_waitcnt

; GFX900-NEXT: global_store_short_d16_hi v[0:1], v2, off		; GFX900-NEXT: global_store_short_d16_hi v[0:1], v2, off

; NO-D16-HI-NEXT: v_lshrrev_b32_e32 v2, 16, v2		; NO-D16-HI-NEXT: v_lshrrev_b32_e32 v2, 16, v2
; GFX803-NEXT: flat_store_short v[0:1], v2		; GFX803-NEXT: flat_store_short v[0:1], v2
[... 372 lines not shown ...]
%gep = getelementptr inbounds i8, i8* %out, i64 -4095		%gep = getelementptr inbounds i8, i8* %out, i64 -4095
store i8 %trunc, i8* %gep		store i8 %trunc, i8* %gep
ret void		ret void
}		}

; GCN-LABEL: {{^}}store_private_hi_v2i16:		; GCN-LABEL: {{^}}store_private_hi_v2i16:
; GCN: s_waitcnt		; GCN: s_waitcnt

; GFX900-NEXT: buffer_store_short_d16_hi v1, v0, s[0:3], 0 offen{{$}}		; GFX900-MUBUF-NEXT: buffer_store_short_d16_hi v1, v0, s[0:3], 0 offen{{$}}
		; GFX900-FLATSCR-NEXT: scratch_store_short_d16_hi v0, v1, off

; NO-D16-HI: v_lshrrev_b32_e32 v1, 16, v1		; NO-D16-HI: v_lshrrev_b32_e32 v1, 16, v1
; NO-D16-HI: buffer_store_short v1, v0, s[0:3], 0 offen{{$}}		; NO-D16-HI: buffer_store_short v1, v0, s[0:3], 0 offen{{$}}

; GCN-NEXT: s_waitcnt		; GCN-NEXT: s_waitcnt
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define void @store_private_hi_v2i16(i16 addrspace(5)* %out, i32 %arg) #0 {		define void @store_private_hi_v2i16(i16 addrspace(5)* %out, i32 %arg) #0 {
entry:		entry:
; FIXME: ABI for pre-gfx9		; FIXME: ABI for pre-gfx9
%value = bitcast i32 %arg to <2 x i16>		%value = bitcast i32 %arg to <2 x i16>
%hi = extractelement <2 x i16> %value, i32 1		%hi = extractelement <2 x i16> %value, i32 1
store i16 %hi, i16 addrspace(5)* %out		store i16 %hi, i16 addrspace(5)* %out
ret void		ret void
}		}

; GCN-LABEL: {{^}}store_private_hi_v2f16:		; GCN-LABEL: {{^}}store_private_hi_v2f16:
; GCN: s_waitcnt		; GCN: s_waitcnt

; GFX900-NEXT: buffer_store_short_d16_hi v1, v0, s[0:3], 0 offen{{$}}		; GFX900-MUBUF-NEXT: buffer_store_short_d16_hi v1, v0, s[0:3], 0 offen{{$}}
		; GFX900-FLATSCR-NEXT: scratch_store_short_d16_hi v0, v1, off{{$}}

; NO-D16-HI: v_lshrrev_b32_e32 v1, 16, v1		; NO-D16-HI: v_lshrrev_b32_e32 v1, 16, v1
; NO-D16-HI: buffer_store_short v1, v0, s[0:3], 0 offen{{$}}		; NO-D16-HI: buffer_store_short v1, v0, s[0:3], 0 offen{{$}}

; GCN-NEXT: s_waitcnt		; GCN-NEXT: s_waitcnt
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define void @store_private_hi_v2f16(half addrspace(5)* %out, i32 %arg) #0 {		define void @store_private_hi_v2f16(half addrspace(5)* %out, i32 %arg) #0 {
entry:		entry:
; FIXME: ABI for pre-gfx9		; FIXME: ABI for pre-gfx9
%value = bitcast i32 %arg to <2 x half>		%value = bitcast i32 %arg to <2 x half>
%hi = extractelement <2 x half> %value, i32 1		%hi = extractelement <2 x half> %value, i32 1
store half %hi, half addrspace(5)* %out		store half %hi, half addrspace(5)* %out
ret void		ret void
}		}

; GCN-LABEL: {{^}}store_private_hi_i32_shift:		; GCN-LABEL: {{^}}store_private_hi_i32_shift:
; GCN: s_waitcnt		; GCN: s_waitcnt

; GFX900-NEXT: buffer_store_short_d16_hi v1, v0, s[0:3], 0 offen{{$}}		; GFX900-MUBUF-NEXT: buffer_store_short_d16_hi v1, v0, s[0:3], 0 offen{{$}}
		; GFX900-FLATSCR-NEXT: scratch_store_short_d16_hi v0, v1, off{{$}}

; NO-D16-HI-NEXT: v_lshrrev_b32_e32 v1, 16, v1		; NO-D16-HI-NEXT: v_lshrrev_b32_e32 v1, 16, v1
; NO-D16-HI-NEXT: buffer_store_short v1, v0, s[0:3], 0 offen{{$}}		; NO-D16-HI-NEXT: buffer_store_short v1, v0, s[0:3], 0 offen{{$}}

; GCN-NEXT: s_waitcnt		; GCN-NEXT: s_waitcnt
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define void @store_private_hi_i32_shift(i16 addrspace(5)* %out, i32 %value) #0 {		define void @store_private_hi_i32_shift(i16 addrspace(5)* %out, i32 %value) #0 {
entry:		entry:
%hi32 = lshr i32 %value, 16		%hi32 = lshr i32 %value, 16
%hi = trunc i32 %hi32 to i16		%hi = trunc i32 %hi32 to i16
store i16 %hi, i16 addrspace(5)* %out		store i16 %hi, i16 addrspace(5)* %out
ret void		ret void
}		}

; GCN-LABEL: {{^}}store_private_hi_v2i16_i8:		; GCN-LABEL: {{^}}store_private_hi_v2i16_i8:
; GCN: s_waitcnt		; GCN: s_waitcnt

; GFX900-NEXT: buffer_store_byte_d16_hi v1, v0, s[0:3], 0 offen{{$}}		; GFX900-MUBUF-NEXT: buffer_store_byte_d16_hi v1, v0, s[0:3], 0 offen{{$}}
		; GFX900-FLATSCR-NEXT: scratch_store_byte_d16_hi v0, v1, off{{$}}

; NO-D16-HI-NEXT: v_lshrrev_b32_e32 v1, 16, v1		; NO-D16-HI-NEXT: v_lshrrev_b32_e32 v1, 16, v1
; NO-D16-HI-NEXT: buffer_store_byte v1, v0, s[0:3], 0 offen{{$}}		; NO-D16-HI-NEXT: buffer_store_byte v1, v0, s[0:3], 0 offen{{$}}

; GCN-NEXT: s_waitcnt		; GCN-NEXT: s_waitcnt
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define void @store_private_hi_v2i16_i8(i8 addrspace(5)* %out, i32 %arg) #0 {		define void @store_private_hi_v2i16_i8(i8 addrspace(5)* %out, i32 %arg) #0 {
entry:		entry:
%value = bitcast i32 %arg to <2 x i16>		%value = bitcast i32 %arg to <2 x i16>
%hi = extractelement <2 x i16> %value, i32 1		%hi = extractelement <2 x i16> %value, i32 1
%trunc = trunc i16 %hi to i8		%trunc = trunc i16 %hi to i8
store i8 %trunc, i8 addrspace(5)* %out		store i8 %trunc, i8 addrspace(5)* %out
ret void		ret void
}		}

; GCN-LABEL: {{^}}store_private_hi_i8_shift:		; GCN-LABEL: {{^}}store_private_hi_i8_shift:
; GCN: s_waitcnt		; GCN: s_waitcnt

; GFX900-NEXT: buffer_store_byte_d16_hi v1, v0, s[0:3], 0 offen{{$}}		; GFX900-MUBUF-NEXT: buffer_store_byte_d16_hi v1, v0, s[0:3], 0 offen{{$}}
		; GFX900-FLATSCR-NEXT: scratch_store_byte_d16_hi v0, v1, off{{$}}

; NO-D16-HI-NEXT: v_lshrrev_b32_e32 v1, 16, v1		; NO-D16-HI-NEXT: v_lshrrev_b32_e32 v1, 16, v1
; NO-D16-HI-NEXT: buffer_store_byte v1, v0, s[0:3], 0 offen{{$}}		; NO-D16-HI-NEXT: buffer_store_byte v1, v0, s[0:3], 0 offen{{$}}

; GCN-NEXT: s_waitcnt		; GCN-NEXT: s_waitcnt
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define void @store_private_hi_i8_shift(i8 addrspace(5)* %out, i32 %value) #0 {		define void @store_private_hi_i8_shift(i8 addrspace(5)* %out, i32 %value) #0 {
entry:		entry:
%hi32 = lshr i32 %value, 16		%hi32 = lshr i32 %value, 16
%hi = trunc i32 %hi32 to i8		%hi = trunc i32 %hi32 to i8
store i8 %hi, i8 addrspace(5)* %out		store i8 %hi, i8 addrspace(5)* %out
ret void		ret void
}		}

; GCN-LABEL: {{^}}store_private_hi_v2i16_max_offset:		; GCN-LABEL: {{^}}store_private_hi_v2i16_max_offset:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GFX900: buffer_store_short_d16_hi v0, off, s[0:3], s32 offset:4094{{$}}		; GFX900-MUBUF: buffer_store_short_d16_hi v0, off, s[0:3], s32 offset:4094{{$}}
		; GFX900-FLATSCR: scratch_store_short_d16_hi off, v0, s32 offset:4094{{$}}

; NO-D16-HI: v_lshrrev_b32_e32 v0, 16, v0		; NO-D16-HI: v_lshrrev_b32_e32 v0, 16, v0
; NO-D16-HI-NEXT: buffer_store_short v0, off, s[0:3], s32 offset:4094{{$}}		; NO-D16-HI-NEXT: buffer_store_short v0, off, s[0:3], s32 offset:4094{{$}}

; GCN-NEXT: s_waitcnt		; GCN-NEXT: s_waitcnt
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define void @store_private_hi_v2i16_max_offset(i16 addrspace(5)* byval %out, i32 %arg) #0 {		define void @store_private_hi_v2i16_max_offset(i16 addrspace(5)* byval %out, i32 %arg) #0 {
entry:		entry:
%value = bitcast i32 %arg to <2 x i16>		%value = bitcast i32 %arg to <2 x i16>
%hi = extractelement <2 x i16> %value, i32 1		%hi = extractelement <2 x i16> %value, i32 1
%gep = getelementptr inbounds i16, i16 addrspace(5)* %out, i64 2047		%gep = getelementptr inbounds i16, i16 addrspace(5)* %out, i64 2047
store i16 %hi, i16 addrspace(5)* %gep		store i16 %hi, i16 addrspace(5)* %gep
ret void		ret void
}		}



; GCN-LABEL: {{^}}store_private_hi_v2i16_nooff:		; GCN-LABEL: {{^}}store_private_hi_v2i16_nooff:
; GCN: s_waitcnt		; GCN: s_waitcnt

; GFX900-NEXT: buffer_store_short_d16_hi v0, off, s[0:3], 0{{$}}		; GFX900-MUBUF-NEXT: buffer_store_short_d16_hi v0, off, s[0:3], 0{{$}}
		; GFX900-FLATSCR-NEXT: s_mov_b32 [[SOFF:s[0-9]+]], 0
		; GFX900-FLATSCR-NEXT: scratch_store_short_d16_hi off, v0, [[SOFF]]{{$}}

; NO-D16-HI-NEXT: v_lshrrev_b32_e32 v0, 16, v0		; NO-D16-HI-NEXT: v_lshrrev_b32_e32 v0, 16, v0
; NO-D16-HI-NEXT: buffer_store_short v0, off, s[0:3], 0{{$}}		; NO-D16-HI-NEXT: buffer_store_short v0, off, s[0:3], 0{{$}}

; GCN-NEXT: s_waitcnt		; GCN-NEXT: s_waitcnt
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define void @store_private_hi_v2i16_nooff(i32 %arg) #0 {		define void @store_private_hi_v2i16_nooff(i32 %arg) #0 {
entry:		entry:
; FIXME: ABI for pre-gfx9		; FIXME: ABI for pre-gfx9
%value = bitcast i32 %arg to <2 x i16>		%value = bitcast i32 %arg to <2 x i16>
%hi = extractelement <2 x i16> %value, i32 1		%hi = extractelement <2 x i16> %value, i32 1
store volatile i16 %hi, i16 addrspace(5)* null		store volatile i16 %hi, i16 addrspace(5)* null
ret void		ret void
}		}


; GCN-LABEL: {{^}}store_private_hi_v2i16_i8_nooff:		; GCN-LABEL: {{^}}store_private_hi_v2i16_i8_nooff:
; GCN: s_waitcnt		; GCN: s_waitcnt

; GFX900-NEXT: buffer_store_byte_d16_hi v0, off, s[0:3], 0{{$}}		; GFX900-MUBUF-NEXT: buffer_store_byte_d16_hi v0, off, s[0:3], 0{{$}}
		; GFX900-FLATSCR-NEXT: s_mov_b32 [[SOFF:s[0-9]+]], 0
		; GFX900-FLATSCR-NEXT: scratch_store_byte_d16_hi off, v0, [[SOFF]]{{$}}

; NO-D16-HI: v_lshrrev_b32_e32 v0, 16, v0		; NO-D16-HI: v_lshrrev_b32_e32 v0, 16, v0
; NO-D16-HI: buffer_store_byte v0, off, s[0:3], 0{{$}}		; NO-D16-HI: buffer_store_byte v0, off, s[0:3], 0{{$}}

; GCN-NEXT: s_waitcnt		; GCN-NEXT: s_waitcnt
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define void @store_private_hi_v2i16_i8_nooff(i32 %arg) #0 {		define void @store_private_hi_v2i16_i8_nooff(i32 %arg) #0 {
entry:		entry:
[... 95 lines not shown ...]
%hi = extractelement <2 x i16> %value, i32 1		%hi = extractelement <2 x i16> %value, i32 1
%gep = getelementptr inbounds i16, i16 addrspace(3)* %out, i64 32767		%gep = getelementptr inbounds i16, i16 addrspace(3)* %out, i64 32767
store i16 %hi, i16 addrspace(3)* %gep		store i16 %hi, i16 addrspace(3)* %gep
ret void		ret void
}		}

; GCN-LABEL: {{^}}store_private_hi_v2i16_to_offset:		; GCN-LABEL: {{^}}store_private_hi_v2i16_to_offset:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GFX900: buffer_store_dword		; GFX900-MUBUF: buffer_store_dword
; GFX900-NEXT: buffer_store_short_d16_hi v0, off, s[0:3], s32 offset:4094		; GFX900-MUBUF-NEXT: buffer_store_short_d16_hi v0, off, s[0:3], s32 offset:4094
		; GFX900-FLATSCR: scratch_store_dword
		; GFX900-FLATSCR-NEXT: scratch_store_short_d16_hi off, v0, s32 offset:4094
define void @store_private_hi_v2i16_to_offset(i32 %arg) #0 {		define void @store_private_hi_v2i16_to_offset(i32 %arg) #0 {
entry:		entry:
%obj0 = alloca [10 x i32], align 4, addrspace(5)		%obj0 = alloca [10 x i32], align 4, addrspace(5)
%obj1 = alloca [4096 x i16], align 2, addrspace(5)		%obj1 = alloca [4096 x i16], align 2, addrspace(5)
%bc = bitcast [10 x i32] addrspace(5)* %obj0 to i32 addrspace(5)*		%bc = bitcast [10 x i32] addrspace(5)* %obj0 to i32 addrspace(5)*
store volatile i32 123, i32 addrspace(5)* %bc		store volatile i32 123, i32 addrspace(5)* %bc
%value = bitcast i32 %arg to <2 x i16>		%value = bitcast i32 %arg to <2 x i16>
%hi = extractelement <2 x i16> %value, i32 1		%hi = extractelement <2 x i16> %value, i32 1
%gep = getelementptr inbounds [4096 x i16], [4096 x i16] addrspace(5)* %obj1, i32 0, i32 2027		%gep = getelementptr inbounds [4096 x i16], [4096 x i16] addrspace(5)* %obj1, i32 0, i32 2027
store i16 %hi, i16 addrspace(5)* %gep		store i16 %hi, i16 addrspace(5)* %gep
ret void		ret void
}		}

; GCN-LABEL: {{^}}store_private_hi_v2i16_i8_to_offset:		; GCN-LABEL: {{^}}store_private_hi_v2i16_i8_to_offset:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GFX900: buffer_store_dword		; GFX900-MUBUF: buffer_store_dword
; GFX900-NEXT: buffer_store_byte_d16_hi v0, off, s[0:3], s32 offset:4095		; GFX900-MUBUF-NEXT: buffer_store_byte_d16_hi v0, off, s[0:3], s32 offset:4095
		; GFX900-FLATSCR: scratch_store_dword
		; GFX900-FLATSCR-NEXT: scratch_store_byte_d16_hi off, v0, s32 offset:4095
define void @store_private_hi_v2i16_i8_to_offset(i32 %arg) #0 {		define void @store_private_hi_v2i16_i8_to_offset(i32 %arg) #0 {
entry:		entry:
%obj0 = alloca [10 x i32], align 4, addrspace(5)		%obj0 = alloca [10 x i32], align 4, addrspace(5)
%obj1 = alloca [4096 x i8], align 2, addrspace(5)		%obj1 = alloca [4096 x i8], align 2, addrspace(5)
%bc = bitcast [10 x i32] addrspace(5)* %obj0 to i32 addrspace(5)*		%bc = bitcast [10 x i32] addrspace(5)* %obj0 to i32 addrspace(5)*
store volatile i32 123, i32 addrspace(5)* %bc		store volatile i32 123, i32 addrspace(5)* %bc
%value = bitcast i32 %arg to <2 x i16>		%value = bitcast i32 %arg to <2 x i16>
%hi = extractelement <2 x i16> %value, i32 1		%hi = extractelement <2 x i16> %value, i32 1
%gep = getelementptr inbounds [4096 x i8], [4096 x i8] addrspace(5)* %obj1, i32 0, i32 4055		%gep = getelementptr inbounds [4096 x i8], [4096 x i8] addrspace(5)* %obj1, i32 0, i32 4055
%trunc = trunc i16 %hi to i8		%trunc = trunc i16 %hi to i8
store i8 %trunc, i8 addrspace(5)* %gep		store i8 %trunc, i8 addrspace(5)* %gep
ret void		ret void
}		}

attributes #0 = { nounwind }		attributes #0 = { nounwind }
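
Note on the `offset:4094` and `offset:4095` checks above: the immediates follow from the alloca sizes and GEP indices in the test bodies. The sketch below redoes that arithmetic. It assumes `%obj0` sits at offset 0 from `s32` and `%obj1` follows it with no padding, which is consistent with the checked offsets but is not stated in the test. The helper name is hypothetical.

```python
def frame_offset(preceding_bytes, elem_bytes, index):
    """Byte offset of element `index` in an object placed after `preceding_bytes`."""
    return preceding_bytes + elem_bytes * index


# %obj0 = alloca [10 x i32]: 10 elements of 4 bytes each.
OBJ0_BYTES = 10 * 4

# store_private_hi_v2i16_to_offset: i16 element 2027 of %obj1.
short_off = frame_offset(OBJ0_BYTES, 2, 2027)

# store_private_hi_v2i16_i8_to_offset: i8 element 4055 of %obj1.
byte_off = frame_offset(OBJ0_BYTES, 1, 4055)

# store_private_hi_v2i16_max_offset: i16 element 2047 of the byval argument.
max_off = frame_offset(0, 2, 2047)

print(short_off, byte_off, max_off)  # prints 4094 4095 4094
```

All three values stay at or below 4095, the largest value of a 12-bit unsigned immediate, so the offset folds into the instruction in both the MUBUF and the flat scratch output.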


Diff 300797

